
Import

The import command converts agent session transcripts and selected external datasets into AgentV formats. Transcript imports let you grade past runs offline without re-running the agent. Dataset imports help seed AgentV YAML from portable case sources.

AgentV no longer maintains agentv import promptfoo as a first-class core import path. Migrate Promptfoo configs by rewriting the relevant prompts, tests, and assertions as native AgentV eval YAML, or keep any one-off conversion logic outside the AgentV CLI.

Source | Command | Input
Claude Code | agentv import claude | ~/.claude/projects/<path>/<uuid>.jsonl
Codex CLI | agentv import codex | ~/.codex/sessions/<YYYY>/<MM>/<DD>/rollout-*.jsonl
Copilot CLI | agentv import copilot | ~/.copilot/session-state/<uuid>/events.jsonl
HuggingFace datasets | agentv import huggingface | Dataset repository and split
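To make the Input column concrete, here is a minimal Python sketch of how session files could be located for each transcript provider. The glob patterns are transcribed from the table above; SESSION_GLOBS and find_session_files are hypothetical names, not AgentV's implementation, and newest-first ordering is an assumption based on the --list output below.

```python
from pathlib import Path
from typing import List, Optional

# Hypothetical mapping transcribed from the "Input" column above:
# provider -> (default root directory, glob pattern under that root).
SESSION_GLOBS = {
    "claude": ("~/.claude/projects", "*/*.jsonl"),              # <path>/<uuid>.jsonl
    "codex": ("~/.codex/sessions", "*/*/*/rollout-*.jsonl"),    # <YYYY>/<MM>/<DD>/rollout-*.jsonl
    "copilot": ("~/.copilot/session-state", "*/events.jsonl"),  # <uuid>/events.jsonl
}


def find_session_files(provider: str, root: Optional[str] = None) -> List[Path]:
    """Return candidate session files for one provider, newest first.

    `root` stands in for --projects-dir, --sessions-dir, or
    --session-state-dir; when omitted, the documented default is used.
    """
    default_root, pattern = SESSION_GLOBS[provider]
    base = Path(root if root is not None else default_root).expanduser()
    files = [path for path in base.glob(pattern) if path.is_file()]
    # Assumption: the most recently modified sessions are listed first.
    return sorted(files, key=lambda path: path.stat().st_mtime, reverse=True)
```

For example, find_session_files("codex", "/tmp/sessions") corresponds to passing --sessions-dir /tmp/sessions.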

Import a Claude Code session transcript.

List available sessions:
agentv import claude --list

Output:

Found 5 session(s):
4c4f9e4e-e6f1-490b-a1b1-9aef543ebf22 2m ago -home-user-myproject
087b801a-7a63-48ff-b348-62563a290b23 1h ago -home-user-myproject
ed8b8c62-4414-49fb-8739-006d809c8588 3h ago -home-user-other-project
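Each listing row shows the session UUID, its age, and the encoded project directory. A small sketch of that formatting, assuming the UUID is the session file's name stem, the project is its parent folder name, and ages switch from minutes to hours to days at the usual boundaries; format_age and list_line are illustrative names only.

```python
import time
from pathlib import Path
from typing import Optional


def format_age(seconds: float) -> str:
    """Render elapsed time like the --list output ("2m ago", "1h ago").

    Assumption: minutes below one hour, hours below one day, days beyond.
    """
    minutes = int(seconds // 60)
    if minutes < 60:
        return f"{minutes}m ago"
    hours = minutes // 60
    if hours < 24:
        return f"{hours}h ago"
    return f"{hours // 24}d ago"


def list_line(session_file: Path, now: Optional[float] = None) -> str:
    """One listing row: session UUID (file stem), age, encoded project folder."""
    now = time.time() if now is None else now
    age = format_age(now - session_file.stat().st_mtime)
    return f"{session_file.stem} {age} {session_file.parent.name}"
```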

Import a specific session by ID:
agentv import claude --session-id 4c4f9e4e-e6f1-490b-a1b1-9aef543ebf22

List only the sessions for one project:
agentv import claude --list --project-path /home/user/myproject

Write the transcript to a custom path:
agentv import claude --session-id <uuid> -o transcripts/my-session.jsonl

Default output: .agentv/transcripts/claude-<session-id-short>.jsonl
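As a sketch of how the output path could be chosen, assuming <session-id-short> means the first eight characters of the UUID (inferred from the claude-4c4f9e4e.jsonl path in the workflow example further down); resolve_claude_output is a hypothetical helper, not part of the AgentV CLI.

```python
from pathlib import Path
from typing import Optional


def resolve_claude_output(session_id: str, output: Optional[str] = None) -> Path:
    """Pick the transcript path: -o/--output wins, else the documented default.

    Assumption: "<session-id-short>" is the first 8 characters of the UUID.
    """
    if output:
        return Path(output)
    return Path(".agentv/transcripts") / f"claude-{session_id[:8]}.jsonl"
```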

Import a Codex CLI session transcript.

List available sessions:
agentv import codex --list

Import a specific session by ID:
agentv import codex --session-id 019d5cff-9f02-7bc3-8f98-2071ba17ef0e

Import a Copilot CLI session transcript.

List available sessions:
agentv import copilot --list

Import a specific session by ID:
agentv import copilot --session-id 9ca6d90c-1d80-40d1-b805-c59ee31fc007

Import a HuggingFace dataset into AgentV eval YAML files.

agentv import huggingface --repo SWE-bench/SWE-bench_Verified --split test --limit 10 --output evals/swebench/

The transcript providers share the same core flags:

Flag | Description
--session-id <uuid> | Import a specific session by UUID
--list | List available sessions instead of importing
--output, -o <path> | Custom output file path

Provider-specific flags:

Flag | Provider | Description
--project-path <path> | Claude | Filter sessions by project path
--projects-dir <dir> | Claude | Override the ~/.claude/projects directory
--date <YYYY-MM-DD> | Codex | Filter sessions by date
--sessions-dir <dir> | Codex | Override the ~/.codex/sessions directory
--session-state-dir <dir> | Copilot | Override the ~/.copilot/session-state directory
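The filter flags line up with the on-disk layouts listed in the source table at the top of this page. A sketch of that correspondence, under two assumptions: Claude keeps a project's sessions in a folder named after the absolute path with each "/" replaced by "-" (inferred from the -home-user-myproject entries in the --list output), and --date selects the matching <YYYY>/<MM>/<DD> folder. Both helpers are hypothetical.

```python
from datetime import datetime
from pathlib import Path


def claude_project_dir(projects_dir: str, project_path: str) -> Path:
    """--project-path: folder holding one project's Claude sessions.

    Assumption inferred from the --list output: /home/user/myproject is
    stored as -home-user-myproject (each "/" replaced by "-").
    """
    return Path(projects_dir).expanduser() / project_path.replace("/", "-")


def codex_date_dir(sessions_dir: str, date: str) -> Path:
    """--date: map YYYY-MM-DD onto the <YYYY>/<MM>/<DD> folder layout."""
    day = datetime.strptime(date, "%Y-%m-%d")  # raises ValueError on malformed dates
    return Path(sessions_dir).expanduser() / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}"
```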

HuggingFace dataset import uses dataset-specific flags:

Flag | Description
--repo <name> | HuggingFace dataset repository
--split <name> | Dataset split to load
--limit <number> | Maximum number of instances to import
--output, -o <dir> | Output directory for generated eval YAML files

Imported transcripts are written as AgentV transcript JSONL. Each row is a provider-neutral agentv.transcript.v1 message row grouped by test_id and ordered by message_index:

{"schema_version":"agentv.transcript.v1","test_id":"claude-session-1","target":"claude","message_index":0,"role":"user","content":"Fix the bug in auth.ts","capture":{"content":"full","redaction_level":"none"},"source":{"kind":"imported_transcript","provider":"claude","session_id":"claude-session-1"}}
{"schema_version":"agentv.transcript.v1","test_id":"claude-session-1","target":"claude","message_index":1,"role":"assistant","content":"I'll fix the authentication bug.","tool_calls":[{"tool":"Read","id":"toolu_01...","input":{"file_path":"src/auth.ts"},"output":"...file contents..."}],"capture":{"content":"full","redaction_level":"none"},"source":{"kind":"imported_transcript","provider":"claude","session_id":"claude-session-1"}}
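Because every row carries test_id and message_index, a consumer can rebuild each session's conversation from the file using only the standard library. A minimal reader sketch; load_transcripts is a hypothetical name, not an AgentV API.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def load_transcripts(lines: Iterable[str]) -> Dict[str, List[dict]]:
    """Group transcript rows by test_id, each group ordered by message_index."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        grouped[row["test_id"]].append(row)
    for rows in grouped.values():
        rows.sort(key=lambda row: row["message_index"])
    return dict(grouped)
```

An open file object can be passed directly, since iterating a file yields its lines, for example load_transcripts(open(".agentv/transcripts/claude-4c4f9e4e.jsonl")).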

Stable top-level fields are schema_version, test_id, target, message_index, role, optional name, content, tool_calls, start_time, end_time, duration_ms, metadata, token_usage, the transcript-level fields transcript_token_usage, transcript_duration_ms, and transcript_cost_usd, capture, optional trace, and source. Provider-native details stay inside opaque nested fields such as metadata, source.metadata, tool input, or tool output; they are never added as custom top-level row keys.
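One way to check the rule that provider-native data never becomes a top-level key is to compare each row against the stable field list. A sketch, with STABLE_FIELDS copied from that list and unexpected_top_level_keys as a hypothetical helper.

```python
from typing import List

# Copied from the stable top-level field list above.
STABLE_FIELDS = frozenset({
    "schema_version", "test_id", "target", "message_index", "role", "name",
    "content", "tool_calls", "start_time", "end_time", "duration_ms",
    "metadata", "token_usage", "transcript_token_usage",
    "transcript_duration_ms", "transcript_cost_usd", "capture", "trace",
    "source",
})


def unexpected_top_level_keys(row: dict) -> List[str]:
    """Return top-level keys outside the stable set, sorted.

    A non-empty result means provider-native data leaked out of the nested
    fields (metadata, source.metadata, tool input/output) it belongs in.
    """
    return sorted(set(row) - STABLE_FIELDS)
```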

Rows from older AgentV transcript exports that lack schema_version, capture, or trace remain replayable; new eval run artifacts are written in the v1 shape.

For eval run artifacts, transcript.jsonl is the portable message/event projection. It is not a provider-native session dump, and AgentV does not persist a public trace.json run sidecar. When provider-native session or stream logs are captured during a new eval run, they are preserved in transcript-raw.jsonl and referenced by transcript_raw_path. raw_provider_log_path is a legacy or imported pointer, present only when older bundles or external sources already provide one. Agent Skills import, convert, transpile, and run paths do not require these legacy log pointers.
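A tolerant consumer can handle both row generations and both log pointers. This sketch assumes that a missing schema_version marks a pre-v1 row and that the pointer fields sit on some per-test result record; is_legacy_row and raw_log_pointer are hypothetical helpers, and the record shape is not a documented AgentV schema.

```python
from typing import Optional


def is_legacy_row(row: dict) -> bool:
    """Assumption: a row without schema_version comes from a pre-v1 export.

    Such rows stay replayable, so callers should read capture and trace
    with .get() instead of indexing them directly.
    """
    return "schema_version" not in row


def raw_log_pointer(record: dict) -> Optional[str]:
    """Prefer transcript_raw_path (new eval runs), then the legacy
    raw_provider_log_path; None when neither exists, since raw logs are
    only present when they were captured or provided."""
    return record.get("transcript_raw_path") or record.get("raw_provider_log_path")
```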

Claude Code events map to AgentV messages as follows:

Claude event | AgentV message
user | { role: 'user', content }
assistant | { role: 'assistant', content, toolCalls }
tool_use blocks | ToolCall { tool, input, id }
tool_result blocks | Paired with the matching tool_use by ID
progress, system | Skipped
Subagent events | Filtered out (v1)
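The event mapping in the table can be sketched as a single pass over the session events. The Claude Code field names used here (type, message.content, isSidechain, tool_use_id) are assumptions about the session JSONL rather than something this page documents, and convert_claude_events is a hypothetical helper; the toolCalls key follows the table.

```python
from typing import Dict, Iterable, List


def convert_claude_events(events: Iterable[dict]) -> List[dict]:
    """Turn Claude Code session events into AgentV-style messages."""
    messages: List[dict] = []
    pending: Dict[str, dict] = {}  # tool_use id -> ToolCall awaiting its result
    for event in events:
        kind = event.get("type")
        if kind not in ("user", "assistant"):
            continue  # progress, system, and other event types are skipped
        if event.get("isSidechain"):
            continue  # assumption: subagent events are flagged isSidechain
        content = event.get("message", {}).get("content", "")
        blocks = [{"type": "text", "text": content}] if isinstance(content, str) else content
        texts: List[str] = []
        tool_calls: List[dict] = []
        for block in blocks:
            block_type = block.get("type")
            if block_type == "text":
                texts.append(block.get("text", ""))
            elif block_type == "tool_use":
                call = {"tool": block["name"], "input": block.get("input"), "id": block["id"]}
                pending[block["id"]] = call
                tool_calls.append(call)
            elif block_type == "tool_result":
                # Pair the result with the earlier tool_use that has the same ID.
                call = pending.pop(block.get("tool_use_id"), None)
                if call is not None:
                    call["output"] = block.get("content")
        if kind == "user" and not texts:
            continue  # a user event holding only tool results adds no new message
        message = {"role": kind, "content": "\n".join(texts)}
        if tool_calls:
            message["toolCalls"] = tool_calls
        messages.append(message)
    return messages
```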

Token usage is aggregated from the final cumulative value of each LLM request. Duration is computed from the first to the last event timestamp.
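A sketch of both calculations, assuming each streamed assistant event repeats its requestId and reports running totals in message.usage, and that timestamps are ISO 8601 strings; aggregate_usage and duration_ms are hypothetical names, and the field names are assumptions about the Claude Code session format.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Optional


def aggregate_usage(events: Iterable[dict]) -> Dict[str, int]:
    """Sum token counts, keeping only the final cumulative usage per request."""
    last_by_request: Dict[str, dict] = {}
    for event in events:
        usage = event.get("message", {}).get("usage")
        request_id = event.get("requestId")
        if usage and request_id:
            last_by_request[request_id] = usage  # later events overwrite earlier totals
    totals: Dict[str, int] = {}
    for usage in last_by_request.values():
        for key, value in usage.items():
            if isinstance(value, int):  # skip nested or non-numeric fields
                totals[key] = totals.get(key, 0) + value
    return totals


def duration_ms(events: List[dict]) -> Optional[int]:
    """Milliseconds from the earliest to the latest event timestamp."""
    stamps = [
        datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))
        for event in events
        if "timestamp" in event
    ]
    if not stamps:
        return None
    return int((max(stamps) - min(stamps)).total_seconds() * 1000)
```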

Import a session, then run graders against it:

# 1. List sessions and pick one
agentv import claude --list
# 2. Import a session by ID
agentv import claude --session-id 4c4f9e4e-e6f1-490b-a1b1-9aef543ebf22
# 3. Run graders against the imported transcript
agentv eval evals/my-eval.yaml --transcript .agentv/transcripts/claude-4c4f9e4e.jsonl

See examples/features/import-claude/ for a complete working example.

Use scripts/import-huggingface.py to convert HuggingFace benchmark datasets into AgentV eval files. It currently supports SWE-bench-style datasets.

uv run scripts/import-huggingface.py \
--repo SWE-bench/SWE-bench_Verified \
--split test \
--limit 10 \
--output evals/swebench/

Each instance becomes an EVAL.yaml with:

  • input: the problem statement
  • workspace.docker.image: the pre-built SWE-bench Docker image (ghcr.io/epoch-research/swe-bench.eval.x86_64.<instance_id>:latest)
  • workspace.repos[].base_commit: the commit to reset to before the agent runs
  • assertions: code-grader tasks that run the FAIL_TO_PASS and PASS_TO_PASS pytest suites inside the container
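The field mapping in the list above can be sketched as a function over one dataset row. The SWE-bench column names (instance_id, problem_statement, base_commit, FAIL_TO_PASS, PASS_TO_PASS) are the dataset's published fields; the shape of each assertions entry (type, name, tests) is hypothetical and only illustrates where the two test lists end up, so it should not be read as AgentV's schema.

```python
import json

IMAGE_TEMPLATE = "ghcr.io/epoch-research/swe-bench.eval.x86_64.{instance_id}:latest"


def instance_to_eval(instance: dict) -> dict:
    """Map one SWE-bench row onto the EVAL.yaml fields listed above.

    Hypothetical: the keys inside each assertions entry are illustrative.
    """
    def tests(field: str) -> list:
        # SWE-bench stores these lists as JSON-encoded strings; accept lists too.
        value = instance.get(field, "[]")
        return json.loads(value) if isinstance(value, str) else list(value)

    return {
        "input": instance["problem_statement"],
        "workspace": {
            "docker": {"image": IMAGE_TEMPLATE.format(instance_id=instance["instance_id"])},
            "repos": [{"base_commit": instance["base_commit"]}],
        },
        "assertions": [
            {"type": "code-grader", "name": "FAIL_TO_PASS", "tests": tests("FAIL_TO_PASS")},
            {"type": "code-grader", "name": "PASS_TO_PASS", "tests": tests("PASS_TO_PASS")},
        ],
    }
```

Writing the returned dict out as EVAL.yaml needs a YAML serializer, which this sketch leaves out.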

Run an imported SWE-bench eval against any coding agent target:

# Import one instance
uv run scripts/import-huggingface.py \
--repo SWE-bench/SWE-bench_Verified \
--limit 1 \
--output /tmp/swebench-eval/
# Run with a coding agent target
agentv eval /tmp/swebench-eval/*.EVAL.yaml --target codex

The Docker workspace spins up the pre-built SWE-bench image, checks out base_commit, runs the agent to apply a patch, then grades by running the test suite inside the container.
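The final grading step corresponds to SWE-bench's usual resolution rule: every FAIL_TO_PASS test must now pass and every PASS_TO_PASS test must still pass. A sketch, assuming results are scraped from the short test summary that pytest -rA prints; parse_pytest_summary and swebench_resolved are hypothetical helpers, not the generated code-grader.

```python
import re
from typing import Dict, Iterable

# Lines such as "PASSED tests/test_x.py::test_a" or "FAILED tests/test_x.py::test_b - ..."
SUMMARY_LINE = re.compile(r"^(PASSED|FAILED|ERROR) (\S+)", re.MULTILINE)


def parse_pytest_summary(output: str) -> Dict[str, str]:
    """Map test node IDs to their outcome from pytest -rA summary lines."""
    return {node_id: outcome for outcome, node_id in SUMMARY_LINE.findall(output)}


def swebench_resolved(results: Dict[str, str], fail_to_pass: Iterable[str], pass_to_pass: Iterable[str]) -> bool:
    """Resolved only if every FAIL_TO_PASS and PASS_TO_PASS test passed.

    Tests missing from the results count as failures.
    """
    required = list(fail_to_pass) + list(pass_to_pass)
    return all(results.get(test) == "PASSED" for test in required)
```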