Import

The `import` command converts agent session transcripts and selected external datasets into AgentV formats. Transcript imports let you grade past runs offline without re-running the agent. Dataset imports seed AgentV eval YAML from portable test-case sources.

AgentV no longer maintains `agentv import promptfoo` as a first-class core import path. Migrate Promptfoo configs by rewriting the relevant prompts, tests, and assertions as native AgentV eval YAML, or keep any one-off conversion logic outside the AgentV CLI.

Supported Sources

Source	Command	Input
Claude Code	`agentv import claude`	`~/.claude/projects/<path>/<uuid>.jsonl`
Codex CLI	`agentv import codex`	`~/.codex/sessions/<YYYY>/<MM>/<DD>/rollout-*.jsonl`
Copilot CLI	`agentv import copilot`	`~/.copilot/session-state/<uuid>/events.jsonl`
HuggingFace datasets	`agentv import huggingface`	Dataset repository and split

`import claude`

Import a Claude Code session transcript.

List available sessions

agentv import claude --list

Output:

Found 3 session(s):

  4c4f9e4e-e6f1-490b-a1b1-9aef543ebf22  2m ago  -home-user-myproject
  087b801a-7a63-48ff-b348-62563a290b23  1h ago  -home-user-myproject
  ed8b8c62-4414-49fb-8739-006d809c8588  3h ago  -home-user-other-project

Import a specific session

agentv import claude --session-id 4c4f9e4e-e6f1-490b-a1b1-9aef543ebf22

Filter by project path

agentv import claude --list --project-path /home/user/myproject

Custom output path

agentv import claude --session-id <uuid> -o transcripts/my-session.jsonl

If `-o` is omitted, the transcript is written to `.agentv/transcripts/claude-<session-id-short>.jsonl`.
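The default path can be predicted from the session ID. The following is a minimal sketch, not AgentV's own code: `default_claude_output_path` is a hypothetical helper, and it assumes the short ID is the first hyphen-separated segment of the UUID, which matches the `claude-4c4f9e4e.jsonl` path used in the Workflow section.

```python
from pathlib import PurePosixPath


def default_claude_output_path(session_id: str) -> PurePosixPath:
    """Build the default output path for an imported Claude session.

    Assumption: the "short" session ID is the first hyphen-separated
    segment of the UUID (8 hex characters).
    """
    short_id = session_id.split("-", 1)[0]
    return PurePosixPath(".agentv") / "transcripts" / f"claude-{short_id}.jsonl"


print(default_claude_output_path("4c4f9e4e-e6f1-490b-a1b1-9aef543ebf22"))
# .agentv/transcripts/claude-4c4f9e4e.jsonl
```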

`import codex`

Import a Codex CLI session transcript.

List available sessions

agentv import codex --list

Import a specific session

agentv import codex --session-id 019d5cff-9f02-7bc3-8f98-2071ba17ef0e

`import copilot`

Import a Copilot CLI session transcript.

List available sessions

agentv import copilot --list

Import a specific session

agentv import copilot --session-id 9ca6d90c-1d80-40d1-b805-c59ee31fc007

`import huggingface`

Import a HuggingFace dataset into AgentV eval YAML files.

agentv import huggingface --repo SWE-bench/SWE-bench_Verified --split test --limit 10 --output evals/swebench/

Options

The transcript providers share the same core flags:

Flag	Description
`--session-id <uuid>`	Import a specific session by UUID
`--list`	List available sessions instead of importing
`--output, -o <path>`	Custom output file path

Provider-specific flags:

Flag	Provider	Description
`--project-path <path>`	Claude	Filter sessions by project path
`--projects-dir <dir>`	Claude	Override `~/.claude/projects` directory
`--date <YYYY-MM-DD>`	Codex	Filter sessions by date
`--sessions-dir <dir>`	Codex	Override `~/.codex/sessions` directory
`--session-state-dir <dir>`	Copilot	Override `~/.copilot/session-state` directory
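To preview which files a Codex date filter would cover, the directory layout from Supported Sources can be walked directly. This is an illustrative sketch rather than AgentV's implementation: `codex_rollouts_for_date` is a hypothetical helper, and it assumes the month and day directories are zero-padded.

```python
from datetime import date
from pathlib import Path


def codex_rollouts_for_date(day: date, sessions_dir: Path) -> list[Path]:
    """Return rollout files under <sessions_dir>/<YYYY>/<MM>/<DD>/, sorted by name.

    Assumption: month and day directories are zero-padded (e.g. 01, 15).
    """
    day_dir = sessions_dir / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}"
    # glob() yields nothing if the directory does not exist
    return sorted(day_dir.glob("rollout-*.jsonl"))


if __name__ == "__main__":
    # Default Codex sessions directory from the table above; the date is illustrative.
    default_dir = Path.home() / ".codex" / "sessions"
    for path in codex_rollouts_for_date(date.fromisoformat("2025-01-15"), default_dir):
        print(path)
```

Passing a different `sessions_dir` plays the same role as `--sessions-dir` on the command line.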

HuggingFace dataset import uses dataset-specific flags:

Flag	Description
`--repo <name>`	HuggingFace dataset repository
`--split <name>`	Dataset split to load
`--limit <number>`	Maximum number of instances to import
`--output, -o <dir>`	Output directory for generated eval YAML files

Output Format

Imported transcripts are written as AgentV transcript JSONL. Each row is a provider-neutral `agentv.transcript.v1` message row; rows are grouped by `test_id` and ordered by `message_index`:

{"schema_version":"agentv.transcript.v1","test_id":"claude-session-1","target":"claude","message_index":0,"role":"user","content":"Fix the bug in auth.ts","capture":{"content":"full","redaction_level":"none"},"source":{"kind":"imported_transcript","provider":"claude","session_id":"claude-session-1"}}
{"schema_version":"agentv.transcript.v1","test_id":"claude-session-1","target":"claude","message_index":1,"role":"assistant","content":"I'll fix the authentication bug.","tool_calls":[{"tool":"Read","id":"toolu_01...","input":{"file_path":"src/auth.ts"},"output":"...file contents..."}],"capture":{"content":"full","redaction_level":"none"},"source":{"kind":"imported_transcript","provider":"claude","session_id":"claude-session-1"}}

Stable top-level fields are `schema_version`, `test_id`, `target`, `message_index`, `role`, optional `name`, `content`, `tool_calls`, `start_time`, `end_time`, `duration_ms`, `metadata`, `token_usage`, `capture`, optional `trace`, and `source`, plus the transcript-level fields `transcript_token_usage`, `transcript_duration_ms`, and `transcript_cost_usd`. Provider-native details stay inside opaque nested fields such as `metadata`, `source.metadata`, tool `input`, or tool `output`; they are not custom top-level row keys.
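Because rows are keyed by `test_id` and `message_index`, a transcript file can be reassembled with a few lines of code. The sketch below is one way to do it, not an AgentV API: `group_transcript_rows` is a hypothetical helper, and the sample rows are trimmed to the fields it reads.

```python
import json
from collections import defaultdict
from typing import Any, Iterable


def group_transcript_rows(lines: Iterable[str]) -> dict[str, list[dict[str, Any]]]:
    """Group JSONL rows by test_id and sort each group by message_index."""
    grouped: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for line in lines:
        if not line.strip():  # tolerate blank lines
            continue
        row = json.loads(line)
        grouped[row["test_id"]].append(row)
    for rows in grouped.values():
        rows.sort(key=lambda row: row["message_index"])
    return dict(grouped)


# Rows deliberately out of order to show the sort.
sample = [
    '{"test_id": "claude-session-1", "message_index": 1, "role": "assistant", "content": "Done."}',
    '{"test_id": "claude-session-1", "message_index": 0, "role": "user", "content": "Fix the bug in auth.ts"}',
]
for test_id, rows in group_transcript_rows(sample).items():
    print(test_id, [row["role"] for row in rows])
# claude-session-1 ['user', 'assistant']
```

An open file handle can be passed in place of `sample`, since iterating a text file yields one line at a time.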

Rows from older AgentV transcript exports that lack `schema_version`, `capture`, or `trace` remain replayable. New eval run artifacts write the v1 shape.

For eval run artifacts, `transcript.jsonl` is the portable message/event projection. AgentV does not persist a public `trace.json` run sidecar, and the transcript is not a provider-native session dump. Provider-native session or stream logs, when captured during a new eval run, are preserved in `transcript-raw.jsonl` and referenced by `transcript_raw_path`; `raw_provider_log_path` is a legacy/imported pointer for cases where older bundles or external sources already provide one. Agent Skills import, convert, transpile, and run paths do not require those legacy log pointers.

What Gets Parsed

Claude Event	AgentV Message
`user`	`{ role: 'user', content }`
`assistant`	`{ role: 'assistant', content, toolCalls }`
`tool_use` blocks	`ToolCall { tool, input, id }`
`tool_result` blocks	Paired with matching `tool_use` by ID
`progress`, `system`	Skipped
Subagent events	Filtered out (v1)
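The pairing of `tool_result` with `tool_use` can be sketched as follows. This is an illustration under stated assumptions, not AgentV's parser: it assumes content blocks follow the Anthropic Messages shape (`tool_use` with `id`, `name`, `input`; `tool_result` with `tool_use_id`, `content`), and `pair_tool_calls` and the `ToolCall` dataclass here are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ToolCall:
    tool: str
    input: dict[str, Any]
    id: str
    output: Any = None  # filled in once the matching tool_result is seen


def pair_tool_calls(blocks: list[dict[str, Any]]) -> list[ToolCall]:
    """Attach each tool_result to the earlier tool_use with the same ID."""
    calls: dict[str, ToolCall] = {}
    for block in blocks:
        if block.get("type") == "tool_use":
            calls[block["id"]] = ToolCall(block["name"], block["input"], block["id"])
        elif block.get("type") == "tool_result":
            call = calls.get(block["tool_use_id"])
            if call is not None:  # ignore results with no matching tool_use
                call.output = block.get("content")
    return list(calls.values())  # dicts preserve insertion order


blocks = [
    {"type": "tool_use", "id": "toolu_a", "name": "Read", "input": {"file_path": "src/auth.ts"}},
    {"type": "tool_result", "tool_use_id": "toolu_a", "content": "...file contents..."},
]
for call in pair_tool_calls(blocks):
    print(call.tool, call.id, call.output)
```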

Token usage is aggregated from the final cumulative value reported for each LLM request. Duration is computed from the first event timestamp to the last.
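A rough sketch of that aggregation, under loudly stated assumptions: the event field names (`request_id`, `usage`, `timestamp`) and the `input`/`output` usage keys are hypothetical stand-ins, not the actual Claude or AgentV field names, and per-request totals are assumed to be summed across requests.

```python
from datetime import datetime
from typing import Any


def summarize_usage(events: list[dict[str, Any]]) -> dict[str, int]:
    """Sum token usage, keeping only the last cumulative value per request."""
    last_usage: dict[str, dict[str, int]] = {}
    for event in events:
        if "usage" in event:
            # Later events for the same request overwrite earlier ones.
            last_usage[event["request_id"]] = event["usage"]
    totals = {"input": 0, "output": 0}
    for usage in last_usage.values():
        totals["input"] += usage.get("input", 0)
        totals["output"] += usage.get("output", 0)
    return totals


def duration_ms(events: list[dict[str, Any]]) -> int:
    """Milliseconds between the earliest and latest event timestamps."""
    stamps = [datetime.fromisoformat(event["timestamp"]) for event in events]
    return round((max(stamps) - min(stamps)).total_seconds() * 1000)


# Hypothetical events: req_1 reports usage twice, and only the second value counts.
events = [
    {"request_id": "req_1", "timestamp": "2025-01-15T10:00:00+00:00", "usage": {"input": 100, "output": 5}},
    {"request_id": "req_1", "timestamp": "2025-01-15T10:00:02+00:00", "usage": {"input": 100, "output": 40}},
    {"request_id": "req_2", "timestamp": "2025-01-15T10:00:05.500+00:00", "usage": {"input": 150, "output": 20}},
]
print(summarize_usage(events))  # {'input': 250, 'output': 60}
print(duration_ms(events))      # 5500
```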

Workflow

Import a session, then run graders against it:

# 1. List sessions and pick one
agentv import claude --list

# 2. Import a session by ID
agentv import claude --session-id 4c4f9e4e-e6f1-490b-a1b1-9aef543ebf22

# 3. Run graders against the imported transcript
agentv eval evals/my-eval.yaml --transcript .agentv/transcripts/claude-4c4f9e4e.jsonl

See `examples/features/import-claude/` for a complete working example.

HuggingFace Datasets (SWE-bench)

Use `scripts/import-huggingface.py` to convert HuggingFace benchmark datasets into AgentV eval files. It currently supports SWE-bench-style datasets.

uv run scripts/import-huggingface.py \
  --repo SWE-bench/SWE-bench_Verified \
  --split test \
  --limit 10 \
  --output evals/swebench/

Each instance becomes an EVAL.yaml with:

`input`: the problem statement
`workspace.docker.image`: the pre-built SWE-bench Docker image (`ghcr.io/epoch-research/swe-bench.eval.x86_64.<instance_id>:latest`)
`workspace.repos[].base_commit`: the commit to reset to before the agent runs
`assertions`: code-grader tasks that run `FAIL_TO_PASS` and `PASS_TO_PASS` pytest suites inside the container
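The field mapping listed above can be pictured as a small transformation. This sketch is illustrative only and is not the logic of `scripts/import-huggingface.py`: `build_eval_skeleton` is a hypothetical helper, the instance values are made up, the image name is assumed to be a direct substitution of `instance_id`, and `assertions` is left empty because the code-grader schema is not shown here.

```python
from typing import Any

# Image naming pattern as given in the field list above.
SWEBENCH_IMAGE_TEMPLATE = "ghcr.io/epoch-research/swe-bench.eval.x86_64.{instance_id}:latest"


def build_eval_skeleton(instance: dict[str, Any]) -> dict[str, Any]:
    """Map a SWE-bench-style instance onto the listed eval fields."""
    image = SWEBENCH_IMAGE_TEMPLATE.format(instance_id=instance["instance_id"])
    return {
        "input": instance["problem_statement"],
        "workspace": {
            "docker": {"image": image},
            "repos": [{"base_commit": instance["base_commit"]}],
        },
        # Placeholder: the real file holds code-grader tasks for the
        # FAIL_TO_PASS and PASS_TO_PASS suites; their shape is not assumed here.
        "assertions": [],
    }


# Hypothetical instance; real rows come from the HuggingFace dataset.
example = {
    "instance_id": "example__repo-123",
    "problem_statement": "Fix the off-by-one error in the pager.",
    "base_commit": "abc123",
}
print(build_eval_skeleton(example)["workspace"]["docker"]["image"])
```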

Run an imported SWE-bench eval against any coding agent target:

# Import one instance
uv run scripts/import-huggingface.py \
  --repo SWE-bench/SWE-bench_Verified \
  --limit 1 \
  --output /tmp/swebench-eval/

# Run with a coding agent target
agentv eval /tmp/swebench-eval/*.EVAL.yaml --target codex

The Docker workspace starts the pre-built SWE-bench image, checks out `base_commit`, runs the agent so it can apply a patch, and then grades the result by running the test suite inside the container.