Configuration Reference
Complete reference for axis.config.{json|js|mjs|ts}.
Full Example
AXIS is configured via an axis.config.* file in your project root. JSON is the
default; JavaScript and TypeScript configs are also supported and
let you compose your config programmatically. Here is a JSON example showing most of the available
fields:
{
"scenarios": "./scenarios",
"agents": [
"claude-code",
{
"agent": "gemini",
"model": "gemini-2.5-pro",
"scenarios": ["cms/*"],
"flags": { "yolo": true }
}
],
"settings": {
"concurrency": 4,
"scoring_weights": {
"goal_achievement": 0.4,
"environment": 0.2,
"service": 0.2,
"agent": 0.2
},
"limits": {
"run": { "time_minutes": 60, "tokens": 2000000 },
"scenario": { "time_minutes": 10, "tokens": 200000 }
}
},
"env": ["ANTHROPIC_API_KEY", "GEMINI_API_KEY"],
"mcp_servers": {
"filesystem": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
}
},
"judging": {
"agents": ["claude-code", "codex"]
},
"skills": ["./skills/deploy"],
"adapters": {
"my-agent": "./adapters/my-agent.ts"
},
"beforeAll": [
{ "action": "run_script", "command": "docker compose up -d test-db" }
],
"afterAll": [
{ "action": "run_script", "command": "echo \"done: $AXIS_COMPLETED/$AXIS_TOTAL (report: $AXIS_REPORT_DIR)\"" }
]
}
Config File Formats
AXIS resolves the config file by extension, in priority order:
axis.config.ts → axis.config.js →
axis.config.mjs → axis.config.json. Use
--config <path> to point at a specific file.
| Extension | Loader | Notes |
|---|---|---|
.json | Native JSON parse | Static config, no executable logic. |
.js / .mjs / .cjs | Native dynamic import | ESM module. Default export is the config object or a function returning one. |
.ts / .mts / .cts | Loaded via jiti | No build step needed. Type-only imports are stripped at runtime. |
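The priority order described above can be pictured as a simple first-match lookup. This is a minimal sketch, not AXIS's actual resolver: pickConfigFile is a hypothetical name, and it assumes the caller has already listed the file names present in the project root.

```typescript
// Priority order as documented: .ts, then .js, then .mjs, then .json.
const CONFIG_PRIORITY = [
  "axis.config.ts",
  "axis.config.js",
  "axis.config.mjs",
  "axis.config.json",
] as const;

// Hypothetical helper: returns the first candidate present in the root, or undefined.
function pickConfigFile(filesInRoot: readonly string[]): string | undefined {
  const present = new Set<string>(filesInRoot);
  return CONFIG_PRIORITY.find((name) => present.has(name));
}
```

Under these assumptions, a root containing both axis.config.json and axis.config.ts resolves to the .ts file; passing --config bypasses the lookup entirely.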
JavaScript / TypeScript configs
JS and TS configs let you build the config programmatically. Useful for sharing logic across scenarios, deriving values from environment variables, or generating large numbers of scenarios from a fixture set. The module's default export must be either the config object directly or a (sync or async) function that returns one:
// axis.config.ts
import type { AxisConfig, InlineScenario } from "@netlify/axis";
import applyLimits from "./scenarios/apply-limits.js";
import authorScenario from "./scenarios/author-scenario.js";
const dynamicScenarios: InlineScenario[] = ["alpha", "beta", "gamma"].map((id) => ({
key: "smoke-" + id,
name: "Smoke test " + id,
prompt: "Do thing for " + id,
rubric: [{ check: "Did the thing" }],
}));
export default {
scenarios: [
"./scenarios",
applyLimits,
authorScenario,
...dynamicScenarios,
],
agents: ["claude-code"],
settings: {
limits: { run: { time_minutes: 60 } },
},
} satisfies AxisConfig;
Or as a function (sync or async):
// axis.config.ts
import type { AxisConfig } from "@netlify/axis";
// loadFixtures and buildScenario are placeholders for helpers you define yourself.
export default async () => {
const fixtures = await loadFixtures();
const config: AxisConfig = {
scenarios: fixtures.map(buildScenario),
agents: ["claude-code"],
};
return config;
};
axis init --format ts (or --format js) scaffolds a typed config
file alongside a sample JSON scenario. Without --format, AXIS produces a
.json config to preserve backward compatibility.
Top-Level Fields
| Field | Type | Required | Description |
|---|---|---|---|
scenarios | string | (string | InlineScenario)[] | No | A path to the scenarios directory, or an array of paths and/or inline scenario objects. Inline entries must include a key; entries loaded from files take their key from the file path. Defaults to "./scenarios" when omitted. Array entries may also be git repo URLs; see Remote scenarios. See Authoring scenarios for the full schema. |
agents | (string | AgentConfig)[] | Yes | Agent names or full agent configurations. |
settings | object | No | Concurrency, scoring weight, and limit overrides. |
env | string[] | No | Additional environment variables to pass through to agent processes. |
mcp_servers | object | No | MCP servers available to all agents. |
judging | object | No | Precedence-ordered list of judge agents for scoring. See Judging Agents. When omitted, each run is judged by its own agent. |
skills | string[] | No | Skills available to all agents. |
adapters | object | No | Custom agent module paths, keyed by agent name. |
artifacts | string[] | No | Glob patterns of files to capture from each scenario's workspace after teardown. Merged with per-scenario artifacts. |
beforeAll | LifecycleAction[] | No | Lifecycle actions that run once before any scenarios start. See Run-Level Lifecycle. |
afterAll | LifecycleAction[] | No | Lifecycle actions that run once after every scenario has been scored and the report is finalized. See Run-Level Lifecycle. |
Run-Level Lifecycle
beforeAll and afterAll are run-level counterparts to a scenario's
setup and teardown: they
fire once per run rather than once per scenario. Use them to spin up shared infrastructure
before any agents start, or to upload the final report and send a completion notification
after everything is scored.
Both fields accept the same lifecycle action
types as scenario hooks (run_script and copy), and scripts run
with the config directory as their working directory.
{
"beforeAll": [
{ "action": "run_script", "command": "docker compose up -d test-postgres" }
],
"afterAll": [
{ "action": "run_script", "command": "./scripts/notify-slack.sh" },
{ "action": "run_script", "command": "docker compose down" }
]
}
A typical afterAll script can use the AXIS_* environment variables
below to assemble a summary message:
#!/usr/bin/env bash
# scripts/notify-slack.sh
curl -X POST "$SLACK_WEBHOOK" \
-H 'Content-Type: application/json' \
-d "{\"text\": \"AXIS run: $AXIS_COMPLETED/$AXIS_TOTAL passed in $AXIS_DURATION_MS ms (report: $AXIS_REPORT_DIR)\"}"
- Hooks fire from the axis CLI only; the programmatic run() API does not invoke them. Library users own their own orchestration.
- beforeAll runs before the report directory is created. A failure (non-zero exit) aborts the entire run with no report on disk.
- afterAll runs after every scenario has been scored and the report is finalized, so $AXIS_REPORT_DIR/report.json is readable. A failure causes a non-zero CLI exit but does not erase the report.
- Both hooks honour the per-action 3-minute timeout. Each action runs sequentially; the first non-zero exit aborts the phase.
Run-level lifecycle environment variables
Both phases get the shared $AXIS_OUTPUT markdown sink and an AXIS_PHASE
discriminator (beforeAll or afterAll). afterAll additionally
receives summary stats and the path to the finalized report:
| Variable | Phase | Value |
|---|---|---|
AXIS_PHASE | Both | Either beforeAll or afterAll. |
AXIS_OUTPUT | Both | Path to a per-phase markdown file. Anything written here surfaces in the CLI log. |
AXIS_REPORT_DIR | afterAll | Absolute path to the just-written .axis/reports/{reportId}/ directory. report.json, report.html, and the per-scenario JSON files are all on disk by the time this script runs. |
AXIS_TOTAL | afterAll | Number of jobs executed (agent × scenario combinations). |
AXIS_COMPLETED | afterAll | Number of jobs that finished successfully. |
AXIS_FAILED | afterAll | Number of jobs that failed. |
AXIS_DURATION_MS | afterAll | Total run duration in milliseconds. |
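For afterAll scripts written in TypeScript rather than bash, the variables in the table can be read like any other environment variables. The sketch below is illustrative only: buildSummary and the AfterAllEnv type are hypothetical names, and the message format is an arbitrary choice.

```typescript
// The subset of afterAll variables this sketch reads (all arrive as strings).
type AfterAllEnv = {
  AXIS_TOTAL?: string;
  AXIS_COMPLETED?: string;
  AXIS_FAILED?: string;
  AXIS_DURATION_MS?: string;
  AXIS_REPORT_DIR?: string;
};

// Hypothetical helper: format a one-line summary from the afterAll variables.
function buildSummary(env: AfterAllEnv): string {
  const total = Number(env.AXIS_TOTAL ?? 0);
  const completed = Number(env.AXIS_COMPLETED ?? 0);
  const failed = Number(env.AXIS_FAILED ?? 0);
  // AXIS_DURATION_MS is in milliseconds; show seconds with one decimal.
  const seconds = (Number(env.AXIS_DURATION_MS ?? 0) / 1000).toFixed(1);
  const reportDir = env.AXIS_REPORT_DIR ?? "n/a";
  return `AXIS run: ${completed}/${total} completed, ${failed} failed in ${seconds}s (report: ${reportDir})`;
}
```

In a real hook you would typically pass process.env to buildSummary and post the result to your notification channel.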
Agent Configuration
Each entry in the agents array can be a simple string (agent name with defaults)
or a full configuration object.
| Field | Type | Required | Description |
|---|---|---|---|
agent | string | Yes | Agent name: claude-code, codex, gemini, goose, etc. |
model | string | No | Model override passed to the agent CLI. |
scenarios | string[] | No | Subset of scenarios to run. Supports glob patterns like cms/*. |
skills | string[] | No | Agent-specific skills (merged with top-level skills). |
flags | object | No | CLI flags passed to the agent, e.g. {"full-auto": true}. |
command | string | No | Custom CLI command (for custom agents). |
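To illustrate how a per-agent scenarios filter such as cms/* narrows the scenario set, here is a rough sketch. It assumes that * matches any run of characters within a single path segment; AXIS's real glob rules may be richer, and both function names are hypothetical.

```typescript
// Assumption: "*" matches any run of characters except "/"; real AXIS glob rules may differ.
function globToRegExp(pattern: string): RegExp {
  // Escape regex metacharacters other than "*", then expand "*".
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, "[^/]*") + "$");
}

// Hypothetical helper: keep the scenario keys that match at least one pattern.
function filterScenarioKeys(keys: string[], patterns: string[]): string[] {
  const matchers = patterns.map(globToRegExp);
  return keys.filter((key) => matchers.some((re) => re.test(key)));
}
```

With keys cms/create-post, cms/edit-post, and auth/login, the pattern cms/* keeps only the two cms entries under this assumption.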
Scoring Weights
Override the default dimension weights under settings.scoring_weights. Values
must sum to 1.0. See Scoring Framework for what each dimension measures.
| Field | Type | Required | Description |
|---|---|---|---|
goal_achievement | number | No | Goal Achievement weight. Default: 0.4. |
environment | number | No | Environment weight. Default: 0.2. |
service | number | No | Service weight. Default: 0.2. |
agent | number | No | Agent weight. Default: 0.2. |
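The sum-to-1.0 rule is easy to get wrong when overriding only some weights. The following sketch shows one way to check a set of overrides before a run. It assumes unset fields fall back to the documented defaults; resolveWeights is a hypothetical helper, not part of the AXIS API.

```typescript
type ScoringWeights = {
  goal_achievement: number;
  environment: number;
  service: number;
  agent: number;
};

// Defaults as documented in the table above.
const DEFAULT_WEIGHTS: ScoringWeights = {
  goal_achievement: 0.4,
  environment: 0.2,
  service: 0.2,
  agent: 0.2,
};

// Hypothetical helper: apply overrides and check the sum with a small
// tolerance, since decimal fractions are not exact in floating point.
function resolveWeights(overrides: Partial<ScoringWeights> = {}): ScoringWeights {
  const weights: ScoringWeights = { ...DEFAULT_WEIGHTS, ...overrides };
  const sum = weights.goal_achievement + weights.environment + weights.service + weights.agent;
  if (Math.abs(sum - 1) > 1e-9) {
    throw new Error(`scoring_weights must sum to 1.0, got ${sum}`);
  }
  return weights;
}
```

Raising goal_achievement to 0.5 therefore requires lowering another weight by 0.1, as the example override does.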
{
"settings": {
"scoring_weights": {
"goal_achievement": 0.5,
"environment": 0.2,
"service": 0.2,
"agent": 0.1
}
}
}
Limits
Limits control how much time and tokens a run or individual scenario can consume. This prevents runaway agents from consuming unbounded resources. Limits can be configured at three levels:
- Overall run limits (settings.limits.run): when hit, all remaining and in-progress jobs are aborted.
- Default scenario limits (settings.limits.scenario): default budget for each individual job. Only that job fails when exceeded.
- Per-scenario limits (limits in the scenario JSON): override the defaults for a specific scenario. Only that job fails when exceeded.
Default behavior
Even without any limits configured, each scenario has a default time limit of 15 minutes.
You can override this by setting settings.limits.scenario.time_minutes or by adding
limits.time_minutes to individual scenarios.
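The fallback chain for a single job can be summarised in a few lines. This is a sketch under one stated assumption: fields are resolved individually, so a scenario that sets only tokens still inherits time_minutes from the config default. The helper name is hypothetical.

```typescript
type Limits = { time_minutes?: number; tokens?: number };

// Documented fallback: 15 minutes per scenario when nothing else is set.
const DEFAULT_SCENARIO_TIME_MINUTES = 15;

// Hypothetical helper. Assumption: per-field resolution, scenario first,
// then settings.limits.scenario, then the built-in default.
function effectiveScenarioLimits(configDefault: Limits = {}, scenario: Limits = {}): Limits {
  return {
    time_minutes:
      scenario.time_minutes ?? configDefault.time_minutes ?? DEFAULT_SCENARIO_TIME_MINUTES,
    // There is no built-in token default, so this may stay undefined.
    tokens: scenario.tokens ?? configDefault.tokens,
  };
}
```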
Limit fields
| Field | Type | Description |
|---|---|---|
time_minutes | number | Maximum wall-clock time in minutes. Accepts fractional values (e.g. 0.5 for 30 seconds). Default: 15 per scenario. |
tokens | number | Maximum total tokens (input + output + cache). Must be a positive integer. No default. |
Overall run limits
Set settings.limits.run to cap the total time or tokens across the entire run.
When an overall limit is reached, all remaining and currently running jobs are immediately
terminated and marked as failed.
{
"settings": {
"limits": {
"run": { "time_minutes": 60, "tokens": 2000000 }
}
}
}
Per-scenario limits
Set settings.limits.scenario to define default per-job budgets. These can be
overridden by adding a limits field directly in a scenario file.
// axis.config.json: default for all scenarios
{
"settings": {
"limits": {
"scenario": { "time_minutes": 10, "tokens": 200000 }
}
}
}
// scenarios/expensive-task.json: override for one scenario
{
"name": "Expensive task",
"prompt": "...",
"rubric": "...",
"limits": { "time_minutes": 30, "tokens": 500000 }
}
Token limits are enforced using a conservative estimate during execution (based on streamed assistant text). The actual token count may slightly exceed the limit before the job is terminated. The authoritative token count from the agent's API is used for overall run limit tracking.
MCP Servers
Configure Model Context Protocol servers that are automatically wired into each agent environment. AXIS supports both stdio (local process) and HTTP (remote endpoint) servers.
| Field | Type | Required | Description |
|---|---|---|---|
type | "stdio" | "http" | Yes | Server transport type. |
command | string | Yes | Command to start the server process (stdio only). |
args | string[] | No | Arguments passed to the command. |
env | object | No | Environment variables for the server process. |
url | string | Yes | Remote server endpoint URL (http only). |
headers | object | No | HTTP headers (supports ${VAR} env interpolation). |
{
"mcp_servers": {
"filesystem": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
"env": { "LOG_LEVEL": "info" }
},
"remote-api": {
"type": "http",
"url": "https://mcp.example.com/tools",
"headers": { "Authorization": "Bearer ${TOKEN}" }
}
}
}
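The ${VAR} interpolation in headers can be thought of as a placeholder substitution against the process environment. The sketch below is an illustration only: interpolateEnv is a hypothetical name, and treating an unset variable as an error is an assumption, since the behaviour for missing variables is not specified here.

```typescript
// Hypothetical helper: replace each ${NAME} with its value from `env`.
// Assumption: an unset variable is an error; AXIS may handle that case differently.
function interpolateEnv(value: string, env: Record<string, string | undefined>): string {
  return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match: string, name: string) => {
    const resolved = env[name];
    if (resolved === undefined) {
      throw new Error(`Environment variable ${name} is not set`);
    }
    return resolved;
  });
}
```

Keeping the token in the environment rather than in the config file means the config can be committed without exposing the secret.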
Each NDJSON-style agent writes MCP configuration in its native format before spawning. ACP-based
adapters (claude-sdk, codex-sdk, gemini, and every other
ACP agent) pass MCP servers through the ACP session/new call instead.
| Agent | Config File | Location |
|---|---|---|
claude-code | .mcp.json | Workspace root |
codex | config.toml | CODEX_HOME |
Judging Agents
By default, every run is judged by the same agent that produced it, so the agent under test
scores its own work. Set judging.agents to a precedence-ordered list of
judge candidates. For each run, AXIS picks the first entry whose adapter name differs from
the run's own agent so a fresh perspective evaluates the work. If every entry matches the
run's own agent, the first entry is used.
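The selection rule can be expressed compactly. In this sketch, judge entries are reduced to their adapter names, pickJudge is a hypothetical helper, and an empty list is treated the same as an omitted judging.agents (an assumption).

```typescript
// Hypothetical helper mirroring the rule above; entries are reduced to adapter names.
function pickJudge(runAgent: string, judges: string[] = []): string {
  // No judging.agents configured: the run is judged by its own agent.
  if (judges.length === 0) return runAgent;
  // First entry that differs from the run's agent, else fall back to the first entry.
  const other = judges.find((judge) => judge !== runAgent);
  return other !== undefined ? other : judges[0];
}
```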
Each entry accepts the same shorthand as agents: a string (agent name) or a full
AgentConfig object with model and adapter flags. Every
candidate judge must be installed and have any required environment variables set; AXIS
validates this during pre-flight before any jobs run.
{
"agents": ["claude-code", "codex", "gemini"],
"judging": {
"agents": ["claude-code", "codex"]
}
}
With the config above, runs by claude-code are judged by codex
(first entry whose adapter differs), and runs by codex or gemini
are judged by claude-code. Pin a specific model on any candidate by passing the
full AgentConfig form:
{
"judging": {
"agents": [
{ "agent": "claude-code", "model": "opus" },
"codex"
]
}
}
Skills
Skills extend agent capabilities with reusable instruction sets. Specify them at the top level (shared across all agents), per agent, or per scenario.
| Format | Example | Description |
|---|---|---|
| Local path | ./skills/deploy | Relative to the config file. |
| GitHub shorthand | netlify/axis-skill-deploy | owner/repo format, cloned automatically. |
| Full URL | https://github.com/owner/repo | GitHub repository URL, cloned automatically. |
{
"skills": [
"./skills/deploy",
"netlify/axis-skill-deploy",
"https://github.com/owner/repo"
]
}
Remote skills are cached in .axis/skills-cache/. Use --refresh-skills
to force re-clone.
Environment Variables
The env field lists additional environment variables to pass through to agent
processes. The following are always passed through by default:
| Category | Variables |
|---|---|
| API keys | ANTHROPIC_API_KEY, CODEX_API_KEY, GEMINI_API_KEY |
| System | PATH, USER, SHELL, LANG, TERM, TMPDIR |
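One way to picture the passthrough rule is as an allowlist filter over the parent environment. The sketch below assumes that variables outside the default list and the env field are not forwarded; buildAgentEnv is a hypothetical name and not part of the AXIS API.

```typescript
// Variables the table above lists as always passed through.
const ALWAYS_PASSED = [
  "ANTHROPIC_API_KEY", "CODEX_API_KEY", "GEMINI_API_KEY",
  "PATH", "USER", "SHELL", "LANG", "TERM", "TMPDIR",
];

// Hypothetical helper: build the agent environment from an allowlist.
function buildAgentEnv(
  source: Record<string, string | undefined>,
  extra: string[] = [],
): Record<string, string> {
  const result: Record<string, string> = {};
  const names = [...ALWAYS_PASSED, ...extra];
  for (const name of names) {
    const value = source[name];
    // Skip names that are not set in the parent environment.
    if (value !== undefined) result[name] = value;
  }
  return result;
}
```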
{
"env": ["MY_CUSTOM_TOKEN", "DATABASE_URL"]
}
Scenarios
Scenarios live in the configured scenarios directory as .json,
.js, or .ts files, or are listed inline in
axis.config.{js,ts}. The filename (without extension) becomes the scenario
key, and nested directories create namespaced keys: scenarios/cms/create-post.ts
maps to cms/create-post.
Scenarios can define variants to run the same task under
different configurations (skills, MCP servers, prompts, etc.) without duplicating files. Each
variant produces a separate job with a key like create-post@variant-name.
See Writing Scenarios for the complete scenario schema, the authoring formats, rubric design guidance, and examples.
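The key rules above amount to simple string handling. As a rough sketch (both helper names are hypothetical, and the input is assumed to be a path relative to the scenarios directory):

```typescript
// Hypothetical helper: derive a scenario key from a path relative to the scenarios directory.
function scenarioKey(relativePath: string): string {
  return relativePath
    .replace(/\\/g, "/") // normalise Windows separators
    .replace(/\.(json|js|ts)$/, ""); // drop the file extension
}

// Hypothetical helper: job key for a variant, using the key@variant form described above.
function variantKey(key: string, variant: string): string {
  return `${key}@${variant}`;
}
```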
Remote Scenarios
Remote scenarios let one AXIS project pull a scenario library straight out of another git
repository, instead of vendoring or copy-pasting scenario files. A team can publish a
canonical set of scenarios (with their setup scripts, fixtures, MCP servers, and skills)
once, and every downstream project consumes it by listing the repo URL in its
scenarios array. When the upstream library changes, the next AXIS run picks it
up automatically; there's no version to bump and no files to re-sync.
This is especially useful for:
- Org-wide scenario libraries. A platform team curates the scenarios that represent "what agents on our stack actually need to do," and every product team's config references that library to benchmark their own agent against the same yardstick.
- Sharing a benchmark across forks. A reference repo defines the agreed-on scenarios; multiple agent implementations point at it to publish comparable AXIS Results.
- Decoupling scenario authoring from agent testing. The team that knows the product writes the scenarios in their repo; the team running the agent only owns its axis.config.* and the agent it's testing.
Using a remote scenarios repo
Add a git repository URL to the scenarios array. Local paths and remote URLs
can be mixed freely.
{
"scenarios": [
"./scenarios",
"https://github.com/netlify/all-scenarios"
],
"agents": ["claude-code"]
}
On each run AXIS clones the repo into
.axis/remotes/<reversed-host>/<owner>/<repo>/, reads
that repo's axis.config.*, and inlines its scenarios entries
into the parent, resolved to absolute paths inside the clone. Inline scenario objects
from a remote repo are passed through unchanged. If the cloned repo has no
axis.config.* at its root, the whole repo is walked as a scenarios directory
(equivalent to listing the clone path directly).
From here on, the runner behaves as if every scenario had been local from the start: discovery, filtering, lifecycle, scoring, and reporting are all unchanged.
Supporting config that comes with the scenarios
Remote scenarios usually depend on more than just their own files. Their setup scripts need
certain env vars exported, they expect specific MCP servers to be configured, they rely on
shared skills, and so on. To avoid forcing every parent project to re-declare all of that,
AXIS folds a few supporting fields from the remote repo's axis.config.* into the
parent config. The parent always wins on collisions.
| Field | Merge semantics |
|---|---|
env | Set union of var names; parent first then remote. |
mcp_servers | Keyed merge; the parent's value wins when both declare the same server name. |
skills | Ordered union with dedup, parent first. Local-path entries from the remote (./...) are rewritten to absolute paths inside the clone directory; URL and owner/repo shorthand entries pass through unchanged. |
artifacts | Glob patterns concatenated and deduped, parent first. |
adapters | Keyed merge with parent precedence; remote module paths are rewritten to absolute paths inside the clone directory so the remote adapter loads correctly. |
Other top-level fields from the remote repo are ignored: agents,
settings, judging, beforeAll, afterAll, and
name. These belong to the parent project: it decides which agents to test, how
to score them, and what run-level lifecycle to fire.
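The merge semantics for the list and map fields can be sketched as follows. This is an illustration rather than AXIS's implementation: the type and function names are hypothetical, and the path rewriting for skills and adapters is deliberately left out to keep the example short.

```typescript
type SupportingConfig = {
  env?: string[];
  mcp_servers?: Record<string, unknown>;
  skills?: string[];
  artifacts?: string[];
};

// Ordered union with dedup: first occurrence wins, so parent entries come first.
function orderedUnion(parent: string[] = [], remote: string[] = []): string[] {
  return [...parent, ...remote].filter((item, i, all) => all.indexOf(item) === i);
}

// Hypothetical helper: fold a remote repo's supporting fields into the parent config.
function mergeSupporting(parent: SupportingConfig, remote: SupportingConfig): SupportingConfig {
  return {
    env: orderedUnion(parent.env, remote.env),
    // Spread order gives the parent precedence on duplicate server names.
    mcp_servers: { ...(remote.mcp_servers ?? {}), ...(parent.mcp_servers ?? {}) },
    skills: orderedUnion(parent.skills, remote.skills),
    artifacts: orderedUnion(parent.artifacts, remote.artifacts),
  };
}
```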
Freshness and caching
AXIS always runs git pull --ff-only on each invocation when the clone already
exists, and does a shallow git clone the first time. There is no opt-in caching
flag. The trade-off favours always-fresh scenario libraries over offline runs.
Dependencies
Remote scenarios authored as .ts/.js modules often import workspace
helpers and external packages. If the cloned repo has a package.json and no
node_modules/, AXIS runs npm install (or pnpm install /
yarn install based on the lockfile) automatically before walking. Install failures
are logged but do not abort the run; modules whose imports fail are reported and skipped.
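The install decision described above could look roughly like this. The lockfile names are the standard ones for each package manager; the order in which they are checked is an assumption, and pickInstallCommand is a hypothetical helper that takes the entry names found at the clone root.

```typescript
// Hypothetical helper: choose an install command from the entries at the clone root.
// Assumption: pnpm-lock.yaml is checked before yarn.lock, and npm is the fallback.
function pickInstallCommand(rootEntries: string[]): string | undefined {
  const entries = new Set<string>(rootEntries);
  // Install only when there is a package.json and no node_modules/ yet.
  if (!entries.has("package.json") || entries.has("node_modules")) return undefined;
  if (entries.has("pnpm-lock.yaml")) return "pnpm install";
  if (entries.has("yarn.lock")) return "yarn install";
  return "npm install";
}
```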
Nested remote references
By default, a remote repo's scenarios may not itself list further
remote URLs; AXIS errors out with the offending URL named. Increase
settings.remotes.maxDepth to allow nesting. Cycles
(A → B → A) are always rejected regardless of depth.
{
"scenarios": ["./scenarios", "https://github.com/netlify/all-scenarios"],
"agents": ["claude-code"],
"settings": {
"remotes": { "maxDepth": 2 }
}
}