Configuration Reference

Complete reference for axis.config.{json|js|mjs|ts}.

Full Example

AXIS is configured via an axis.config.* file in your project root. JSON is the default; JavaScript and TypeScript configs are also supported and let you compose your config programmatically. Here is a JSON example showing most of the available fields:

{
  "scenarios": "./scenarios",
  "agents": [
    "claude-code",
    {
      "agent": "gemini",
      "model": "gemini-2.5-pro",
      "scenarios": ["cms/*"],
      "flags": { "yolo": true }
    }
  ],
  "settings": {
    "concurrency": 4,
    "scoring_weights": {
      "goal_achievement": 0.4,
      "environment": 0.2,
      "service": 0.2,
      "agent": 0.2
    },
    "limits": {
      "run": { "time_minutes": 60, "tokens": 2000000 },
      "scenario": { "time_minutes": 10, "tokens": 200000 }
    }
  },
  "env": ["ANTHROPIC_API_KEY", "GEMINI_API_KEY"],
  "mcp_servers": {
    "filesystem": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
    }
  },
  "judging": {
    "agents": ["claude-code", "codex"]
  },
  "skills": ["./skills/deploy"],
  "adapters": {
    "my-agent": "./adapters/my-agent.ts"
  },
  "beforeAll": [
    { "action": "run_script", "command": "docker compose up -d test-db" }
  ],
  "afterAll": [
    { "action": "run_script", "command": "echo \"done: $AXIS_COMPLETED/$AXIS_TOTAL (report: $AXIS_REPORT_DIR)\"" }
  ]
}

Config File Formats

AXIS resolves the config file by extension, in priority order: axis.config.ts → axis.config.js → axis.config.mjs → axis.config.json. Use --config <path> to point at a specific file.
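The priority order can be read as a first-match lookup. The sketch below assumes a hypothetical pickConfigFile helper that receives the file names present in the project root; it illustrates the documented order and is not AXIS's actual loader.

```typescript
// Priority order as documented above: the first match wins.
const CONFIG_PRIORITY = [
  "axis.config.ts",
  "axis.config.js",
  "axis.config.mjs",
  "axis.config.json",
] as const;

// Hypothetical helper: given the file names found in the project root,
// return the config file that would be chosen, or undefined if none exists.
function pickConfigFile(filesInRoot: readonly string[]): string | undefined {
  const present = new Set<string>(filesInRoot);
  return CONFIG_PRIORITY.find((name) => present.has(name));
}
```

Passing --config <path> bypasses this lookup entirely.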

Extension	Loader	Notes
`.json`	Native JSON parse	Static config, no executable logic.
`.js` / `.mjs` / `.cjs`	Native dynamic import	ESM module. Default export is the config object or a function returning one.
`.ts` / `.mts` / `.cts`	Loaded via jiti	No build step needed. Type-only imports are stripped at runtime.

JavaScript / TypeScript configs

JS and TS configs let you build the config programmatically. This is useful for sharing logic across scenarios, deriving values from environment variables, or generating large numbers of scenarios from a fixture set. The module's default export must be either the config object itself or a (sync or async) function that returns one:

// axis.config.ts
import type { AxisConfig, InlineScenario } from "@netlify/axis";
import applyLimits from "./scenarios/apply-limits.js";
import authorScenario from "./scenarios/author-scenario.js";

const dynamicScenarios: InlineScenario[] = ["alpha", "beta", "gamma"].map((id) => ({
  key: "smoke-" + id,
  name: "Smoke test " + id,
  prompt: "Do thing for " + id,
  rubric: [{ check: "Did the thing" }],
}));

export default {
  scenarios: [
    "./scenarios",
    applyLimits,
    authorScenario,
    ...dynamicScenarios,
  ],
  agents: ["claude-code"],
  settings: {
    limits: { run: { time_minutes: 60 } },
  },
} satisfies AxisConfig;

Or as a function (sync or async):

// axis.config.ts
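import type { AxisConfig } from "@netlify/axis";
// Hypothetical helpers, shown for illustration only: loadFixtures() resolves to
// an array of fixtures and buildScenario() maps one fixture to an InlineScenario.
import { loadFixtures, buildScenario } from "./scenarios/fixtures.js";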

export default async () => {
  const fixtures = await loadFixtures();
  const config: AxisConfig = {
    scenarios: fixtures.map(buildScenario),
    agents: ["claude-code"],
  };
  return config;
};
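Deriving values from environment variables might look like the following sketch. ConfigSketch is a local stand-in for the real AxisConfig type, and AXIS_CONCURRENCY is an illustrative variable name chosen for this example, not one that AXIS reads itself.

```typescript
// Local stand-in for the parts of AxisConfig used here; in a real config,
// import the type from "@netlify/axis" instead.
interface ConfigSketch {
  scenarios: string;
  agents: string[];
  settings?: { concurrency?: number };
}

// Hypothetical helper: reads an illustrative AXIS_CONCURRENCY variable and
// falls back to 4 when it is unset or not a positive integer.
function buildConfig(env: Record<string, string | undefined>): ConfigSketch {
  const parsed = Number.parseInt(env["AXIS_CONCURRENCY"] ?? "", 10);
  const concurrency = Number.isInteger(parsed) && parsed > 0 ? parsed : 4;
  return {
    scenarios: "./scenarios",
    agents: ["claude-code"],
    settings: { concurrency },
  };
}

// In axis.config.ts this would be wired up as:
//   export default () => buildConfig(process.env);
```

Taking the environment as a parameter keeps the builder easy to unit-test without mutating process.env.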

Generating an axis.config file

axis init --format ts (or --format js) scaffolds a typed config file alongside a sample JSON scenario. Without --format, AXIS produces a .json config to preserve backward compatibility.

Top-Level Fields

Field	Type	Required	Description
`scenarios`	`string \| (string \| InlineScenario)[]`	No	A path to the scenarios directory, or an array of paths and/or inline scenario objects. Inline entries must include a `key`; entries loaded from files take their `key` from the file path. Defaults to `"./scenarios"` when omitted. Array entries may also be git repo URLs; see Remote scenarios. See Authoring scenarios for the full schema.
`agents`	`(string \| AgentConfig)[]`	Yes	Agent names or full agent configurations.
`settings`	`object`	No	Concurrency, scoring weight, and limit overrides.
`env`	`string[]`	No	Additional environment variables to pass through to agent processes.
`mcp_servers`	`object`	No	MCP servers available to all agents.
`judging`	`object`	No	Precedence-ordered list of judge agents for scoring. See Judging Agents. When omitted, each run is judged by its own agent.
`skills`	`string[]`	No	Skills available to all agents.
`adapters`	`object`	No	Custom agent module paths, keyed by agent name.
`artifacts`	`string[]`	No	Glob patterns of files to capture from each scenario's workspace after teardown. Merged with per-scenario `artifacts`.
`beforeAll`	`LifecycleAction[]`	No	Lifecycle actions that run once before any scenarios start. See Run-Level Lifecycle.
`afterAll`	`LifecycleAction[]`	No	Lifecycle actions that run once after every scenario has been scored and the report is finalized. See Run-Level Lifecycle.

Run-Level Lifecycle

beforeAll and afterAll are run-level counterparts to a scenario's setup and teardown: they fire once per run rather than once per scenario. Use them to spin up shared infrastructure before any agents start, or to upload the final report and send a completion notification after everything is scored.

Both fields accept the same lifecycle action types as scenario hooks (run_script and copy), and scripts run with the config directory as their working directory.

{
  "beforeAll": [
    { "action": "run_script", "command": "docker compose up -d test-postgres" }
  ],
  "afterAll": [
    { "action": "run_script", "command": "./scripts/notify-slack.sh" },
    { "action": "run_script", "command": "docker compose down" }
  ]
}

A typical afterAll script can use the AXIS_* environment variables below to assemble a summary message:

#!/usr/bin/env bash
# scripts/notify-slack.sh
curl -X POST "$SLACK_WEBHOOK" \
  -H 'Content-Type: application/json' \
  -d "{\"text\": \"AXIS run: $AXIS_COMPLETED/$AXIS_TOTAL passed in $AXIS_DURATION_MS ms (report: $AXIS_REPORT_DIR)\"}"

Run-Level Lifecycle Details

Hooks fire from the axis CLI only; the programmatic run() API does not invoke them. Library users own their own orchestration.
beforeAll runs before the report directory is created. A failure (non-zero exit) aborts the entire run with no report on disk.
afterAll runs after every scenario has been scored and the report is finalized, so $AXIS_REPORT_DIR/report.json is readable. A failure causes a non-zero CLI exit but does not erase the report.
Both hooks honour the per-action 3-minute timeout. Each action runs sequentially; the first non-zero exit aborts the phase.
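The sequential, fail-fast behaviour described above can be sketched as follows. The runner callback is a hypothetical stand-in for whatever executes an action and returns its exit code; timeout handling is left out, and the real implementation may differ.

```typescript
// Minimal shape of a lifecycle action, taken from the examples above.
interface LifecycleAction {
  action: string;
  command?: string;
}

// Hypothetical runner: executes one action and returns its exit code.
type ActionRunner = (action: LifecycleAction) => number;

interface PhaseResult {
  ok: boolean;
  executed: number; // how many actions were started
}

// Run actions one at a time and stop at the first non-zero exit code.
function runPhase(actions: LifecycleAction[], run: ActionRunner): PhaseResult {
  let executed = 0;
  for (const action of actions) {
    executed += 1;
    if (run(action) !== 0) {
      return { ok: false, executed };
    }
  }
  return { ok: true, executed };
}
```

In this model a failed beforeAll phase would abort the run, while a failed afterAll phase would only affect the CLI exit code.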

Run-level lifecycle environment variables

Both phases get the shared $AXIS_OUTPUT markdown sink and an AXIS_PHASE discriminator (beforeAll or afterAll). afterAll additionally receives summary stats and the path to the finalized report:

Variable	Phase	Value
`AXIS_PHASE`	Both	Either `beforeAll` or `afterAll`.
`AXIS_OUTPUT`	Both	Path to a per-phase markdown file. Anything written here surfaces in the CLI log.
`AXIS_REPORT_DIR`	`afterAll`	Absolute path to the just-written `.axis/reports/{reportId}/` directory. `report.json`, `report.html`, and the per-scenario JSON files are all on disk by the time this script runs.
`AXIS_TOTAL`	`afterAll`	Number of jobs executed (agent × scenario combinations).
`AXIS_COMPLETED`	`afterAll`	Number of jobs that finished successfully.
`AXIS_FAILED`	`afterAll`	Number of jobs that failed.
`AXIS_DURATION_MS`	`afterAll`	Total run duration in milliseconds.
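For afterAll scripts written in TypeScript rather than bash, the variables in the table can be turned into a summary line along these lines. summarizeRun is a hypothetical helper, and the fallback values for unset variables are assumptions of this sketch.

```typescript
// Build a one-line summary from the afterAll variables in the table above.
// The env map is passed in (for example process.env) so the function stays testable.
function summarizeRun(env: Record<string, string | undefined>): string {
  const total = Number(env["AXIS_TOTAL"] ?? 0);
  const completed = Number(env["AXIS_COMPLETED"] ?? 0);
  const failed = Number(env["AXIS_FAILED"] ?? 0);
  const seconds = (Number(env["AXIS_DURATION_MS"] ?? 0) / 1000).toFixed(1);
  const reportDir = env["AXIS_REPORT_DIR"] ?? "(unknown)";
  return `AXIS run: ${completed}/${total} completed, ${failed} failed in ${seconds}s (report: ${reportDir})`;
}
```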

Agent Configuration

Each entry in the agents array can be a simple string (agent name with defaults) or a full configuration object.

Field	Type	Required	Description
`agent`	`string`	Yes	Agent name: `claude-code`, `codex`, `gemini`, `goose`, etc.
`model`	`string`	No	Model override passed to the agent CLI.
`scenarios`	`string[]`	No	Subset of scenarios to run. Supports glob patterns like `cms/*`.
`skills`	`string[]`	No	Agent-specific skills (merged with top-level skills).
`flags`	`object`	No	CLI flags passed to the agent, e.g. `{"full-auto": true}`.
`command`	`string`	No	Custom CLI command (for custom agents).
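The string shorthand and the scenarios filter can be modelled as below. This is a sketch under two assumptions: a bare string is equivalent to an object with only agent set, and * in a pattern matches any run of characters except /. The exact glob dialect AXIS uses is not specified here.

```typescript
interface AgentConfig {
  agent: string;
  model?: string;
  scenarios?: string[];
}

// A bare string is treated as an agent name with default settings.
function normalizeAgent(entry: string | AgentConfig): AgentConfig {
  return typeof entry === "string" ? { agent: entry } : entry;
}

// Assumption: "*" matches any run of characters except "/".
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\*/g, "[^/]*");
  return new RegExp(`^${escaped}$`);
}

// Return the scenario keys an agent entry would run.
function scenariosFor(entry: string | AgentConfig, allKeys: string[]): string[] {
  const { scenarios } = normalizeAgent(entry);
  if (!scenarios) return allKeys; // no filter: run everything
  const matchers = scenarios.map(globToRegExp);
  return allKeys.filter((key) => matchers.some((m) => m.test(key)));
}
```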

Scoring Weights

Override the default dimension weights under settings.scoring_weights. Values must sum to 1.0. See Scoring Framework for what each dimension measures.

Field	Type	Required	Description
`goal_achievement`	`number`	No	Goal Achievement weight. Default: `0.4`.
`environment`	`number`	No	Environment weight. Default: `0.2`.
`service`	`number`	No	Service weight. Default: `0.2`.
`agent`	`number`	No	Agent weight. Default: `0.2`.

{
  "settings": {
    "scoring_weights": {
      "goal_achievement": 0.5,
      "environment": 0.2,
      "service": 0.2,
      "agent": 0.1
    }
  }
}
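Because the weights are decimals, a sum check needs a small tolerance for floating-point rounding. The sketch below assumes that omitted fields fall back to their defaults before the check runs; resolveWeights is a hypothetical helper, not part of the AXIS API.

```typescript
type Dimension = "goal_achievement" | "environment" | "service" | "agent";
type ScoringWeights = Record<Dimension, number>;

// Defaults from the table above.
const DEFAULT_WEIGHTS: ScoringWeights = {
  goal_achievement: 0.4,
  environment: 0.2,
  service: 0.2,
  agent: 0.2,
};

// Merge overrides onto the defaults and check that the result sums to 1.0.
// The tolerance absorbs IEEE 754 rounding (0.5 + 0.2 + 0.2 + 0.1 is not exactly 1).
function resolveWeights(overrides: Partial<ScoringWeights> = {}): ScoringWeights {
  const weights: ScoringWeights = { ...DEFAULT_WEIGHTS, ...overrides };
  const sum = Object.values(weights).reduce((acc, w) => acc + w, 0);
  if (Math.abs(sum - 1) > 1e-9) {
    throw new Error(`scoring_weights must sum to 1.0, got ${sum}`);
  }
  return weights;
}
```

Under this assumption, raising one weight without lowering another fails validation, so overrides usually come in pairs.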

Limits

Limits control how much time and how many tokens a run or an individual scenario can consume, which prevents runaway agents from using unbounded resources. Limits can be configured at three levels:

Overall run limits (settings.limits.run): when hit, all remaining and in-progress jobs are aborted.
Default scenario limits (settings.limits.scenario): default budget for each individual job. Only that job fails when exceeded.
Per-scenario limits (limits in the scenario JSON): override the defaults for a specific scenario. Only that job fails when exceeded.

Default behavior

Even without any limits configured, each scenario has a default time limit of 15 minutes. You can override this by setting settings.limits.scenario.time_minutes or by adding limits.time_minutes to individual scenarios.

Limit fields

Field	Type	Description
`time_minutes`	`number`	Maximum wall-clock time in minutes. Accepts fractional values (e.g. `0.5` for 30 seconds). Default: `15` per scenario.
`tokens`	`number`	Maximum total tokens (input + output + cache). Must be a positive integer. No default.

Overall run limits

Set settings.limits.run to cap the total time or tokens across the entire run. When an overall limit is reached, all remaining and currently running jobs are immediately terminated and marked as failed.

{
  "settings": {
    "limits": {
      "run": { "time_minutes": 60, "tokens": 2000000 }
    }
  }
}

Per-scenario limits

Set settings.limits.scenario to define default per-job budgets. These can be overridden by adding a limits field directly in a scenario file.

// axis.config.json: default for all scenarios
{
  "settings": {
    "limits": {
      "scenario": { "time_minutes": 10, "tokens": 200000 }
    }
  }
}

// scenarios/expensive-task.json: override for one scenario
{
  "name": "Expensive task",
  "prompt": "...",
  "rubric": "...",
  "limits": { "time_minutes": 30, "tokens": 500000 }
}
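The precedence between the three sources of a job's budget can be sketched as a simple fallback chain. Field-by-field fallback is an assumption of this sketch; AXIS may instead replace the whole limits object when a scenario defines one.

```typescript
interface Limits {
  time_minutes?: number;
  tokens?: number;
}

// Built-in per-scenario default documented above.
const DEFAULT_SCENARIO_TIME_MINUTES = 15;

// Resolve the budget for one job: the scenario's own limits win over
// settings.limits.scenario, which wins over the built-in 15-minute default.
function effectiveLimits(configDefaults: Limits = {}, scenarioLimits: Limits = {}): Limits {
  const resolved: Limits = {
    time_minutes:
      scenarioLimits.time_minutes ??
      configDefaults.time_minutes ??
      DEFAULT_SCENARIO_TIME_MINUTES,
  };
  // Tokens have no built-in default, so the field stays unset when nobody configures it.
  const tokens = scenarioLimits.tokens ?? configDefaults.tokens;
  if (tokens !== undefined) resolved.tokens = tokens;
  return resolved;
}
```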

Token limit accuracy

Token limits are enforced using a conservative estimate during execution (based on streamed assistant text). The actual token count may slightly exceed the limit before the job is terminated. The authoritative token count from the agent's API is used for overall run limit tracking.

MCP Servers

Configure Model Context Protocol servers that are automatically wired into each agent environment. AXIS supports both stdio (local process) and HTTP (remote endpoint) servers.

Field	Type	Required	Description
`type`	`"stdio" \| "http"`	Yes	Server transport type.
`command`	`string`	Yes	Command to start the server process (stdio only).
`args`	`string[]`	No	Arguments passed to the command.
`env`	`object`	No	Environment variables for the server process.
`url`	`string`	Yes	Remote server endpoint URL (http only).
`headers`	`object`	No	HTTP headers (supports `${VAR}` env interpolation).

{
  "mcp_servers": {
    "filesystem": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
      "env": { "LOG_LEVEL": "info" }
    },
    "remote-api": {
      "type": "http",
      "url": "https://mcp.example.com/tools",
      "headers": { "Authorization": "Bearer ${TOKEN}" }
    }
  }
}
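The ${VAR} interpolation in headers can be pictured as a placeholder substitution over an environment map. This is a sketch only: throwing on an unset variable is an assumption made here, and AXIS may instead substitute an empty string or leave the placeholder untouched.

```typescript
// Replace ${VAR} placeholders with values from an environment map.
// Throwing on an unset variable is an assumption made for this sketch.
function interpolateEnv(value: string, env: Record<string, string | undefined>): string {
  return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match, name: string) => {
    const resolved = env[name];
    if (resolved === undefined) {
      throw new Error(`Environment variable ${name} is not set`);
    }
    return resolved;
  });
}

// Apply the substitution to every header value.
function interpolateHeaders(
  headers: Record<string, string>,
  env: Record<string, string | undefined>,
): Record<string, string> {
  const result: Record<string, string> = {};
  for (const [key, value] of Object.entries(headers)) {
    result[key] = interpolateEnv(value, env);
  }
  return result;
}
```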

Each NDJSON-style agent writes MCP configuration in its native format before spawning. ACP-based adapters (claude-sdk, codex-sdk, gemini, and every other ACP agent) pass MCP servers through the ACP session/new call instead.

Agent	Config File	Location
`claude-code`	`.mcp.json`	Workspace root
`codex`	`config.toml`	`CODEX_HOME`

Judging Agents

By default, every run is judged by the same agent that produced it, so the agent under test scores its own work. Set judging.agents to a precedence-ordered list of judge candidates. For each run, AXIS picks the first entry whose adapter name differs from the run's own agent so a fresh perspective evaluates the work. If every entry matches the run's own agent, the first entry is used.

Each entry accepts the same shorthand as agents: a string (agent name) or a full AgentConfig object with model and adapter flags. Every candidate judge must be installed and have any required environment variables set; AXIS validates this during pre-flight before any jobs run.

{
  "agents": ["claude-code", "codex", "gemini"],
  "judging": {
    "agents": ["claude-code", "codex"]
  }
}

With the config above, runs by claude-code are judged by codex (first entry whose adapter differs), and runs by codex or gemini are judged by claude-code. Pin a specific model on any candidate by passing the full AgentConfig form:

{
  "judging": {
    "agents": [
      { "agent": "claude-code", "model": "opus" },
      "codex"
    ]
  }
}
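The selection rule can be written down in a few lines. In this sketch, pickJudge is a hypothetical helper and the agent field is used as the adapter name, which is an assumption; the real matching logic may consider more than the name.

```typescript
type JudgeEntry = string | { agent: string; model?: string };

function agentName(entry: JudgeEntry): string {
  return typeof entry === "string" ? entry : entry.agent;
}

// Pick the first judge whose agent name differs from the run's agent.
// Falls back to the first entry when every candidate matches, and to the
// run's own agent when no judges are configured (the documented default).
function pickJudge(runAgent: string, judges: JudgeEntry[]): JudgeEntry {
  const different = judges.find((judge) => agentName(judge) !== runAgent);
  return different ?? judges[0] ?? runAgent;
}
```

With judges ["claude-code", "codex"], this returns codex for a claude-code run and claude-code for a codex or gemini run, matching the walkthrough above.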

Skills

Skills extend agent capabilities with reusable instruction sets. Specify them at the top level (shared across all agents), per agent, or per scenario.

Format	Example	Description
Local path	`./skills/deploy`	Relative to the config file.
GitHub shorthand	`netlify/axis-skill-deploy`	`owner/repo` format, cloned automatically.
Full URL	`https://github.com/owner/repo`	GitHub repository URL, cloned automatically.

{
  "skills": [
    "./skills/deploy",
    "netlify/axis-skill-deploy",
    "https://github.com/owner/repo"
  ]
}

Remote skills are cached in .axis/skills-cache/. Use --refresh-skills to force a re-clone.
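The three formats can be told apart by shape. The detection rules below are assumptions for illustration, not AXIS's documented parser; in particular, a relative path written without a leading ./ (such as skills/deploy) would be indistinguishable from owner/repo shorthand, so local paths should keep the ./ prefix.

```typescript
type SkillSource = "local" | "github-shorthand" | "url";

// Hypothetical classifier for the three skill formats in the table above.
function classifySkill(spec: string): SkillSource {
  if (/^https?:\/\//.test(spec)) return "url";
  if (spec.startsWith("./") || spec.startsWith("../") || spec.startsWith("/")) {
    return "local";
  }
  // Exactly two segments separated by one slash: owner/repo.
  if (/^[\w.-]+\/[\w.-]+$/.test(spec)) return "github-shorthand";
  return "local";
}
```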

Environment Variables

The env field lists additional environment variables to pass through to agent processes. The following are always passed through by default:

Category	Variables
API keys	`ANTHROPIC_API_KEY`, `CODEX_API_KEY`, `GEMINI_API_KEY`
System	`PATH`, `USER`, `SHELL`, `LANG`, `TERM`, `TMPDIR`

{
  "env": ["MY_CUSTOM_TOKEN", "DATABASE_URL"]
}
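Conceptually, the agent's environment is an allowlist built from the defaults plus the names in env. The sketch below assumes that variables outside the allowlist are not forwarded; buildAgentEnv is a hypothetical helper used only to illustrate that model.

```typescript
// Variables passed through by default, per the table above.
const ALWAYS_PASSED = [
  "ANTHROPIC_API_KEY", "CODEX_API_KEY", "GEMINI_API_KEY",
  "PATH", "USER", "SHELL", "LANG", "TERM", "TMPDIR",
];

// Build the child environment from the defaults plus the configured env names.
// Unset variables are skipped rather than forwarded as empty strings.
function buildAgentEnv(
  parentEnv: Record<string, string | undefined>,
  extra: string[] = [],
): Record<string, string> {
  const result: Record<string, string> = {};
  for (const name of [...ALWAYS_PASSED, ...extra]) {
    const value = parentEnv[name];
    if (value !== undefined) result[name] = value;
  }
  return result;
}
```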

Scenarios

Scenarios live in the configured scenarios directory as .json, .js, or .ts files, or are listed inline in axis.config.{js,ts}. The filename (without extension) becomes the scenario key, and nested directories create namespaced keys: scenarios/cms/create-post.ts maps to cms/create-post.

Scenarios can define variants to run the same task under different configurations (skills, MCP servers, prompts, etc.) without duplicating files. Each variant produces a separate job with a key like create-post@variant-name.
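The key derivation described above can be sketched as two small functions. Both are hypothetical helpers: scenarioKey expects a path relative to the scenarios directory, and jobKey assumes the @variant suffix is appended to the full namespaced key.

```typescript
// Derive a scenario key from a file path relative to the scenarios directory.
function scenarioKey(relativePath: string): string {
  return relativePath
    .replace(/\\/g, "/") // normalize Windows separators
    .replace(/^\.\//, "") // drop a leading "./"
    .replace(/\.(json|js|ts)$/, ""); // drop the extension
}

// Each variant produces its own job key: "<scenario-key>@<variant-name>".
function jobKey(key: string, variant?: string): string {
  return variant ? `${key}@${variant}` : key;
}
```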

See Writing Scenarios for the complete scenario schema, the authoring formats, rubric design guidance, and examples.

Remote Scenarios

Remote scenarios let one AXIS project pull a scenario library straight out of another git repository, instead of vendoring or copy-pasting scenario files. A team can publish a canonical set of scenarios (with their setup scripts, fixtures, MCP servers, and skills) once, and every downstream project consumes it by listing the repo URL in its scenarios array. When the upstream library changes, the next AXIS run picks it up automatically; there's no version to bump and no files to re-sync.

This is especially useful for:

Org-wide scenario libraries. A platform team curates the scenarios that represent "what agents on our stack actually need to do," and every product team's config references that library to benchmark their own agent against the same yardstick.
Sharing a benchmark across forks. A reference repo defines the agreed-on scenarios; multiple agent implementations point at it to publish comparable AXIS Results.
Decoupling scenario authoring from agent testing. The team that knows the product writes the scenarios in their repo; the team running the agent only owns its axis.config.* and the agent it's testing.

Using a remote scenarios repo

Add a git repository URL to the scenarios array. Local paths and remote URLs can be mixed freely.

{
  "scenarios": [
    "./scenarios",
    "https://github.com/netlify/all-scenarios"
  ],
  "agents": ["claude-code"]
}

On each run AXIS clones the repo into .axis/remotes/<reversed-host>/<owner>/<repo>/, reads that repo's axis.config.*, and inlines its scenarios entries into the parent, resolved to absolute paths inside the clone. Inline scenario objects from a remote repo are passed through unchanged. If the cloned repo has no axis.config.* at its root, the whole repo is walked as a scenarios directory (equivalent to listing the clone path directly).

From here on, the runner behaves as if every scenario had been local from the start: discovery, filtering, lifecycle, scoring, and reporting are all unchanged.

Supporting config that comes with the scenarios

Remote scenarios usually depend on more than just their own files. Their setup scripts need certain env vars exported, they expect specific MCP servers to be configured, they rely on shared skills, and so on. To avoid forcing every parent project to re-declare all of that, AXIS folds a few supporting fields from the remote repo's axis.config.* into the parent config. The parent always wins on collisions.

Field	Merge semantics
`env`	Set union of variable names; parent entries first, then remote.
`mcp_servers`	Keyed merge; the parent's value wins when both declare the same server name.
`skills`	Ordered union with dedup, parent first. Local-path entries from the remote (`./...`) are rewritten to absolute paths inside the clone directory; URL and `owner/repo` shorthand entries pass through unchanged.
`artifacts`	Glob patterns concatenated and deduped, parent first.
`adapters`	Keyed merge with parent precedence; remote module paths are rewritten to absolute paths inside the clone directory so the remote adapter loads correctly.

Other top-level fields from the remote repo are ignored: agents, settings, judging, beforeAll, afterAll, and name. These belong to the parent project: it decides which agents to test, how to score them, and what run-level lifecycle to fire.
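The merge rules for the list and map fields can be sketched as below. mergeSupporting is a hypothetical helper; it leaves out adapters and the rewriting of remote local paths to absolute paths inside the clone, both of which the real merge also performs.

```typescript
interface SupportingConfig {
  env?: string[];
  mcp_servers?: Record<string, unknown>;
  skills?: string[];
  artifacts?: string[];
}

// Ordered union with dedup: keeps the first occurrence, parent entries first.
function orderedUnion(parent: string[] = [], remote: string[] = []): string[] {
  return Array.from(new Set([...parent, ...remote]));
}

function mergeSupporting(
  parent: SupportingConfig,
  remote: SupportingConfig,
): Required<SupportingConfig> {
  return {
    env: orderedUnion(parent.env, remote.env),
    // The later spread wins, so the parent's servers override same-named remote ones.
    mcp_servers: { ...remote.mcp_servers, ...parent.mcp_servers },
    skills: orderedUnion(parent.skills, remote.skills),
    artifacts: orderedUnion(parent.artifacts, remote.artifacts),
  };
}
```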

Freshness and caching

When the clone already exists, AXIS runs git pull --ff-only on every invocation; the first time, it does a shallow git clone. There is no opt-in caching flag: the trade-off favours always-fresh scenario libraries over offline runs.

Dependencies

Remote scenarios authored as .ts/.js modules often import workspace helpers and external packages. If the cloned repo has a package.json and no node_modules/, AXIS runs npm install (or pnpm install / yarn install, based on the lockfile) automatically before walking the clone for scenarios. Install failures are logged but do not abort the run; modules whose imports fail are reported and skipped.

Nested remote references

By default, a remote repo's scenarios array may not itself list further remote URLs; AXIS exits with an error that names the offending URL. Increase settings.remotes.maxDepth to allow nesting. Cycles (A → B → A) are always rejected regardless of depth.

{
  "scenarios": ["./scenarios", "https://github.com/netlify/all-scenarios"],
  "agents": ["claude-code"],
  "settings": {
    "remotes": { "maxDepth": 2 }
  }
}
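Depth limiting and cycle rejection can be modelled as a depth-first walk over a map of repo URLs. This sketch assumes that maxDepth counts levels of remote repos (1 allows only direct remotes, 2 allows one nested level) and that the default is 1; the exact meaning of the number is not stated here, so treat both as assumptions. resolveRemotes and RemoteGraph are hypothetical names.

```typescript
// Map from a repo URL to the remote URLs listed in its scenarios array.
type RemoteGraph = Record<string, string[]>;

// Walk remote references depth-first and return them in discovery order.
function resolveRemotes(roots: string[], graph: RemoteGraph, maxDepth = 1): string[] {
  const resolved: string[] = [];
  const visit = (url: string, depth: number, chain: string[]): void => {
    // Cycles are checked first so they are rejected regardless of depth.
    if (chain.includes(url)) {
      throw new Error(`Remote cycle: ${[...chain, url].join(" -> ")}`);
    }
    if (depth > maxDepth) {
      throw new Error(`Nested remote ${url} exceeds maxDepth ${maxDepth}`);
    }
    if (!resolved.includes(url)) resolved.push(url);
    for (const child of graph[url] ?? []) {
      visit(child, depth + 1, [...chain, url]);
    }
  };
  for (const root of roots) visit(root, 1, []);
  return resolved;
}
```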