SWE-bench Verified Results

Real benchmark data showing how PRAT-powered context transforms coding agent performance — across models and price points.

All runs use mini-swe-agent. Sonnet 4.0 + XCE rises from 66% to 73.4%, so an older-generation model beats raw Sonnet 4.6 (72%), and it reaches Opus level (76.8%) with the cascade hybrid.

MiniMax M2.5 + XCE: 78.2% on SWE-bench Verified — beating Claude Opus at 76.8%, at 16x lower cost

Model Comparison

Model	Config	Resolve Rate	Oracle Rate	Cost / Instance
Sonnet 4.0 (baseline)	mini-swe-agent	66%	—	$1.50
Sonnet 4.0 + XCE	Resolve@1	73.4%	76.8%	$1.20
Sonnet 4.6 (baseline)	mini-swe-agent	72%	—	$3.00
MiniMax M2.5 (baseline)	mini-swe-agent	75.8%	—	$0.30
MiniMax M2.5 + XCE	SWE-bench Verified	78.2%	—	$0.22
Claude 4.5 Opus	Leaderboard	76.8%	—	$8.50

With XCE, a $0.30/1K token model outperforms a $5/1K token model

MiniMax M2.5 + XCE achieves 78.2% — surpassing Claude 4.5 Opus at 76.8% on SWE-bench Verified.

Resolve Rate Comparison

Cost per Resolved Instance
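One way to read the cost figures is to divide each model's cost per instance by its resolve rate. The sketch below does that with the numbers from the Model Comparison table. The formula is an assumption about how "cost per resolved instance" is defined (the page does not state it), and the names `RESULTS` and `cost_per_resolved` are illustrative only.

```python
# Figures copied from the Model Comparison table:
# model -> (resolve rate as a fraction, cost per instance in USD).
RESULTS = {
    "Sonnet 4.0 (baseline)": (0.66, 1.50),
    "Sonnet 4.0 + XCE": (0.734, 1.20),
    "Sonnet 4.6 (baseline)": (0.72, 3.00),
    "MiniMax M2.5 (baseline)": (0.758, 0.30),
    "MiniMax M2.5 + XCE": (0.782, 0.22),
    "Claude 4.5 Opus": (0.768, 8.50),
}


def cost_per_resolved(resolve_rate, cost_per_instance):
    """Assumed definition: average spend needed to get one resolved instance."""
    return cost_per_instance / resolve_rate


if __name__ == "__main__":
    # Print models from cheapest to most expensive per resolved instance.
    ranked = sorted(RESULTS.items(), key=lambda kv: cost_per_resolved(*kv[1]))
    for model, (rate, cost) in ranked:
        print(f"{model:<26} ${cost_per_resolved(rate, cost):.2f}")
```

Under this assumed definition, the ranking depends on both columns at once, which is why a cheaper model with a slightly lower resolve rate can still come out ahead.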

XCE in Action — 8,427 Tool Calls Across 499 Instances

Tool	Calls
xce_search	1,677
xce_callers	1,608
xce_callees	1,612
xce_architecture	1,017
xce_impact	1,493
xce_trace	1,020

Avg steps (resolved)

Avg steps (unresolved)

Instances used XCE: 100%
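The per-tool counts can be cross-checked against the headline total with a few lines of Python. This is only a sanity-check sketch: the counts are copied from the figures above, while the derived values (calls per instance, share per tool) are computed here and are not reported on the page.

```python
# Per-tool call counts as listed in this section.
TOOL_CALLS = {
    "xce_search": 1677,
    "xce_callers": 1608,
    "xce_callees": 1612,
    "xce_architecture": 1017,
    "xce_impact": 1493,
    "xce_trace": 1020,
}
INSTANCES = 499  # from the section heading

total_calls = sum(TOOL_CALLS.values())
calls_per_instance = total_calls / INSTANCES
# Fraction of all XCE calls that went to each tool.
share = {tool: count / total_calls for tool, count in TOOL_CALLS.items()}

if __name__ == "__main__":
    print(f"total calls: {total_calls}")
    print(f"calls per instance: {calls_per_instance:.1f}")
    for tool, frac in sorted(share.items(), key=lambda kv: -kv[1]):
        print(f"{tool:<18} {frac:.1%}")
```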

Performance by Repository — Model Comparison

Resolve rates across repositories for different models. XCE-augmented models (blue/violet) consistently outperform their baselines.

Repository Coverage — Radar View

Spider chart showing how XCE expands the performance envelope across all repositories. Larger area = better coverage.

Performance by Repository

Repository	Resolved	Rate	Avg Steps
Django	172/231	74.5%	59
SymPy	51/75	68%	75
Sphinx	25/44	56.8%	74
Matplotlib	23/34	67.6%	67
scikit-learn	21/32	65.6%	72
xarray	20/22	90.9%	162
pytest	14/19	73.7%	91
requests	8/8	100%	81
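The Rate column follows directly from the Resolved column, which the sketch below recomputes. The data is copied from the table; the aggregate at the end covers only the eight repositories listed here, so it is not the same quantity as the overall resolve rate quoted earlier.

```python
# Repository -> (resolved, total), copied from the table above.
REPOS = {
    "Django": (172, 231),
    "SymPy": (51, 75),
    "Sphinx": (25, 44),
    "Matplotlib": (23, 34),
    "scikit-learn": (21, 32),
    "xarray": (20, 22),
    "pytest": (14, 19),
    "requests": (8, 8),
}


def resolve_rate(resolved, total):
    """Resolve rate as a percentage."""
    return 100 * resolved / total


rates = {repo: round(resolve_rate(r, t), 1) for repo, (r, t) in REPOS.items()}
resolved_sum = sum(r for r, _ in REPOS.values())
instance_sum = sum(t for _, t in REPOS.values())

if __name__ == "__main__":
    for repo, rate in rates.items():
        print(f"{repo:<13} {rate}%")
    print(f"listed repositories combined: {resolved_sum}/{instance_sum}")
```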

Case Studies — How XCE Guided the Agent

Real examples where the baseline agent failed but the XCE-augmented agent resolved the issue, with the actual queries and context.

Django: django__django-13128

Temporal subtraction with mixed DateTimeField/DurationField output_field

Cost: $0.18

Without XCE

Failed — baseline agent couldn't locate the CombinedExpression class or understand how output_field resolution works for temporal operations.

With XCE

Resolved — XCE returned the exact CombinedExpression class and the output_field resolution chain.

Three targeted XCE searches progressively narrowed from DateTimeField to output_field resolution to CombinedExpression, giving the agent the full picture of how Django resolves types in arithmetic expressions.

Django: django__django-16333

UserCreationForm save() doesn't call save_m2m()

Cost: $0.09

Without XCE

Failed — baseline agent found UserCreationForm but missed the save_m2m() call chain and the impact on related models.

With XCE

Resolved — XCE returned the UserCreationForm class and impact analysis showing 33 impacted nodes across 12 modules.

xce_search returned the UserCreationForm class definition. Then xce_impact on django/contrib/auth/forms.py revealed 33 impacted nodes across 12 modules, helping the agent understand the full blast radius before making the fix.
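To make "blast radius" concrete, the sketch below counts everything that transitively depends on a changed node by walking a reverse-dependency graph breadth-first. This is a generic illustration, not XCE's actual algorithm (the page does not describe how xce_impact works), and the graph contents and node names are hypothetical.

```python
from collections import deque


def impacted_nodes(reverse_deps, start):
    """Return every node that depends on `start`, directly or transitively.

    `reverse_deps` maps a node to the nodes that use it.
    """
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dependent in reverse_deps.get(node, ()):
            if dependent != start and dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# Hypothetical toy graph; real node names and edges would come from the code index.
TOY_GRAPH = {
    "auth.forms.UserCreationForm": ["auth.admin.UserAdmin", "app.views.signup"],
    "auth.admin.UserAdmin": ["app.admin.CustomUserAdmin"],
    "app.views.signup": [],
}

if __name__ == "__main__":
    hits = impacted_nodes(TOY_GRAPH, "auth.forms.UserCreationForm")
    print(len(hits), sorted(hits))
```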

Django: django__django-13741

ReadOnlyPasswordHashField disabled attribute issue

Steps: 40 | Cost: $0.06

Without XCE

Failed — baseline agent searched broadly for password hash handling, wasting steps on unrelated auth code.

With XCE

Resolved in 40 steps at $0.06 — XCE returned the exact class and impact analysis.

A single xce_search for "ReadOnlyPasswordHashField" returned the exact class definition with its __init__ method showing the disabled=True default. xce_impact then confirmed 33 impacted nodes, giving confidence the fix was safe.

Django: django__django-12143

Admin changelist _get_edited_object_pks regex prefix issue

Steps: 39 | Cost: $0.06

Without XCE

Failed — baseline agent found the admin options file but couldn't locate the specific regex pattern causing the issue.

With XCE

Resolved in 39 steps at $0.06 — XCE returned the exact function with the regex pattern.

xce_search for "_get_edited_object_pks admin formset prefix" returned the exact function with the regex pattern that needed fixing. The agent immediately saw that the prefix was missing a re.escape(prefix) call and fixed it.
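The class of bug here is a string interpolated into a regular expression without escaping. The sketch below is a simplified stand-in, not Django's source: `edited_pk_pattern` and its `escape` flag are hypothetical, and the pattern shape is an assumption made for illustration. It shows how a prefix containing a regex metacharacter matches keys it should not.

```python
import re


def edited_pk_pattern(prefix, pk_name, escape=True):
    """Build a pattern for form keys shaped like '<prefix>-<index>-<pk_name>'.

    With escape=False the prefix is inserted raw, so characters such as '.'
    keep their regex meaning.
    """
    safe_prefix = re.escape(prefix) if escape else prefix
    return re.compile(r"{}-\d+-{}$".format(safe_prefix, pk_name))


if __name__ == "__main__":
    keys = ["form.x-0-id", "formax-0-id"]
    raw = edited_pk_pattern("form.x", "id", escape=False)
    safe = edited_pk_pattern("form.x", "id")
    # The raw pattern treats '.' as "any character" and accepts both keys.
    print([k for k in keys if raw.match(k)])
    # The escaped pattern accepts only the literal prefix.
    print([k for k in keys if safe.match(k)])
```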

Django: django__django-14855

get_admin_url for readonly ForeignKey with custom admin site

Cost: $0.21

Without XCE

Failed — baseline agent found the admin helpers but couldn't trace the URL generation chain for custom admin sites.

With XCE

Resolved — XCE returned the get_admin_url function showing the hardcoded "admin:" prefix that needed to use the custom site name.

xce_search returned the get_admin_url function in django/contrib/admin/helpers.py, clearly showing url_name = "admin:%s_%s_change" — the hardcoded "admin:" prefix was the bug. The agent saw it immediately and replaced it with the dynamic admin site name.

Django: django__django-11880

Form Field __deepcopy__ shares error_messages between instances

Cost: $0.08

Without XCE

Failed — baseline agent found __deepcopy__ in widgets.py but missed the error_messages sharing issue in the Field base class.

With XCE

Resolved — XCE returned the __deepcopy__ method with a full analysis explaining the mutable dictionary sharing bug.

xce_search returned the __deepcopy__ method and a detailed analysis explaining how obj.attrs references the same dictionary as self.widget.attrs, causing error_messages to be shared between form instances.
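The underlying pattern is a custom `__deepcopy__` that copies the object shallowly and forgets one mutable attribute. The sketch below reproduces it with plain Python classes. `SharedField` and `IsolatedField` are hypothetical stand-ins, not Django's Field, and the fix shown is one plausible approach rather than the actual patch.

```python
import copy


class SharedField:
    """Stand-in for a form field whose __deepcopy__ misses a mutable dict."""

    def __init__(self):
        self.error_messages = {"required": "This field is required."}

    def __deepcopy__(self, memo):
        # Shallow copy: the new object points at the same error_messages dict.
        result = copy.copy(self)
        memo[id(self)] = result
        return result


class IsolatedField(SharedField):
    """Same field, but each copy gets its own error_messages dict."""

    def __deepcopy__(self, memo):
        result = super().__deepcopy__(memo)
        result.error_messages = self.error_messages.copy()
        return result


if __name__ == "__main__":
    for cls in (SharedField, IsolatedField):
        original = cls()
        clone = copy.deepcopy(original)
        clone.error_messages["required"] = "Changed on the clone"
        print(cls.__name__, original.error_messages["required"])
```

With the shared version, editing the clone's messages also changes the original; with the isolated version, the original keeps its default text.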

Django: django__django-12858

models.E015 ordering lookup check incorrectly handles transforms

Cost: $0.12

Without XCE

Failed — baseline agent found _check_ordering but couldn't understand the full lookup resolution chain for transforms.

With XCE

Resolved — XCE returned the complete _check_ordering function with the full field traversal logic.

xce_search returned the entire _check_ordering function showing how Django validates ordering fields — including LOOKUP_SEP splitting, related field traversal, and the transform check. The agent could see exactly where the transform handling was missing.
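The traversal described above can be illustrated with a toy validator that splits an ordering string on the separator and walks it part by part. This is a heavily simplified sketch, not Django's _check_ordering: `SCHEMA`, `LOOKUPS`, and `ordering_is_valid` are hypothetical, and only the `"__"` separator value matches Django's LOOKUP_SEP.

```python
LOOKUP_SEP = "__"

# Hypothetical schema: model -> {field name: related model, or None for a plain field}.
SCHEMA = {
    "Stock": {"supply": "Supply"},
    "Supply": {"product": "Product"},
    "Product": {"parent": "Product", "name": None},
}
# Hypothetical set of registered lookups and transforms.
LOOKUPS = {"isnull", "lower"}


def ordering_is_valid(model, ordering):
    """Walk 'a__b__c' across relations, accepting lookups/transforms as well as fields."""
    current = model
    for part in ordering.lstrip("-").split(LOOKUP_SEP):
        fields = SCHEMA.get(current, {}) if current else {}
        if part in fields:
            current = fields[part]  # follow the relation (None for a plain field)
        elif part in LOOKUPS:
            current = None  # after a lookup or transform, only more of them may follow
        else:
            return False
    return True


if __name__ == "__main__":
    for spec in ("supply__product__parent__isnull", "-supply__product__name", "supply__bogus"):
        print(spec, ordering_is_valid("Stock", spec))
```

A check that accepts only fields at each step would reject the first ordering, since its final part is a lookup rather than a field.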

Raw Data

Full transparency: trajectory and prediction data are available for download.

Trajectory Data | Prediction Data
