Blog

Experiments, benchmarks, and deep dives from the Xanther team.

How a $0.02/Call Model Scored 78.2% on SWE-bench Verified — Beating Every Model on the Leaderboard

We added architectural context to AI coding agents via MCP and tested on SWE-bench Verified. MiniMax M2.5 — a model that costs $0.02 per call — scored 78.2%, surpassing every model on the official mini-SWE-agent leaderboard, including Claude Opus 4.5 at 76.8%.

Xanther·May 1, 2026

Django · Deep Dive · Open Source

Deep Dive: Using XCE to Navigate Django's 300K-Line Codebase

Django has 4,000+ files and a layered architecture that trips up even experienced contributors. We indexed the entire codebase with XCE and tested against 10 real GitHub issues — agents resolve issues faster, use 57% fewer tokens, and avoid wrong-file rabbit holes.

Xanther·May 7, 2026

Context Engineering · Architecture · Developer Tools

Context Engineering Is the Compass Your Coding Agent Needs

Coding agents are powerful ships sailing without a map. Context engineering gives them architectural awareness. Without it, even the best models waste tokens. With it, a cheap model outperforms an expensive one.

Xanther·May 7, 2026

TypeScript · Deep Dive · LobeChat

Navigating LobeChat's 92K-Node TypeScript Codebase with XCE

LobeChat is a 92,000-node TypeScript monorepo. We indexed the entire codebase — 8,764 descriptions, 7,948 LLDs, 796 HLDs — and show how XCE helps agents navigate it in seconds instead of minutes.

Xanther·May 7, 2026
