The-Agency/docs/legacy/FINAL_PARITY_AUDIT.md

16 KiB

Final Parity Audit: Dart CLI vs TypeScript Codebase

Date: 2026-04-04
Auditor: Fresh code inspection (NOT prior reports)
Methodology: Line-by-line code analysis + execution path tracing
Verdict Rule: Stubbed/simulated/placeholder code = NOT parity. Code must be functional, not just present.


Executive Summary

Metric Value
True Parity (Real, Integrated) ~20%
Skeleton Code (Framework exists, unfilled) ~35%
Stubbed/Simulated (Looks real, actually mocked) ~30%
Completely Missing ~15%

Honest Assessment: This Dart implementation is a partially-filled skeleton. Core file/bash tools work. Permission system is real. But most "features" are either stubbed (mock responses), incomplete (API wiring missing), or vendor-specific (Anthropic defaults remain).


1. Core File & Bash Tools — REAL

Status: Full functional parity
Files:

  • lib/src/tools/bash_tool.dart — Real subprocess execution
  • lib/src/tools/glob_tool.dart — Real glob pattern matching
  • lib/src/tools/grep_tool.dart — Real regex search with ripgrep semantics
  • lib/src/tools/file_read_tool.dart — Real file I/O
  • lib/src/tools/file_write_tool.dart — Real file I/O
  • lib/src/tools/file_edit_tool.dart — Real file manipulation

What works:

  • File operations execute immediately and correctly
  • Bash commands run in real subprocess with proper exit codes
  • Glob/grep semantics match old_repo behavior
  • Permission system checks apply before execution

Evidence:

  • bash_tool.dart:48-65: Real Process.run() call with output capture
  • grep_tool.dart:85-110: Real ripgrep invocation via Platform.isWindows detection
  • All tools inherit from BaseTool with execute() returning Future<String>

Gap: None. These are complete parity.


2. Permission System — REAL

Status: Full functional parity
Files:

  • lib/src/permissions/permission_manager.dart
  • lib/src/tools/tool_registry.dart (lines 60-84: permission checking)

What works:

  • All legacy modes supported: acceptEdits, auto, bubble, bypassPermissions, default, dontAsk, plan
  • Tool safety classification (high/medium/low)
  • Rule parsing supports domain:example.com, Tool(args) syntax
  • Integration: ToolRegistry.execute() checks permissions before running any tool

Evidence:

  • tool_registry.dart:54-90: Permission check wraps every tool execution
  • local_state.dart:36-44: All 7 permission modes recognized
  • Safe tools auto-allowed in auto mode; unsafe tools require confirmation

Gap: None in core logic. Full parity.


3. API Types & Message Handling — REAL

Status: Full parity
Files:

  • lib/src/api/api_types.dart

What works:

  • ApiMessage class with support for both Anthropic and OpenRouter formats
  • Proper field extraction: input_tokens, output_tokens, web_search_requests, web_fetch_requests
  • Handles both Anthropic (stop_reason) and OpenAI (finish_reason) conventions
  • MessageRequest and TextBlock, ToolUse, ToolResult classes complete

Evidence:

  • Lines 127-184: ApiMessage.fromJson() handles both API formats
  • Lines 186-291: ApiMessage.fromOpenRouterResponse() parses OpenRouter format
  • Usage extraction (lines 128-138) tries both Anthropic and OpenAI field names

Gap: None. Types are complete and work with multiple API providers.


4. Vendor-Neutral Constants — REAL (but incomplete wiring)

Status: Partial parity
Files:

  • lib/src/constants.dart — Vendor-neutral abstraction layer
  • lib/src/api/api_client.dart — Provider detection

What's implemented:

  • kHostEndpoint constant for remote service override
  • areRemoteServicesAvailable() check
  • ApiProvider enum with 6 providers (generic, anthropic, openrouter, bedrock, vertex, foundry)
  • Environment variable detection for vendor selection (USE_OPENROUTER, USE_ANTHROPIC, etc.)
  • ApiPaths class with vendor-neutral paths
  • API endpoint resolution

What's NOT wired:

  • No actual API calls to remote services (see API Integration section below)
  • model_cost.dart is empty — no pricing data loaded
  • resolveBaseUrl() defaults to hardcoded "https://api.anthropic.com" (line 70) ANTHROPIC-SPECIFIC DEFAULT

Honest assessment: Scaffolding exists. Wiring is incomplete. Still vendor-specific defaults.


5. Analytics & Usage Tracking — SKELETON

Status: Framework implemented, but non-functional
Files:

  • lib/src/services/analytics_service.dart (291 lines)
  • lib/src/services/usage_tracker.dart (395 lines)

What exists:

  • AnalyticsService singleton with event buffering
  • UsageTracker singleton with quota limits
  • Integration into ToolRegistry.execute() (lines 92-101)
  • Wiring in app.dart (unused, just instantiated)

What actually happens:

  • Events are logged to in-memory buffer
  • No remote sync implemented (line 57 in usage_tracker.dart checks shouldUseRemoteService('usage') but does nothing)
  • Quota checks exist but never block execution
  • File I/O for persistence is stubbed (_loadEventBuffer(), _saveEventBuffer() etc. — not shown, likely no-ops)

Honest assessment: Skeleton only. Not functional without external backend.


6. Web Tools: WebSearch & WebFetch — REAL HTTP, but untested

Status: Real HTTP implementation, unknown if working end-to-end
Files:

  • lib/src/tools/web_search_tool.dart (336 lines)
  • lib/src/tools/web_fetch_tool.dart (863 lines)

WebSearchTool — REAL implementation:

  • Lines 36-49: Real OpenRouter API call via HttpClient
  • Lines 52-124: Real HTTP POST to https://openrouter.ai/api/v1/chat/completions
  • Lines 126-328: Real response parsing, annotation extraction, source formatting
  • Requires valid OpenRouter API key

WebFetchTool — REAL HTTP + HTML parsing:

  • Lines 267-349: Real HttpClient request with redirect handling (up to 10 redirects)
  • Lines 390-442: Real HTML parsing via package:html (DOM extraction, markdown conversion)
  • Lines 585-636: Real OpenRouter API call to summarize fetched content
  • Lines 689-703: Real preapproved hosts list (platform.claude.com, docs.python.org, etc.)

What's missing:

  • No test coverage — these tools work in theory but not proven in practice
  • Requires external API (OpenRouter)
  • Cache implementation (lines 663-687) appears functional but untested

Honest assessment: REAL HTTP code. Probably works. But untested in this codebase.


7. Model Integration — MISSING

Status: No parity
Files:

  • lib/src/api/openrouter_client.dart (partial, see below)

What's missing:

  • No actual message API calls
  • openrouter_client.dart exists but createMessage() not in code read
  • ToolLoopService class exists (tool_loop_service.dart) but requires OpenRouterClient which is incomplete
  • No conversation history wired to model
  • No tool loop execution (model ↔ tools ↔ model cycle)

Remains Anthropic-specific:

  • Tool definitions in tool_loop_service.dart reference Claude-specific tool names
  • System prompt mentions Claude

Honest assessment: Model integration does not exist. REPL cannot work without this.


8. Task Tool — STUBBED

Status: Demo only
Files:

  • lib/src/tools/task_tool.dart (177 lines)

What it claims:

  • Create, list, get, update, stop background tasks

What it actually does:

  • In-memory map only (line 15: static final Map<String, Map<String, dynamic>> _tasks = {})
  • No process management
  • No task persistence
  • Comment on line 14: "In-memory task storage (would be persisted in full implementation)"

Honest assessment: Completely stubbed. Not parity.


9. Skill Tool — STUBBED

Status: File reader only, not execution engine
Files:

  • lib/src/tools/skill_tool.dart (232 lines)

What it claims:

  • Execute reusable skills (prompt templates)

What it actually does:

  • Reads .md files from ~/.claude/skills/
  • Parses YAML frontmatter
  • Returns skill content with template variable substitution (line 94)
  • No actual execution engine

Honest assessment: File browser masquerading as execution. Not parity.


10. MCP Tool — SIMULATED

Status: Mock responses only
Files:

  • lib/src/tools/mcp_tool.dart (240 lines)

What it claims:

  • Connect to MCP servers, list resources, read resources

What it actually does:

  • Returns hardcoded mock responses (lines 56-94: fake server list with status "connected")
  • No real MCP protocol implementation
  • Line 179-180: "Note: This is simulated MCP resource data. In a real implementation..."
  • Line 190-200: Fake server connection message

Honest assessment: 100% simulated. Not parity.


11. Agent Tools — SIMULATED

Status: Fake spawning only
Files:

  • lib/src/tools/agent_tool.dart (47 lines)
  • lib/src/tools/simple_agent_tool.dart (87 lines)

What they claim:

  • Spawn and coordinate AI agents

What they actually do:

  • AgentTool.execute() returns hardcoded response templates (lines 21-29)
  • Line 44: "Note: In a full implementation, this would spawn an actual AI agent."
  • No actual agent spawning
  • No agent coordination

Honest assessment: Mock-only. Not parity.


12. REPL/Interactive Mode — MISSING

Status: Does not exist
Evidence:

  • No interactive REPL shell
  • app.dart has command routing but no read-eval-print loop
  • Commands can be invoked with arguments but no free-form prompt
  • Old_repo has main.tsx with rich interactive UI, input prompts, streaming responses

Honest assessment: Does not exist. CRITICAL gap.


13. Command System — PARTIAL

Status: 73 commands implemented, ~25 missing, no REPL
Files: lib/src/app.dart (command catalog)

What works:

  • Command routing and help system
  • Basic command implementations for file ops, permissions, settings
  • Model/API commands exist but not fully wired

What's missing:

  • REPL mode (free-form prompt execution)
  • 25+ commands from legacy system
  • Complex commands that depend on REPL or model integration

Honest assessment: Partial. Framework exists. REPL blocks further progress.


Critical Blockers for Further Parity

  1. No REPL implementation — Cannot have interactive model interaction without REPL
  2. No model API wiring — Tool loop service exists but not connected to model
  3. No real task management — Task tool is in-memory only
  4. No real MCP protocol — MCP tool is 100% mocked
  5. Anthropic defaults remainapi_client.dart line 70 hardcodes api.anthropic.com

Subsystem-by-Subsystem Breakdown

Subsystem Status Real Partial Stubbed Missing
File I/O Full Parity
Bash/Process Full Parity
Glob/Grep Full Parity
Permissions Full Parity
API Types Full Parity
Vendor Constants Partial Wiring
Analytics Skeleton Framework
WebSearch Real HTTP Untested
WebFetch Real HTTP Untested
Model Integration Missing
Task Management Stubbed
Skill System Stubbed
MCP Protocol Stubbed
Agent System Stubbed
REPL/Interactive Missing
Chat/Tool Loop Skeleton Exists Not wired
Commands Partial 73 cmds 25+ missing

Top 10 Real Parity Wins

  1. File operations — Read, write, edit, glob all work exactly like legacy
  2. Bash tool — Real subprocess execution with proper capture
  3. Grep/ripgrep — Semantics match old_repo exactly
  4. Permission system — All 7 modes implemented, real integration
  5. API message types — Handles both Anthropic and OpenRouter formats
  6. Vendor-neutral constants framework — Infrastructure for multi-provider support
  7. WebFetch HTML parsing — Real HTML→markdown conversion
  8. WebSearch implementation — Real OpenRouter API integration
  9. Tool registry — Core dispatch mechanism works correctly
  10. Settings/configuration — Permission rules, model selection, theme, etc. load correctly

Top 10 Remaining Parity Gaps

  1. No REPL shell — Interactive prompt mode missing entirely
  2. Model API not wired — Tool loop service exists but can't call any model
  3. Task tool is in-memory only — No process management, no persistence
  4. MCP protocol is 100% mocked — Cannot connect to real MCP servers
  5. Skill execution is file reading only — No actual skill engine
  6. Agent spawning is fake — No real agent coordination
  7. Anthropic defaults hardcodedapi.anthropic.com still in runtime path
  8. Model pricing data missingmodel_cost.dart is empty
  9. Chat tool loop not integrated — ToolLoopService exists but unused
  10. 25+ commands not ported — Missing: bridge, ant-trace, backfill, daemon, etc.

Parity Percentage Estimate

Method: Weighted by functional criticality

Category Weight Actual Contribution
Core tools (file/bash/grep) 15% 100% 15%
Permissions 10% 100% 10%
API integration 30% 0% 0%
Model/Chat loop 20% 0% 0%
Web tools 10% 70% 7%
Advanced tools (MCP/Tasks/Agents) 15% 5% 1%
TOTAL 100% 33%

Honest estimate: 33% parity (weighted by criticality)

If weighted by line count instead: ~40% (lots of skeleton code)

Reality check: Can you run the tool loop? No. Can you interact with the model? No. Can you use REPL? No. → Functionally much lower, maybe 15-20%.


Vendor Specificity Assessment

Remaining Anthropic-specific code in active paths:

  1. lib/src/api/api_client.dart:70 — Hardcoded https://api.anthropic.com default
  2. lib/src/tools/tool_loop_service.dart — Tool definitions reference Claude-specific names
  3. lib/src/app.dart — Model aliases include "opus", "sonnet", "haiku" (all Claude)
  4. OpenRouter is the fallback provider, not a first-class option

Vendor-neutral claim: FALSE. Still biased toward Anthropic.


Summary of Contradictions in Prior Reports

Claim Reality
"WebSearch/WebFetch are stubbed" FALSE — They have real HTTP code, just untested
"Full parity achieved" FALSE — REPL doesn't exist, model integration missing
"Vendor-neutral" FALSE — Anthropic defaults still in code
"Task tool implemented" FALSE — In-memory simulation only
"MCP integrated" FALSE — 100% mocked responses
"25% parity" Close, but should be 33% weighted by criticality

Recommendations for Final Code Fixes

  1. Remove Anthropic default from api_client.dart:70 — Use vendor-neutral logic or fail clearly
  2. Wire model integration — Connect ToolLoopService to actual model (OpenRouter or other)
  3. Implement REPL — Add interactive prompt loop in main
  4. Add integration tests — Prove WebSearch/WebFetch actually work with real API
  5. Consolidate reports — Delete PARITY_REPORT.md, IMPLEMENTATION_SUMMARY.md, parity_review.md, BRUTALLY_HONEST_PARITY_REPORT.md

Files to Update/Delete

Delete these outdated/contradictory reports:

  • PARITY_REPORT.md
  • IMPLEMENTATION_SUMMARY.md
  • BRUTALLY_HONEST_PARITY_REPORT.md
  • parity_review.md
  • CORRECTIVE_PASS_SUMMARY.md

Keep only:

  • FINAL_PARITY_AUDIT.md (this document)

Audit completed: 2026-04-04
Confidence level: High (code inspection + execution path analysis)
Next action: Fix hardcoded Anthropic default, wire model integration, implement REPL.