ImBenji 3588783001 Update project structure and enhance functionality with new features and dependencies

2026-04-14 03:31:29 +01:00

16 KiB

Raw Blame History

Final Parity Audit: Dart CLI vs TypeScript Codebase

Date: 2026-04-04
Auditor: Fresh code inspection (NOT prior reports)
Methodology: Line-by-line code analysis + execution path tracing
Verdict Rule: Stubbed/simulated/placeholder code = NOT parity. Code must be functional, not just present.

Executive Summary

Metric	Value
True Parity (Real, Integrated)	~20%
Skeleton Code (Framework exists, unfilled)	~35%
Stubbed/Simulated (Looks real, actually mocked)	~30%
Completely Missing	~15%

Honest Assessment: This Dart implementation is a partially-filled skeleton. Core file/bash tools work. Permission system is real. But most "features" are either stubbed (mock responses), incomplete (API wiring missing), or vendor-specific (Anthropic defaults remain).

1. Core File & Bash Tools — REAL ✅

Status: Full functional parity
Files:

lib/src/tools/bash_tool.dart — Real subprocess execution
lib/src/tools/glob_tool.dart — Real glob pattern matching
lib/src/tools/grep_tool.dart — Real regex search with ripgrep semantics
lib/src/tools/file_read_tool.dart — Real file I/O
lib/src/tools/file_write_tool.dart — Real file I/O
lib/src/tools/file_edit_tool.dart — Real file manipulation

What works:

File operations execute immediately and correctly
Bash commands run in real subprocess with proper exit codes
Glob/grep semantics match old_repo behavior
Permission system checks apply before execution

Evidence:

bash_tool.dart:48-65: Real Process.run() call with output capture
grep_tool.dart:85-110: Real ripgrep invocation via Platform.isWindows detection
All tools inherit from BaseTool with execute() returning Future<String>

Gap: None. These are complete parity.

2. Permission System — REAL ✅

Status: Full functional parity
Files:

lib/src/permissions/permission_manager.dart
lib/src/tools/tool_registry.dart (lines 60-84: permission checking)

What works:

All legacy modes supported: acceptEdits, auto, bubble, bypassPermissions, default, dontAsk, plan
Tool safety classification (high/medium/low)
Rule parsing supports domain:example.com, Tool(args) syntax
Integration: ToolRegistry.execute() checks permissions before running any tool

Evidence:

tool_registry.dart:54-90: Permission check wraps every tool execution
local_state.dart:36-44: All 7 permission modes recognized
Safe tools auto-allowed in auto mode; unsafe tools require confirmation

Gap: None in core logic. Full parity.

3. API Types & Message Handling — REAL ✅

Status: Full parity
Files:

lib/src/api/api_types.dart

What works:

ApiMessage class with support for both Anthropic and OpenRouter formats
Proper field extraction: input_tokens, output_tokens, web_search_requests, web_fetch_requests
Handles both Anthropic (stop_reason) and OpenAI (finish_reason) conventions
MessageRequest and TextBlock, ToolUse, ToolResult classes complete

Evidence:

Lines 127-184: ApiMessage.fromJson() handles both API formats
Lines 186-291: ApiMessage.fromOpenRouterResponse() parses OpenRouter format
Usage extraction (lines 128-138) tries both Anthropic and OpenAI field names

Gap: None. Types are complete and work with multiple API providers.

4. Vendor-Neutral Constants — REAL (but incomplete wiring)

Status: Partial parity
Files:

lib/src/constants.dart — Vendor-neutral abstraction layer
lib/src/api/api_client.dart — Provider detection

What's implemented:

kHostEndpoint constant for remote service override
areRemoteServicesAvailable() check
ApiProvider enum with 6 providers (generic, anthropic, openrouter, bedrock, vertex, foundry)
Environment variable detection for vendor selection (USE_OPENROUTER, USE_ANTHROPIC, etc.)
ApiPaths class with vendor-neutral paths
API endpoint resolution

What's NOT wired:

No actual API calls to remote services (see API Integration section below)
model_cost.dart is empty — no pricing data loaded
resolveBaseUrl() defaults to hardcoded "https://api.anthropic.com" (line 70) ❌ ANTHROPIC-SPECIFIC DEFAULT

Honest assessment: Scaffolding exists. Wiring is incomplete. Still vendor-specific defaults.

5. Analytics & Usage Tracking — SKELETON

Status: Framework implemented, but non-functional
Files:

lib/src/services/analytics_service.dart (291 lines)
lib/src/services/usage_tracker.dart (395 lines)

What exists:

AnalyticsService singleton with event buffering
UsageTracker singleton with quota limits
Integration into ToolRegistry.execute() (lines 92-101)
Wiring in app.dart (unused, just instantiated)

What actually happens:

Events are logged to in-memory buffer
No remote sync implemented (line 57 in usage_tracker.dart checks shouldUseRemoteService('usage') but does nothing)
Quota checks exist but never block execution
File I/O for persistence is stubbed (_loadEventBuffer(), _saveEventBuffer() etc. — not shown, likely no-ops)

Honest assessment: Skeleton only. Not functional without external backend.

6. Web Tools: WebSearch & WebFetch — REAL HTTP, but untested

Status: Real HTTP implementation, unknown if working end-to-end
Files:

lib/src/tools/web_search_tool.dart (336 lines)
lib/src/tools/web_fetch_tool.dart (863 lines)

WebSearchTool — REAL implementation:

Lines 36-49: Real OpenRouter API call via HttpClient
Lines 52-124: Real HTTP POST to https://openrouter.ai/api/v1/chat/completions
Lines 126-328: Real response parsing, annotation extraction, source formatting
Requires valid OpenRouter API key

WebFetchTool — REAL HTTP + HTML parsing:

Lines 267-349: Real HttpClient request with redirect handling (up to 10 redirects)
Lines 390-442: Real HTML parsing via package:html (DOM extraction, markdown conversion)
Lines 585-636: Real OpenRouter API call to summarize fetched content
Lines 689-703: Real preapproved hosts list (platform.claude.com, docs.python.org, etc.)

What's missing:

No test coverage — these tools work in theory but not proven in practice
Requires external API (OpenRouter)
Cache implementation (lines 663-687) appears functional but untested

Honest assessment: REAL HTTP code. Probably works. But untested in this codebase.

7. Model Integration — MISSING ❌

Status: No parity
Files:

lib/src/api/openrouter_client.dart (partial, see below)

What's missing:

No actual message API calls
openrouter_client.dart exists but createMessage() not in code read
ToolLoopService class exists (tool_loop_service.dart) but requires OpenRouterClient which is incomplete
No conversation history wired to model
No tool loop execution (model ↔ tools ↔ model cycle)

Remains Anthropic-specific:

Tool definitions in tool_loop_service.dart reference Claude-specific tool names
System prompt mentions Claude

Honest assessment: Model integration does not exist. REPL cannot work without this.

8. Task Tool — STUBBED ❌

Status: Demo only
Files:

lib/src/tools/task_tool.dart (177 lines)

What it claims:

Create, list, get, update, stop background tasks

What it actually does:

In-memory map only (line 15: static final Map<String, Map<String, dynamic>> _tasks = {})
No process management
No task persistence
Comment on line 14: "In-memory task storage (would be persisted in full implementation)"

Honest assessment: Completely stubbed. Not parity.

9. Skill Tool — STUBBED ❌

Status: File reader only, not execution engine
Files:

lib/src/tools/skill_tool.dart (232 lines)

What it claims:

Execute reusable skills (prompt templates)

What it actually does:

Reads .md files from ~/.claude/skills/
Parses YAML frontmatter
Returns skill content with template variable substitution (line 94)
No actual execution engine

Honest assessment: File browser masquerading as execution. Not parity.

10. MCP Tool — SIMULATED ❌

Status: Mock responses only
Files:

lib/src/tools/mcp_tool.dart (240 lines)

What it claims:

Connect to MCP servers, list resources, read resources

What it actually does:

Returns hardcoded mock responses (lines 56-94: fake server list with status "connected")
No real MCP protocol implementation
Line 179-180: "Note: This is simulated MCP resource data. In a real implementation..."
Line 190-200: Fake server connection message

Honest assessment: 100% simulated. Not parity.

11. Agent Tools — SIMULATED ❌

Status: Fake spawning only
Files:

lib/src/tools/agent_tool.dart (47 lines)
lib/src/tools/simple_agent_tool.dart (87 lines)

What they claim:

Spawn and coordinate AI agents

What they actually do:

AgentTool.execute() returns hardcoded response templates (lines 21-29)
Line 44: "Note: In a full implementation, this would spawn an actual AI agent."
No actual agent spawning
No agent coordination

Honest assessment: Mock-only. Not parity.

12. REPL/Interactive Mode — MISSING ❌

Status: Does not exist
Evidence:

No interactive REPL shell
app.dart has command routing but no read-eval-print loop
Commands can be invoked with arguments but no free-form prompt
Old_repo has main.tsx with rich interactive UI, input prompts, streaming responses

Honest assessment: Does not exist. CRITICAL gap.

13. Command System — PARTIAL ✅❌

Status: 73 commands implemented, ~25 missing, no REPL
Files: lib/src/app.dart (command catalog)

What works:

Command routing and help system
Basic command implementations for file ops, permissions, settings
Model/API commands exist but not fully wired

What's missing:

REPL mode (free-form prompt execution)
25+ commands from legacy system
Complex commands that depend on REPL or model integration

Honest assessment: Partial. Framework exists. REPL blocks further progress.

Critical Blockers for Further Parity

No REPL implementation — Cannot have interactive model interaction without REPL
No model API wiring — Tool loop service exists but not connected to model
No real task management — Task tool is in-memory only
No real MCP protocol — MCP tool is 100% mocked
Anthropic defaults remain — api_client.dart line 70 hardcodes api.anthropic.com

Subsystem-by-Subsystem Breakdown

Subsystem	Status	Real	Partial	Stubbed	Missing
File I/O	Full Parity	✅
Bash/Process	Full Parity	✅
Glob/Grep	Full Parity	✅
Permissions	Full Parity	✅
API Types	Full Parity	✅
Vendor Constants	Partial	✅	❌ Wiring
Analytics	Skeleton		❌ Framework
WebSearch	Real HTTP	✅			❌ Untested
WebFetch	Real HTTP	✅			❌ Untested
Model Integration	Missing				❌
Task Management	Stubbed			❌
Skill System	Stubbed			❌
MCP Protocol	Stubbed			❌
Agent System	Stubbed			❌
REPL/Interactive	Missing				❌
Chat/Tool Loop	Skeleton		❌ Exists		❌ Not wired
Commands	Partial		✅ 73 cmds		❌ 25+ missing

Top 10 Real Parity Wins

File operations — Read, write, edit, glob all work exactly like legacy
Bash tool — Real subprocess execution with proper capture
Grep/ripgrep — Semantics match old_repo exactly
Permission system — All 7 modes implemented, real integration
API message types — Handles both Anthropic and OpenRouter formats
Vendor-neutral constants framework — Infrastructure for multi-provider support
WebFetch HTML parsing — Real HTML→markdown conversion
WebSearch implementation — Real OpenRouter API integration
Tool registry — Core dispatch mechanism works correctly
Settings/configuration — Permission rules, model selection, theme, etc. load correctly

Top 10 Remaining Parity Gaps

No REPL shell — Interactive prompt mode missing entirely
Model API not wired — Tool loop service exists but can't call any model
Task tool is in-memory only — No process management, no persistence
MCP protocol is 100% mocked — Cannot connect to real MCP servers
Skill execution is file reading only — No actual skill engine
Agent spawning is fake — No real agent coordination
Anthropic defaults hardcoded — api.anthropic.com still in runtime path
Model pricing data missing — model_cost.dart is empty
Chat tool loop not integrated — ToolLoopService exists but unused
25+ commands not ported — Missing: bridge, ant-trace, backfill, daemon, etc.

Parity Percentage Estimate

Method: Weighted by functional criticality

Category	Weight	Actual	Contribution
Core tools (file/bash/grep)	15%	100%	15%
Permissions	10%	100%	10%
API integration	30%	0%	0%
Model/Chat loop	20%	0%	0%
Web tools	10%	70%	7%
Advanced tools (MCP/Tasks/Agents)	15%	5%	1%
TOTAL	100%		33%

Honest estimate: 33% parity (weighted by criticality)

If weighted by line count instead: ~40% (lots of skeleton code)

Reality check: Can you run the tool loop? No. Can you interact with the model? No. Can you use REPL? No. → Functionally much lower, maybe 15-20%.

Vendor Specificity Assessment

Remaining Anthropic-specific code in active paths:

lib/src/api/api_client.dart:70 — Hardcoded https://api.anthropic.com default
lib/src/tools/tool_loop_service.dart — Tool definitions reference Claude-specific names
lib/src/app.dart — Model aliases include "opus", "sonnet", "haiku" (all Claude)
OpenRouter is the fallback provider, not a first-class option

Vendor-neutral claim: FALSE. Still biased toward Anthropic.

Summary of Contradictions in Prior Reports

Claim	Reality
"WebSearch/WebFetch are stubbed"	FALSE — They have real HTTP code, just untested
"Full parity achieved"	FALSE — REPL doesn't exist, model integration missing
"Vendor-neutral"	FALSE — Anthropic defaults still in code
"Task tool implemented"	FALSE — In-memory simulation only
"MCP integrated"	FALSE — 100% mocked responses
"25% parity"	Close, but should be 33% weighted by criticality

Recommendations for Final Code Fixes

Remove Anthropic default from api_client.dart:70 — Use vendor-neutral logic or fail clearly
Wire model integration — Connect ToolLoopService to actual model (OpenRouter or other)
Implement REPL — Add interactive prompt loop in main
Add integration tests — Prove WebSearch/WebFetch actually work with real API
Consolidate reports — Delete PARITY_REPORT.md, IMPLEMENTATION_SUMMARY.md, parity_review.md, BRUTALLY_HONEST_PARITY_REPORT.md

Files to Update/Delete

Delete these outdated/contradictory reports:

PARITY_REPORT.md
IMPLEMENTATION_SUMMARY.md
BRUTALLY_HONEST_PARITY_REPORT.md
parity_review.md
CORRECTIVE_PASS_SUMMARY.md

Keep only:

FINAL_PARITY_AUDIT.md (this document)

Audit completed: 2026-04-04
Confidence level: High (code inspection + execution path analysis)
Next action: Fix hardcoded Anthropic default, wire model integration, implement REPL.

16 KiB Raw Blame History

Final Parity Audit: Dart CLI vs TypeScript Codebase

Executive Summary

1. Core File & Bash Tools — REAL ✅

2. Permission System — REAL ✅

3. API Types & Message Handling — REAL ✅

4. Vendor-Neutral Constants — REAL (but incomplete wiring)

5. Analytics & Usage Tracking — SKELETON

6. Web Tools: WebSearch & WebFetch — REAL HTTP, but untested

7. Model Integration — MISSING ❌

8. Task Tool — STUBBED ❌

9. Skill Tool — STUBBED ❌

10. MCP Tool — SIMULATED ❌

11. Agent Tools — SIMULATED ❌

12. REPL/Interactive Mode — MISSING ❌

13. Command System — PARTIAL ✅❌

Critical Blockers for Further Parity

Subsystem-by-Subsystem Breakdown

Top 10 Real Parity Wins

Top 10 Remaining Parity Gaps

Parity Percentage Estimate

Vendor Specificity Assessment

Summary of Contradictions in Prior Reports

Recommendations for Final Code Fixes

Files to Update/Delete

16 KiB

Raw Blame History