How I Design AI Systems So They Don't Turn into Overgrown Frankensteins

DRAFT

Outline

Hook: AI systems are like gardens—without careful design and pruning, they turn into overgrown messes where nothing works and nobody knows why. I’ve built enough Frankensteins to know: good AI architecture is 80% restraint, 20% cleverness.

Core Argument: Most AI systems fail not from lack of capability, but from lack of discipline. Maintainable AI systems rest on four habits: clear boundaries, explicit contracts, observable behavior, and resisting the urge to make everything “smart.”

Key Sections:

  1. The AI Complexity Death Spiral

    • How AI systems grow out of control: “just add another prompt”
    • The context accumulation problem: each feature adds dependencies
    • Chain-of-prompts becomes spaghetti code
    • When debugging requires asking the AI why it did something
    • Real example: A chatbot that became unmaintainable in 3 months
  2. Design Principle #1: Single Responsibility AI Modules

    • Each AI component does ONE thing well
    • Example: Separate modules for classification, generation, summarization
    • Benefits: testable, debuggable, replaceable
    • Anti-pattern: One mega-prompt that does everything
    • Code example: Good vs. bad module structure
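A minimal sketch of the single-responsibility idea, not a definitive implementation. The `LLM` callable, the `Classifier` and `Summarizer` classes, the label set, and `fake_llm` are all hypothetical names introduced for illustration; a real system would wrap whatever model client it uses behind the same prompt-in, text-out shape.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical stand-in for a real model client: prompt in, text out.
LLM = Callable[[str], str]

# Anti-pattern (shown as a comment only): one prompt that does everything.
#   llm("Classify this ticket, summarize it, draft a reply, and pick a queue: ...")
# Nothing in that call can be tested, swapped, or debugged in isolation.


@dataclass
class Classifier:
    """Does one thing: map a message to one of a fixed set of labels."""
    llm: LLM
    labels: Tuple[str, ...] = ("billing", "support", "other")

    def classify(self, message: str) -> str:
        raw = self.llm(f"Classify as one of {', '.join(self.labels)}: {message}")
        label = raw.strip().lower()
        # Anything outside the known labels collapses to a safe default.
        return label if label in self.labels else "other"


@dataclass
class Summarizer:
    """Does one thing: produce a one-sentence summary."""
    llm: LLM

    def summarize(self, text: str) -> str:
        return self.llm(f"Summarize in one sentence: {text}").strip()


def fake_llm(prompt: str) -> str:
    """Canned responses so the example runs without a model (assumption)."""
    if prompt.startswith("Classify"):
        return " Billing\n"
    return "Customer was charged twice. "


classifier = Classifier(fake_llm)
summarizer = Summarizer(fake_llm)
print(classifier.classify("I was charged twice this month"))
print(summarizer.summarize("Long ticket text goes here"))
```

Because each module takes its model as an argument, either one can be tested with a stub or pointed at a different model without touching the other.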
  3. Design Principle #2: Explicit Contracts and Schemas

    • Define input/output schemas strictly (Pydantic, TypeScript types)
    • The AI can’t return arbitrary JSON; every response must match the schema
    • Validation at every boundary
    • Makes failures obvious and catchable
    • Example: Structured output from OpenAI function calling
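One way to sketch boundary validation, using only the standard library so it runs anywhere. In practice a library such as Pydantic would replace the hand-written checks. The `TicketTriage` schema, `ContractError`, `ALLOWED_CATEGORIES`, and `parse_triage` are hypothetical names chosen for this example, not part of any real API.

```python
import json
from dataclasses import dataclass


class ContractError(ValueError):
    """Raised when model output does not match the agreed schema."""


@dataclass(frozen=True)
class TicketTriage:
    category: str
    urgency: int  # 1 (low) to 5 (high)


ALLOWED_CATEGORIES = {"billing", "support", "other"}


def parse_triage(raw: str) -> TicketTriage:
    """Turn raw model text into a TicketTriage, or fail loudly."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ContractError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ContractError("expected a JSON object")
    if set(data) != {"category", "urgency"}:
        raise ContractError(f"unexpected keys: {sorted(data)}")

    category, urgency = data["category"], data["urgency"]
    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        raise ContractError(f"unknown category: {category!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(urgency, bool) or not isinstance(urgency, int):
        raise ContractError(f"urgency must be an integer, got {urgency!r}")
    if not 1 <= urgency <= 5:
        raise ContractError(f"urgency out of range 1..5: {urgency}")
    return TicketTriage(category, urgency)


print(parse_triage('{"category": "billing", "urgency": 3}'))
```

The point of the single `ContractError` type is that every malformed response fails at the boundary, in one catchable way, instead of surfacing later as a `KeyError` deep in business logic.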
  4. Design Principle #3: The “Dumb Pipe” Architecture

    • AI should be stateless components in a clear data pipeline
    • Traditional code handles routing, state, business logic
    • AI handles: parsing, generation, classification, embedding
    • Don’t let AI make architectural decisions
    • Diagram: Clean pipeline vs. AI-everywhere mess
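The "dumb pipe" split can be sketched as follows, under stated assumptions: `ClassifyFn`, `route`, `run_pipeline`, the queue names, and the keyword stub are all hypothetical. The AI step is a stateless function that only labels text, while ordinary code owns routing and state.

```python
from typing import Callable, Dict, List

# Hypothetical stateless AI step: text in, label out. No memory, no routing.
ClassifyFn = Callable[[str], str]


def route(label: str) -> str:
    """Routing is ordinary code, so it is easy to test and audit."""
    queues = {"billing": "finance-queue", "support": "helpdesk-queue"}
    # Any label the code does not recognize goes to a human.
    return queues.get(label, "manual-review")


def run_pipeline(messages: List[str], classify: ClassifyFn) -> Dict[str, List[str]]:
    routed: Dict[str, List[str]] = {}
    for message in messages:
        label = classify(message)                      # AI: classification only
        queue = route(label)                           # code: routing decision
        routed.setdefault(queue, []).append(message)   # code: state
    return routed


def keyword_stub(message: str) -> str:
    """Stand-in for a model call so the example runs offline (assumption)."""
    if "refund" in message:
        return "billing"
    if "crash" in message:
        return "support"
    return "chitchat"


print(run_pipeline(["refund please", "app crashes", "hello"], keyword_stub))
```

If the model starts returning labels nobody planned for, the worst case is a message landing in `manual-review`, because the model never chooses the destination itself.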
  5. Design Principle #4: Observability > Cleverness

    • Log every AI call: input, output, tokens, latency, cost
    • Make the AI’s “thinking” visible (show reasoning, confidence)
    • Store examples of good/bad outputs for debugging
    • Build dashboards: success rates, failure patterns, edge cases
    • Tool recommendations: Langfuse, LangSmith, custom Streamlit
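A rough sketch of per-call logging with the standard library only; tools like Langfuse or LangSmith would replace it in a real setup. It assumes the wrapped model function returns a `(text, tokens_used)` pair and that the caller supplies the price per 1,000 tokens. Both are assumptions for illustration, as are the names `CallRecord` and `LoggedLLM`.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Callable, List, Tuple


@dataclass
class CallRecord:
    name: str
    prompt: str
    output: str
    tokens: int
    latency_ms: float
    cost_usd: float


class LoggedLLM:
    """Wraps a model call and records input, output, tokens, latency, cost."""

    def __init__(
        self,
        name: str,
        call: Callable[[str], Tuple[str, int]],  # assumed shape: (text, tokens)
        usd_per_1k_tokens: float,                # supplied by the caller
        sink: Callable[[str], None] = print,     # where JSON log lines go
    ) -> None:
        self.name = name
        self._call = call
        self._price = usd_per_1k_tokens
        self._sink = sink
        self.records: List[CallRecord] = []

    def __call__(self, prompt: str) -> str:
        start = time.perf_counter()
        output, tokens = self._call(prompt)
        latency_ms = (time.perf_counter() - start) * 1000
        record = CallRecord(
            name=self.name,
            prompt=prompt,
            output=output,
            tokens=tokens,
            latency_ms=latency_ms,
            cost_usd=tokens / 1000 * self._price,
        )
        self.records.append(record)
        self._sink(json.dumps(asdict(record)))  # one JSON line per call
        return output


# Usage with a canned model function standing in for a real client.
summarize = LoggedLLM("summarize", lambda p: ("short summary", 200), usd_per_1k_tokens=0.5)
summarize("a long document")
```

One JSON line per call is enough to feed a dashboard later, and summing `cost_usd` by `name` is a quick way to spot which prompt is the expensive one.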
  6. Design Principle #5: Graceful Degradation

    • AI will fail—design for it
    • Fallback strategies: retry with simpler prompt, use default, ask human
    • Never let AI failure crash the whole system
    • Example: 99 Minds transcription fallback chain
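A generic fallback chain might look like the sketch below. This is not the 99 Minds implementation, which the outline does not describe; `run_with_fallbacks`, the step names, and the stub functions are hypothetical and only illustrate the retry, default, then human order from the list above.

```python
from typing import Callable, List, Tuple

# Each step is a (name, function) pair, tried in order.
Step = Tuple[str, Callable[[str], str]]


def run_with_fallbacks(text: str, steps: List[Step], default: str) -> Tuple[str, str]:
    """Return (step_name, result) from the first step that succeeds.

    A step fails if it raises or returns blank text. If every step fails,
    return ("default", default) so the caller never sees an exception.
    """
    for name, step in steps:
        try:
            result = step(text)
        except Exception:
            # Deliberately broad: any AI failure should move to the next step.
            continue
        if result.strip():
            return name, result
    return "default", default


def full_prompt(text: str) -> str:
    raise TimeoutError("model timed out")  # simulated failure


def simple_prompt(text: str) -> str:
    return text.upper()  # stand-in for a cheaper, simpler model call


used, output = run_with_fallbacks(
    "hello",
    [("full", full_prompt), ("simple", simple_prompt)],
    default="[needs human review]",
)
print(used, output)
```

Returning the name of the step that answered keeps the fallback visible: a caller can log it, and can queue the item for a human whenever the name is `"default"`.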
  7. The Testing Challenge

    • Unit tests: Test deterministic parts, mock AI calls
    • Integration tests: Replay recorded AI responses instead of calling the model (VCR pattern)
    • Eval sets: Track AI output quality over time
    • Human-in-the-loop: Spot check real outputs weekly
    • Version control for prompts: treat them like code
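The "mock AI calls" bullet can be sketched with the standard `unittest` and `unittest.mock` modules. The `triage` function and its label set are hypothetical; the pattern is what matters: pass the model in as an argument, then replace it with a `Mock` so the deterministic logic around it is tested without any network call.

```python
import unittest
from unittest.mock import Mock


def triage(message: str, llm) -> str:
    """Deterministic wrapper around a model call (llm: prompt in, text out)."""
    label = llm(f"Classify: {message}").strip().lower()
    return label if label in {"billing", "support"} else "other"


class TriageTests(unittest.TestCase):
    def test_normalizes_model_output(self):
        llm = Mock(return_value="  Billing\n")
        self.assertEqual(triage("charged twice", llm), "billing")
        # The prompt itself is part of the contract, so assert on it too.
        llm.assert_called_once_with("Classify: charged twice")

    def test_unknown_label_falls_back(self):
        llm = Mock(return_value="I think this is about invoices")
        self.assertEqual(triage("hmm", llm), "other")
```

Saved as a test module, this runs with `python -m unittest`. Asserting on the exact prompt string also means a prompt change shows up as a failing test, which fits the idea of treating prompts like code.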
  8. When to Say No to AI

    • If a rule-based system works, use it
    • If the task requires 100% accuracy, AI isn’t ready
    • If you can’t explain failures to users, don’t ship it
    • If the system works without AI, don’t add it for “coolness”
    • Real example: Choosing cron job over AI for scheduling

Examples/Stories:

  • 99 Minds architecture: Clean separation of voice → transcription → embedding → storage
  • Law firm tool: Started messy, refactored to single-responsibility modules
  • Personal RAG system: Observable pipeline with monitoring dashboard
  • Failure story: Early chatbot that mixed state management with AI, became unmaintainable
  • Cost surprise: Logging saved $500/month by catching inefficient prompts

Takeaways:

  • Treat AI like unreliable microservices: clear contracts, handle failures
  • The best AI systems are 70% traditional code, 30% AI
  • Make AI behavior observable—you can’t fix what you can’t see
  • Start with the simplest thing that works, add AI surgically
  • Maintain ability to explain and debug every decision path

Cross-Links:

  • ← “Stop Asking ‘What Can AI Do?’” (Series 1-5)
  • → “Why Your AI Agent Sucks at Context” (Series 1-8)
  • → “Build Once, Leverage Forever” (Series 2-20)
  • → “Your MVP Is Trash” (Series 2-11)