Lesson 2 of 8 · 15 min
Agent Architecture — Tools, Memory, and Planning
The Three Pillars (And The One Nobody Talks About)
Every AI agent, regardless of framework, consists of three systems: tools (how it acts), memory (what it knows), and planning (how it decides). There's a fourth that frameworks usually handle for you — orchestration — but understanding it separates hobbyists from people who ship.
This lesson maps out each system in detail. By the end, you'll be able to look at any agent framework and immediately understand its architecture — because they all implement the same patterns.
System 1: Tools — The Agent's Hands
Without tools, an agent is just a chatbot. Tools are functions the agent can call to interact with the outside world. They follow a strict contract:
# Every tool has three components:
# 1. A name the LLM can reference
# 2. A description the LLM reads to decide when to use it
# 3. A schema defining the expected input parameters
tool = {
    "name": "search_database",
    "description": "Search the product database by name or category. Returns up to 10 matching products with price and stock info.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search term"},
            "category": {"type": "string", "enum": ["electronics", "books", "clothing"]},
            "max_results": {"type": "integer", "default": 10}
        },
        "required": ["query"]
    }
}

The description is critical. The LLM uses it to decide when to call the tool and what arguments to pass. A vague description ("searches stuff") leads to wrong tool selection. A precise one ("Search the product database by name or category. Returns up to 10 matching products with price and stock info.") gives the LLM the context it needs.
Tool Design Principles
- Single responsibility: One tool does one thing. search_and_update_database is two tools pretending to be one.
- Clear return types: The LLM needs to know what it'll get back. "Returns a JSON object with keys: products (array), total_count (integer)" beats "Returns results."
- Error information: Return error messages the LLM can reason about. {"error": "No products found matching 'quantum laptop'"} is useful. An exception trace is not.
- Minimal permissions: A tool that can read the database should not also be able to delete rows. Separate read and write tools.
System 2: Memory — The Agent's Brain
Memory in AI agents operates on three timescales, and confusing them is one of the most common architectural mistakes:
Working Memory (Context Window)
This is the LLM's immediate context — the conversation history, system prompt, and tool results currently in the context window. It's fast, reliable, and limited. Even with 200K token windows, you'll hit the limit during complex tasks. When you do, the agent starts forgetting earlier steps.
The key constraint: working memory is expensive. Every token in the context is processed on every LLM call. A 100K token context with 10 tool-calling steps means processing ~1M tokens total. That's real money.
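The arithmetic behind that claim is worth making explicit. The numbers below are illustrative assumptions, not measurements:

```python
# Back-of-envelope cost of re-processing a growing context on every step.
# Token counts are illustrative assumptions, not measured values.
context_tokens = 100_000   # tokens sitting in the context window
steps = 10                 # tool-calling iterations in the task

# Every step re-sends the whole context as input, so processing accumulates
total_input = context_tokens * steps
print(total_input)  # 1,000,000 input tokens for one task
```

In practice the context also grows with each tool result, so this is a lower bound.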
Short-Term Memory (Session State)
Information that persists within a task or session but isn't in the immediate context window. Frameworks like LangGraph store this as graph state — a structured object that travels with the agent through its execution flow.
from typing import TypedDict, Annotated
from langgraph.graph import StateGraph
from operator import add
class AgentState(TypedDict):
    messages: Annotated[list, add]  # Conversation history
    search_results: list[dict]      # Results from search tool
    current_step: str               # Where we are in the plan
    errors: list[str]               # Accumulated errors
    attempts: int                   # Retry counter

This state is separate from the LLM context. You can store 10MB of search results in state without stuffing them into the prompt — you selectively inject only what's needed for the current step.
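Selective injection can be sketched as follows. The state dict and build_prompt helper are hypothetical names for illustration; the point is that bulky data lives in state while only a small slice enters the context window.

```python
# Sketch: keep bulky tool output in session state, inject only a slice
# into the prompt. The state layout mirrors the AgentState above.
state = {
    "search_results": [{"title": f"Result {i}", "body": "..."} for i in range(500)],
    "current_step": "summarize",
}

def build_prompt(state: dict, top_k: int = 3) -> str:
    # Only the few results relevant to the current step enter the context window
    relevant = state["search_results"][:top_k]
    lines = [f"- {r['title']}" for r in relevant]
    return "Current step: " + state["current_step"] + "\nRelevant results:\n" + "\n".join(lines)

prompt = build_prompt(state)
# The other 497 results stay in state, costing zero context tokens
```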
Long-Term Memory (Persistent Storage)
Information that survives across sessions. This is where vector databases, knowledge graphs, and traditional databases come in. An agent that remembers your codebase architecture from last week, your preferred coding style, or the results of a previous research task uses long-term memory.
We'll deep-dive into memory implementation in Lesson 5. For now, the key architectural insight: most agent failures trace back to memory problems. The agent forgot a tool result from 8 steps ago. It re-did work it already completed. It exceeded the context window and started hallucinating.
System 3: Planning — The Agent's Frontal Lobe
Planning is the hardest part of agent design and the part where LLMs are most unreliable. There are three major approaches:
No Plan (ReAct)
The agent takes one step at a time, deciding the next action based on the current state. This is the ReAct pattern from Lesson 1. Simple, works well for tasks under 5 steps, but falls apart on complex multi-step goals because the agent can't see the big picture.
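The no-plan loop fits in a few lines. In this sketch, llm.decide and the tools dict are assumed interfaces standing in for a real model call and real tool functions:

```python
# A minimal ReAct-style loop: no upfront plan, the model picks one action
# at a time based on what it has seen so far.
def react_loop(goal, llm, tools, max_steps=5):
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        action = llm.decide(history)  # e.g. {"tool": "search", "args": {...}}
        if action["tool"] == "finish":
            return action["args"]["answer"]
        observation = tools[action["tool"]](**action["args"])
        history.append(f"Action: {action['tool']}, Observation: {observation}")
    return None  # step limit reached without an answer

# Stub model for illustration: search once, then finish
class StubLLM:
    def decide(self, history):
        if len(history) == 1:
            return {"tool": "search", "args": {"query": "FastAPI benchmarks"}}
        return {"tool": "finish", "args": {"answer": "FastAPI is fastest"}}

answer = react_loop("compare frameworks", StubLLM(), {"search": lambda query: "3 articles found"})
```

The max_steps cap matters: without it, a model that never emits "finish" loops indefinitely.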
Plan-Then-Execute
The agent generates a complete plan upfront, then executes it step by step. Better for complex tasks, but fragile — if step 3 fails, the remaining plan may be invalid. More advanced versions re-plan after each step.
# Plan-then-execute pattern
plan = llm.generate_plan(goal="Research and write a comparison of 3 Python web frameworks")
# plan = [
# "Search for latest Python web framework benchmarks",
# "Gather feature lists for FastAPI, Django, and Flask",
# "Compare performance, community size, and learning curve",
# "Write a structured comparison with recommendation"
# ]
completed_steps = []
while plan:
    step = plan.pop(0)
    result = agent.execute_step(step)
    if result.failed:
        # Re-plan from the failure: the remaining steps may no longer be valid
        plan = llm.replan(goal, completed_steps, result.error)
    else:
        completed_steps.append(step)
Hierarchical Planning
A "manager" agent creates high-level plans and delegates subtasks to specialized "worker" agents. This is the pattern behind CrewAI and multi-agent systems. More capable, more complex, and much harder to debug.
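The manager/worker split can be sketched without any framework. Here Manager.plan_subtasks and Worker.run are stand-ins for LLM calls; in a real system each worker would be its own agent with its own tools and memory:

```python
# Hierarchical planning in miniature: a manager decomposes the goal and
# delegates each subtask to a specialized worker. All logic here is a
# stand-in for LLM calls.
class Worker:
    def __init__(self, specialty: str):
        self.specialty = specialty

    def run(self, subtask: str) -> str:
        # In a real system: a full LLM + tools loop per subtask
        return f"[{self.specialty}] completed: {subtask}"

class Manager:
    def __init__(self, workers: dict):
        self.workers = workers

    def plan_subtasks(self, goal: str) -> list:
        # Stand-in for an LLM decomposition of the goal into (role, task) pairs
        return [("research", f"gather sources for {goal}"),
                ("writing", f"draft report on {goal}")]

    def run(self, goal: str) -> list:
        return [self.workers[role].run(task)
                for role, task in self.plan_subtasks(goal)]

crew = Manager({"research": Worker("research"), "writing": Worker("writing")})
results = crew.run("Python web frameworks")
```

The debugging difficulty mentioned above comes from this structure: a bad final report may trace back to any worker, or to the manager's decomposition.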
System 4: Orchestration — The Control Flow Nobody Sees
Orchestration is how you control the agent's execution flow. This is what separates LangGraph from a raw while loop:
- State machines: Define explicit states (researching, writing, reviewing) with allowed transitions. The agent can only move forward in defined ways. LangGraph uses directed graphs for this.
- Human-in-the-loop: Pause execution at critical points for human approval. "I'm about to send this email to 5,000 customers. Proceed?"
- Error recovery: What happens when a tool call fails? Retry? Skip? Ask the user? Fall back to a different tool?
- Timeouts and limits: Maximum steps, maximum cost, maximum time. Without these, a confused agent will loop forever, burning tokens and money.
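The limits in the last bullet can be enforced with a small guard object checked before every step. Names and thresholds here are illustrative, not from any particular framework:

```python
# Sketch of hard execution limits: without them a confused agent loops
# forever, burning tokens and money. Thresholds are illustrative.
import time

class RunLimits:
    def __init__(self, max_steps=20, max_cost_usd=1.0, max_seconds=120.0):
        self.max_steps, self.max_cost_usd, self.max_seconds = max_steps, max_cost_usd, max_seconds
        self.steps, self.cost_usd, self.started = 0, 0.0, time.monotonic()

    def record_step(self, cost_usd: float) -> None:
        self.steps += 1
        self.cost_usd += cost_usd

    def exceeded(self):
        if self.steps >= self.max_steps:
            return "step limit"
        if self.cost_usd >= self.max_cost_usd:
            return "cost limit"
        if time.monotonic() - self.started >= self.max_seconds:
            return "time limit"
        return None  # still within all limits

limits = RunLimits(max_steps=3)
for _ in range(5):
    if limits.exceeded():
        break  # stop the agent loop cleanly instead of running forever
    limits.record_step(cost_usd=0.02)
```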
Putting It Together: The Architecture Diagram
Here's how all four systems connect in a typical production agent:
┌─────────────────────────────────────────────────┐
│ ORCHESTRATION LAYER │
│ (LangGraph / CrewAI / custom state machine) │
│ │
│ ┌──────────┐ ┌──────────┐ ┌───────────────┐ │
│ │ PLANNING │ │ MEMORY │ │ TOOLS │ │
│ │ │ │ │ │ │ │
│ │ ReAct / │ │ Working │ │ search_db() │ │
│ │ Plan+Exe │ │ Short │ │ send_email() │ │
│ │ Hierarch │ │ Long │ │ read_file() │ │
│ └──────────┘ └──────────┘ └───────────────┘ │
│ │
│ ┌──────────────┐ │
│ │ LLM CORE │ │
│ │ (GPT-4o / │ │
│ │ Claude / │ │
│ │ Gemini) │ │
│ └──────────────┘ │
└─────────────────────────────────────────────────┘

Every framework you'll encounter — LangChain, LangGraph, CrewAI, AutoGen, Mastra — implements this same architecture with different APIs. Learn the pattern once, and you can pick up any framework in an afternoon.
The Decision That Matters Most
Before you write a single line of agent code, answer this: how much autonomy does this task actually need?
A customer support bot that routes tickets? L1 router. No planning needed.
A code review agent that reads PRs and leaves comments? L2 tool agent. ReAct pattern, a few tools, minimal memory.
A research agent that produces multi-page reports from dozens of sources? L3 planning agent with long-term memory.
A development team that researches, designs, codes, and tests? L4 multi-agent. CrewAI or similar.
Over-engineering the autonomy level is the #1 cause of agent projects failing. Start with the simplest architecture that could work. Upgrade when you hit a wall, not before.
In Lesson 3, you'll implement a real L2 agent with LangGraph — the most common production pattern. Tools, memory, and orchestration all working together.
Code Examples
from typing import TypedDict, Annotated
from langgraph.graph import StateGraph, END
from operator import add

# Agent state: the short-term memory that travels with execution
class AgentState(TypedDict):
    messages: Annotated[list, add]  # Conversation history
    plan: list[str]                 # Generated plan steps
    current_step: int               # Current step index
    results: dict                   # Accumulated results
    error_count: int                # For retry logic

# Define the orchestration graph
graph = StateGraph(AgentState)

def planner(state: AgentState) -> dict:
    """Generate or revise the plan based on current state."""
    # LLM generates a plan from the goal; reset progress and errors
    plan = llm.generate_plan(state["messages"][-1])
    return {"plan": plan, "current_step": 0, "error_count": 0}

def executor(state: AgentState) -> dict:
    """Execute the current step using available tools."""
    step = state["plan"][state["current_step"]]
    result = agent.execute_with_tools(step)
    update = {
        "results": {**state["results"], step: result},
        "current_step": state["current_step"] + 1,
    }
    if result.failed:
        update["error_count"] = state["error_count"] + 1
    return update

def should_continue(state: AgentState) -> str:
    """Decide: continue executing, replan, or finish."""
    if state["current_step"] >= len(state["plan"]):
        return "done"
    if state["error_count"] > 3:
        return "replan"
    return "execute"

graph.add_node("planner", planner)
graph.add_node("executor", executor)
graph.add_edge("planner", "executor")  # planning always flows into execution
graph.add_conditional_edges("executor", should_continue, {
    "execute": "executor",
    "replan": "planner",
    "done": END,
})
graph.set_entry_point("planner")
app = graph.compile()

Key Takeaways
- Every agent has four systems: tools (how it acts), memory (what it knows), planning (how it decides), and orchestration (how execution flows)
- Tool descriptions are the most underrated design decision — the LLM uses them to choose which tool to call and with what arguments
- Memory operates on three timescales: working (context window), short-term (session state), and long-term (persistent storage) — most agent failures trace back to memory problems
- Planning approaches range from no-plan ReAct (one step at a time) to hierarchical multi-agent delegation — match the approach to your task complexity
- The most important decision: choosing the right autonomy level (L0-L4) for your use case — over-engineering is the #1 cause of agent project failures