Project Scaffolding for AI Applications: Folder Structures That Scale

Usama Nawaz · 6 min read · AI Engineer's Field Guide — Part 7

I once inherited an AI project where the entire RAG pipeline, the FastAPI routes, the database models, the prompt templates, and the LangChain chain definitions all lived in a single 2,400-line main.py file. The original developer had prototyped everything in one place ("just to get it working") and never refactored. Six months later, three team members were making changes to the same file in every pull request, merge conflicts were a daily occurrence, and nobody could confidently modify the retrieval logic without risking a regression in the API layer.

A clear project layout is not about aesthetic preference. It communicates architecture at a glance, prevents coupling between components that should be independent, and makes onboarding a new team member a matter of reading a directory listing instead of deciphering a monolith. For AI projects specifically, the folder structure needs to account for components that traditional web applications do not have: prompt templates, embedding pipelines, agent definitions, vector store configurations, and evaluation datasets.

This guide presents three opinionated folder structures for the three most common AI application types: RAG applications, multi-agent systems, and API-backed AI services.

The RAG Application Layout

RAG applications have a distinctive data flow: documents come in, get processed and embedded, are stored in a vector database, and are retrieved at query time to augment LLM prompts. The folder structure should mirror this flow.

The core separation is between ingestion (everything that happens before a query) and retrieval (everything that happens at query time). The ingestion side includes document loaders, chunking strategies, and embedding pipelines. The retrieval side includes the search logic, prompt assembly, and LLM interaction. Mixing these in the same module creates coupling that makes it impossible to update your chunking strategy without touching your query endpoint.

A production RAG project typically has a src/ directory with subdirectories for ingestion/ (loaders, chunkers, embedders), retrieval/ (search, reranking, prompt augmentation), api/ (FastAPI routes and middleware), models/ (Pydantic schemas for requests, responses, and internal data), config/ (settings, constants, and environment management), and prompts/ (versioned prompt templates stored as separate files).
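A sketch of that layout (directory names follow the text; the individual file names are illustrative, not prescriptive):

```
rag-app/
├── src/
│   ├── ingestion/    # loaders.py, chunkers.py, embedders.py
│   ├── retrieval/    # search.py, reranker.py, augment.py
│   ├── api/          # routes.py, middleware.py
│   ├── models/       # schemas.py (Pydantic)
│   ├── config/       # settings.py
│   └── prompts/      # answer_question.j2, summarize.yaml
└── tests/
    ├── ingestion/
    ├── retrieval/
    ├── api/
    └── evaluation/   # golden datasets + eval scripts
```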

The prompts/ directory deserves special attention. Prompt templates are not code in the traditional sense, but they are deployable artifacts that change behavior. Storing them as separate files (YAML, Jinja2, or plain text) rather than inline string literals makes them versionable, reviewable in pull requests, and testable independently.
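A minimal sketch of what file-based templates buy you, using only the standard library and str.format placeholders (the `load_prompt`/`render_prompt` helpers and the `answer.txt` file name are hypothetical, not from any framework):

```python
from pathlib import Path

# Hypothetical loader for file-based prompt templates stored under prompts/.
def load_prompt(name: str, prompts_dir: Path = Path("prompts")) -> str:
    """Read a prompt template from disk so it is versioned alongside the code."""
    return (prompts_dir / f"{name}.txt").read_text(encoding="utf-8")

def render_prompt(template: str, **variables: str) -> str:
    """Fill the template's placeholders; raises KeyError if one is missing."""
    return template.format(**variables)

# Simulate the template file as it would appear in the repo, then render it.
prompts_dir = Path("prompts")
prompts_dir.mkdir(exist_ok=True)
(prompts_dir / "answer.txt").write_text(
    "Answer the question using only this context:\n{context}\n\nQuestion: {question}",
    encoding="utf-8",
)

template = load_prompt("answer", prompts_dir)
prompt = render_prompt(
    template,
    context="Paris is the capital of France.",
    question="What is the capital of France?",
)
print(prompt)
```

Because the template is an ordinary file, a change to its wording shows up as a readable diff in a pull request, and a unit test can render it with fixture variables without importing any application code.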

The tests/ directory mirrors the src/ structure: tests/ingestion/, tests/retrieval/, tests/api/. An additional tests/evaluation/ directory holds RAG-specific evaluation datasets and scripts.

The Multi-Agent System Layout

Multi-agent systems built with LangGraph or CrewAI have different organizational needs. The primary components are agent definitions, tool implementations, state schemas, and graph/orchestration logic.

The recommended structure separates agents/ (individual agent configurations and prompts), tools/ (reusable tool implementations that agents can invoke), graphs/ (LangGraph state machines or CrewAI crew definitions), and state/ (typed state schemas and checkpoint configurations).
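Sketched out, with illustrative file names:

```
agent-app/
├── src/
│   ├── agents/   # researcher.py, writer.py (per-agent config + prompts)
│   ├── tools/    # web_search.py, db_query.py (agent-agnostic)
│   ├── graphs/   # research_graph.py (LangGraph / CrewAI wiring)
│   └── state/    # schemas.py (typed state), checkpoints.py
└── tests/
```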

The critical insight is that tools and agents should be independently testable. A tool that queries a database or calls an external API should work regardless of which agent invokes it. An agent's reasoning should be testable with mocked tool responses. This independence is only possible when the folder structure enforces the separation.
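A toy sketch of that separation, with a dict standing in for a real database and all names (`lookup_order_status`, `SupportAgent`) invented for illustration. The tool is a plain function, and the agent receives its tools by injection, so each can be tested without the other:

```python
from dataclasses import dataclass
from typing import Callable

# A tool is a plain function: testable without any agent.
def lookup_order_status(order_id: str, db: dict[str, str]) -> str:
    """Hypothetical tool: query an order-status store (a dict stands in for a DB)."""
    return db.get(order_id, "unknown")

# The agent receives its tools by injection, so tests can pass stubs.
@dataclass
class SupportAgent:
    status_tool: Callable[[str], str]

    def answer(self, order_id: str) -> str:
        status = self.status_tool(order_id)
        return f"Order {order_id} is currently: {status}"

# The tool is tested directly against an in-memory store...
assert lookup_order_status("A1", {"A1": "shipped"}) == "shipped"

# ...and the agent is tested with a mocked tool response, no database needed.
agent = SupportAgent(status_tool=lambda _id: "delivered")
print(agent.answer("A1"))  # → Order A1 is currently: delivered
```

In a real project the tool lives in tools/, the agent in agents/, and only the graph in graphs/ wires concrete tools into concrete agents.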

The API-Backed AI Service Layout

For AI services that expose capabilities through a REST API (the most common production pattern), the folder structure combines web application conventions with AI-specific components.

The standard FastAPI layout applies: routers/ for endpoint definitions, services/ for business logic, models/ for Pydantic schemas, middleware/ for cross-cutting concerns (auth, logging, rate limiting). The AI-specific additions are llm/ (LLM client wrappers, model configuration, and provider abstraction), pipelines/ (multi-step processing chains), and evaluation/ (quality metrics and evaluation scripts).
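Put together, the service layout looks roughly like this:

```
ai-service/
├── src/
│   ├── routers/      # endpoint definitions
│   ├── services/     # business logic
│   ├── models/       # Pydantic schemas
│   ├── middleware/   # auth, logging, rate limiting
│   ├── llm/          # client wrappers, provider abstraction
│   ├── pipelines/    # multi-step processing chains
│   └── evaluation/   # quality metrics, eval scripts
└── tests/
```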

The llm/ directory is particularly important. It abstracts the LLM provider behind a consistent interface, so switching from Claude to GPT-4o (or routing between them based on task complexity) requires changes in one place, not throughout the codebase. The provider abstraction pattern, where a base class defines the interface and concrete implementations handle provider-specific logic, prevents the vendor lock-in that makes future model migrations painful.
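A hedged sketch of the provider-abstraction pattern. The class and method names here are illustrative, and the `complete` bodies return placeholder strings where a real implementation would call the Anthropic or OpenAI SDK:

```python
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """The single interface the rest of the codebase depends on."""

    @abstractmethod
    def complete(self, prompt: str, max_tokens: int = 512) -> str: ...

class ClaudeProvider(LLMProvider):
    def complete(self, prompt: str, max_tokens: int = 512) -> str:
        # A real implementation would call the Anthropic SDK here.
        return f"[claude] {prompt[:20]}..."

class OpenAIProvider(LLMProvider):
    def complete(self, prompt: str, max_tokens: int = 512) -> str:
        # A real implementation would call the OpenAI SDK here.
        return f"[gpt-4o] {prompt[:20]}..."

def get_provider(task_complexity: str) -> LLMProvider:
    """Route by task complexity; only this function knows the concrete classes."""
    return ClaudeProvider() if task_complexity == "high" else OpenAIProvider()

reply = get_provider("high").complete("Summarize this contract")
print(reply)
```

Everything outside llm/ imports only `LLMProvider` and `get_provider`, so swapping or adding a provider touches exactly one directory.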

Cross-Cutting Patterns

Regardless of application type, several patterns apply universally.

Configuration at the root: The pyproject.toml, uv.lock, .env.example, CLAUDE.md, Dockerfile, and docker-compose.yml live at the project root. They are the first files a new team member reads and the first files CI/CD consumes.

Shared utilities in one place: A utils/ or common/ directory for logging configuration, date formatting, retry decorators, and other cross-cutting utilities prevents each module from implementing its own version of common functionality.
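The retry decorator is a typical resident of such a directory. A minimal sketch using only the standard library (the `flaky` function below simulates a transient failure for demonstration):

```python
import time
from functools import wraps

# Illustrative retry decorator of the kind that lives in utils/ or common/.
def retry(attempts: int = 3, delay: float = 0.1):
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == attempts:
                        raise  # out of attempts: surface the original error
                    time.sleep(delay)
        return wrapper
    return decorator

calls = {"n": 0}

@retry(attempts=3, delay=0.0)
def flaky():
    """Simulates a call that fails twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(flaky(), "after", calls["n"], "calls")  # → ok after 3 calls
```

Defining this once prevents each module from growing its own slightly different retry loop around LLM and vector-store calls.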

Data and evaluation directories: A data/ directory (git-ignored for large files, with a data/.gitkeep to preserve the structure) for local development datasets, and an evaluation/ directory for golden datasets, evaluation scripts, and benchmark results.

Scripts directory: A scripts/ directory for operational tasks: database migrations, index rebuilds, one-off data processing, and deployment helpers. These are not part of the application but are essential for operations.

What Happens Without Structure

The failure progression is predictable. The project starts as a single file. It grows to a few hundred lines and gets split into two or three modules based on whatever seemed logical at the time. New features get added wherever there is space. Within six months, the retrieval logic depends on the API layer, the API layer imports from the ingestion module, and the ingestion module references prompt templates that are defined in the service layer. Circular imports appear. Refactoring becomes a multi-day effort because every change cascades.

The cost of establishing a clean folder structure on day one is 30 minutes. The cost of untangling a spaghetti codebase six months later is days or weeks, plus the bugs introduced during the refactoring.

Key Takeaways

Project scaffolding for AI applications should separate ingestion from retrieval (for RAG), agents from tools (for multi-agent systems), and LLM abstraction from business logic (for API services). Store prompt templates as separate files for version control and independent testing. Mirror the source structure in your test directory. Establish the structure before writing the first line of application code. The 30 minutes you invest in scaffolding pays back exponentially in maintainability, onboarding speed, and the ability to modify one component without breaking another.


Follow Usama Nawaz for weekly deep dives on building production-grade AI systems.

Usama Nawaz

AI Engineer

AI Engineer with 5+ years building production AI/ML systems — from multi-agent architectures and RAG pipelines to document intelligence and data platforms.
