CODI Architecture

CODI is organised as a modular Python system with optional container runtimes. This document describes the architecture, how data flows through the system, and how each module contributes to deterministic Dockerfile optimisation.

High-Level System Diagram

Dockerfile + Context
        |
        v
+-------------------+
| Parser & Detector |
+-------------------+
        |
        v
+-------------------+
| Analyzer (smells) |
+-------------------+
        |
        v
+-------------------------+
| Renderer & Rules Engine |
+-------------------------+
        |
        +------------+
        |            |
        v            v
  +-----------+  +------------------+
  | Build Sim |  | Local LLM Assist |
  +-----------+  +------------------+
        |            |
        +------------+
               |
               v
      +------------------+
      | Reporter & Store |
      +------------------+
               |
               v
      Dashboards / API / CLI

The same pipeline powers both the CLI (cli/main.py) and the FastAPI service (api/server.py).
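
To make that shared entry point concrete, the sketch below shows one way a single pipeline function could back both interfaces. Every name in it (run_pipeline, the stub logic, the example route) is an illustrative assumption, not the actual signature in cli/main.py or api/server.py.

```python
# A minimal sketch of one pipeline backing both interfaces. All names here are
# hypothetical; they are not the real cli/main.py or api/server.py APIs.
from pathlib import Path
from typing import Any


def run_pipeline(project_dir: Path) -> dict[str, Any]:
    """Walk the stages from the diagram: parse -> detect -> analyze -> render -> report."""
    dockerfile = (project_dir / "Dockerfile").read_text()               # Parser input
    stack = "python" if (project_dir / "requirements.txt").exists() else "unknown"  # Detector (stub)
    smells: list[str] = []        # Analyzer output (placeholder)
    candidates: list[str] = []    # Renderer output (placeholder)
    return {"stack": stack, "smells": smells, "candidates": candidates}


# The CLI could simply call the same function:
#   result = run_pipeline(Path(args.path))
#
# The FastAPI service could wrap it in a route:
#   @app.post("/runs")
#   def create_run(path: str) -> dict:
#       return run_pipeline(Path(path))
```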

Deployments

Local Environment

Slim Container

Complete Container

Core Modules

Parser (core/parse.py)

Stack Detector (core/detect.py)

Analyzer (core/analyzer.py)

CMD Parser & Script Analyzer (core/cmd_parser.py, core/script_analyzer.py)

Rules Engine (core/rules.py)

Renderer (core/render.py)

Build Runner (core/build.py)

Reporter (core/report.py)

Security Module (core/security.py)

Configuration (core/config.py)

Store & RAG Index (core/store.py)

Dashboard Aggregator (core/dashboard.py)

Performance Harness (core/perf.py)

Local LLM Module (core/llm.py)

Data Flow Details

  1. Input ingestion – CLI/API accepts a path to a project directory; Dockerfile is read and parsed.
  2. Detection and analysis – Stack detection + smell labeling; CMD analyzer extracts scripts and risk signals.
  3. Rendering – Rules catalog chooses appropriate templates; renderer produces 1–2 candidates with metadata.
  4. Metrics estimation – Build runner estimates size/layer reductions and writes metrics.json.
  5. LLM assist (Complete only) – Candidates + metrics + analysis context are sent to LLMRankingService; ranking + rationale appended to metadata.
  6. Reporting – Reporter compiles Markdown and HTML with diffs, metrics, CMD rewrites, and LLM insights.
  7. Storage – core/store.py persists inputs, candidates, logs, metadata, and reports, and updates the RAG index.
  8. Dashboards – codi dashboard exports aggregated JSON referencing stored run artefacts.
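
The sketch below strings these steps together as a single hypothetical orchestration function. The helper logic is stubbed and the function name is invented; only the directory and file names (inputs/, metadata/run.json, metadata/metrics.json, reports/report.md) follow the artefact layout documented below.

```python
# Hypothetical orchestration of the steps above. Detection, rendering, and
# metrics estimation are stubbed; the artefact paths mirror the documented
# run layout, nothing here is the actual CODI implementation.
import json
import time
from pathlib import Path


def execute_run(project_dir: Path, runs_root: Path = Path("runs")) -> Path:
    dockerfile = (project_dir / "Dockerfile").read_text()                   # 1. ingestion
    stack = "python" if "pip install" in dockerfile else "generic"          # 2. detection (stub)
    candidates = [dockerfile]                                               # 3. rendering (stub)
    metrics = {"candidates": len(candidates), "estimated_layers_saved": 0}  # 4. estimation (stub)

    label = f"{time.strftime('%Y%m%d-%H%M%S')}-{stack}-baseline"
    run_dir = runs_root / label
    for sub in ("inputs", "candidates", "metadata", "reports"):
        (run_dir / sub).mkdir(parents=True, exist_ok=True)

    (run_dir / "inputs" / "Dockerfile").write_text(dockerfile)
    (run_dir / "metadata" / "run.json").write_text(json.dumps({"stack": stack, "label": label}))
    (run_dir / "metadata" / "metrics.json").write_text(json.dumps(metrics))
    (run_dir / "reports" / "report.md").write_text(
        f"# Run {label}\n\nCandidates: {len(candidates)}\n"                 # 6. reporting (stub)
    )
    return run_dir                                                          # 7. storage, simplified
```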

Slim vs Complete Architecture

| Aspect | Slim | Complete |
| --- | --- | --- |
| Base image | python:3.12-slim multi-stage | Slim image + llama.cpp build stage |
| LLM runtime | Disabled by default (LLM_ENABLED=false) | Enabled with embedded server on port 8081 |
| Environment defaults | AIRGAP=true, CODI_RULESET_VERSION label set | Same plus adapter metadata logging |
| Additional scripts | N/A | docker/runtime_complete.py, docker/scripts/mount_adapter.sh, docker/scripts/verify_adapter.py |
| Ports | 8000 | 8000 (API) + 8081 (LLM) |
| Volume mounts | /work | /work + /models (weights/adapters) |
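
The snippet below sketches how the Slim/Complete toggles from the table might be read at start-up. LLM_ENABLED, AIRGAP, and the port numbers come from the table; the dataclass and helper are illustrative assumptions and do not reflect the actual core/config.py API.

```python
# Illustrative only: reading the Slim vs Complete runtime toggles from the
# environment. Variable names and defaults follow the table above; the
# RuntimeProfile dataclass itself is hypothetical.
import os
from dataclasses import dataclass


def _env_flag(name: str, default: bool) -> bool:
    return os.getenv(name, str(default)).strip().lower() in {"1", "true", "yes"}


@dataclass(frozen=True)
class RuntimeProfile:
    llm_enabled: bool = False      # Slim default: LLM_ENABLED=false
    airgap: bool = True            # both images default to AIRGAP=true
    api_port: int = 8000           # API port in both images
    llm_port: int = 8081           # only meaningful when llm_enabled is True (Complete)

    @classmethod
    def from_env(cls) -> "RuntimeProfile":
        return cls(
            llm_enabled=_env_flag("LLM_ENABLED", False),
            airgap=_env_flag("AIRGAP", True),
        )
```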

Interfaces

CLI (cli/main.py)

API (api/server.py)

Artefact Layout

runs/<timestamp>-<stack>-<label>/
├── inputs/
│   └── Dockerfile
├── candidates/
│   ├── candidate_1.Dockerfile
│   └── candidate_2.Dockerfile
├── logs/
│   └── build.log (future BuildKit integration)
├── metadata/
│   ├── run.json
│   ├── metrics.json
│   ├── llm_metrics.json (Complete)
│   ├── rag.json
│   └── environment.json
└── reports/
    ├── report.md
    └── report.html

A shared _rag/index.sqlite3 database stores embeddings referenced by metadata/rag.json.
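
As an illustration, a consumer such as the dashboard aggregator could load a run roughly as shown below. The file names mirror the layout above; the loader itself is a hypothetical sketch rather than part of core/store.py.

```python
# Hypothetical reader for a single run directory. Only the file and folder
# names come from the documented artefact layout.
import json
from pathlib import Path


def load_run(run_dir: Path) -> dict:
    meta_dir = run_dir / "metadata"
    run_meta = json.loads((meta_dir / "run.json").read_text())
    metrics = json.loads((meta_dir / "metrics.json").read_text())

    llm_metrics_path = meta_dir / "llm_metrics.json"   # present in Complete runs only
    llm_metrics = json.loads(llm_metrics_path.read_text()) if llm_metrics_path.exists() else None

    candidates = sorted((run_dir / "candidates").glob("candidate_*.Dockerfile"))
    return {
        "run": run_meta,
        "metrics": metrics,
        "llm_metrics": llm_metrics,
        "candidates": [p.name for p in candidates],
        "report_html": run_dir / "reports" / "report.html",
    }
```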

Extensibility Points

Observability Hooks

Security Considerations

Roadmap Hooks

The architecture intentionally separates deterministic rules from ML-driven insights so that future enhancements (additional stacks, BuildKit integration, policy packs, remote adapters) can be implemented without changing the fundamental pipeline. Refer to REFERENCE.md for formal schemas and to LLM_MODULE.md for model lifecycle details.
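
As a purely illustrative sketch of that separation, a deterministic rule can be modelled as data plus a pure rewrite function, so new stacks or policy packs register rules without touching the ranking layer. None of the names below are taken from core/rules.py.

```python
# Illustrative only: a deterministic rule catalog kept independent of the LLM
# layer. The documented flow sends rendered candidates to the ranking service
# afterwards, so adding a rule here leaves the LLM integration unchanged.
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    rule_id: str
    applies_to: str                      # stack name, e.g. "python"
    rewrite: Callable[[str], str]        # deterministic Dockerfile transform


RULES: list[Rule] = []


def register(rule: Rule) -> None:
    RULES.append(rule)


register(Rule(
    rule_id="pin-base-image",
    applies_to="python",
    rewrite=lambda df: df.replace("FROM python:latest", "FROM python:3.12-slim"),
))
```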