Application Model¶

claude-code-log reads Claude Code transcript files (JSONL on disk) and produces readable HTML, Markdown, and structured JSON views, with optional caching, a TUI for navigation, and per-project aggregate pages.

This document is the entry point for dev-docs/: a high-level view of the parts, what each does, and where to read about them in detail. For end-user documentation see the project README.md; for contributor onboarding see CONTRIBUTING.md; for user-facing operations docs see docs/.

1. Subsystems at a glance¶

Subsystem	Owner module(s)	Deep-dive
CLI	`cli.py`	inlined below (§ 2.1)
TUI	`tui.py`	inlined below (§ 2.2)
Cache (SQLite)	`cache.py` + `migrations/`	inlined below (§ 2.3); user-facing in `docs/restoring-archived-sessions.md`
Migrations	`migrations/` + `migrations/runner.py`	inlined below (§ 2.4)
Parsing	`parser.py`, `factories/`	rendering-architecture.md § 3
Message taxonomy	`models.py`	messages.md
DAG (sessions, forks, agents)	`dag.py`	dag.md
Sync sub-agents (#79)	`converter.py`, `factories/agent_metadata_factory.py`	agents.md § 1
Async task agents (#90)	`converter.py`, `factories/task_notification_factory.py`	agents.md § 2
Teammates (#91)	`renderer.py`, `factories/teammate_factory.py`, `html/teammate_formatter.py`	teammates.md
Rendering pipeline	`renderer.py`, `html/`, `markdown/`, `json/`	rendering-architecture.md
Fold-bar / message hierarchy	`html/templates/components/`, JS in `transcript.html`	message-hierarchy.md
CSS class taxonomy	`html/templates/components/*.css`	css-classes.md
JSON export (#36)	`json/`	inlined below (§ 2.5)
Detail-level filter	renderer.py § Detail-level filtering, `models.DetailLevel`	inlined below (§ 2.6)
Image export	`image_export.py`	inlined below (§ 2.7)
Performance profiling	`renderer_timings.py`	inlined below (§ 2.8)
Diagnosing hangs (SIGUSR1)	`cli.py` `_install_stack_dump_signal`	inlined below (§ 2.9)
Adding a new tool renderer	`factories/tool_factory.py`, `html/tool_formatters.py`	implementing-a-tool-renderer.md (how-to)
Plugin system (third-party message transformers)	`plugins.py`, `factories/priorities.py`, `Renderer._dispatch_format`	plugins.md

A note on cross-cutting concerns: some behaviour spans several rows of the table above and isn't owned by any single subsystem. Label and preview composition (session header titles, branch labels, fork-point box captions) is the most common one — it touches the DAG layer (which decides what's a branch), the renderer's session machinery (which assembles the label text), and the parsing layer (which feeds the preview source). See the SessionHeaderMessage entry in § 4 for the function-level surface.

2. Subsystems without their own deep-dive¶

The subsystems above with "inlined below" pointers don't have a dedicated dev-doc — the paragraph here is the canonical reference.

2.1 CLI¶

cli.py is the command-line entry point (claude-code-log) built on Click. The default invocation processes the entire ~/.claude/projects/ hierarchy; explicit paths target a single transcript or directory. Major flags:

--tui — launch the interactive TUI (§ 2.2).
--detail {full,high,low,minimal,user-only} — drop content from the rendered output (§ 2.6).
--from-date "yesterday", --to-date "today" — natural-language date filtering via dateparser.
--open-browser — open the generated index.html after rendering.
--no-cache / --update-cache — bypass or force-refresh the SQLite cache (§ 2.3).
--format {html,md,markdown,json} — switch output format (HTML is the default; Markdown is mainly used for sharing transcripts inline; JSON exports the processed tree for downstream tooling — see § 2.5).
--compact — Markdown-only; suppresses repeated headings.
--page-size N — paginate the combined-transcript HTML/Markdown output, packing whole sessions into pages of up to N messages each (sessions are never split across pages, so individual pages may overflow). Per-session HTML files are not paginated.

CLI orchestration delegates to converter.py (which owns the high-level "load + render + write" flow) and never touches renderer.py directly. Output paths follow a stable convention so the cache and re-renders can find existing files: combined_transcripts.html, session-{id}.html, index.html, with --detail and --compact adding suffixes per utils.variant_suffix.

2.2 TUI¶

tui.py is a Textual application that browses the projects index, drills into individual sessions, and exposes quick actions: render session to HTML, resume a session via claude --resume, archive a session (move to cache-only), and so on.

Architecture is straightforward Textual: a few Screen subclasses, a DataTable for the session list, key bindings dispatched through Textual's BINDINGS mechanism. The TUI reads through cache.py exclusively (never re-parses JSONL itself) — opening a 50-project hierarchy takes milliseconds because cache hydration is incremental.

The "archive" action is interesting: it moves a session's source JSONL out of ~/.claude/projects/ while keeping the cache row intact. The session then renders from cache only. See docs/restoring-archived-sessions.md for the user-facing behaviour and recovery flow.

2.3 Cache (SQLite)¶

cache.py maintains a SQLite database at ~/.claude/projects/claude-code-log-cache.db (or $CLAUDE_CODE_LOG_CACHE_PATH). Stored data:

Per-session: id, summary, first/last timestamps, message count, per-role token totals, team_name (added in migration 005).
Per-message: a denormalised view used by archived-session restoration (the cache holds enough to re-render even after the source JSONL is deleted).
Per-rendered-HTML: the HTML output itself, indexed by source file mtime + detail-level + compact flag (migrations 002–004) — so re-runs with unchanged inputs serve the cached HTML directly.

Invalidation is mtime-based: when a JSONL's mtime is newer than its cache row, the session is reparsed. The schema-version row also invalidates the entire HTML cache when migrations bump the version, since rendered output may have changed even when source data hasn't.

For the operations / recovery side (archived sessions, manual deletion, cleanupPeriodDays), see docs/restoring-archived-sessions.md.

2.4 Migrations¶

claude_code_log/migrations/ is a small migration system. Each migration is a NNN_description.sql file applied in numeric order by migrations/runner.py. The schema-version table tracks which migrations have run; cache.py invokes the runner on every connection open, so a fresh checkout running against an old cache DB transparently upgrades.

Current migrations:

001_initial_schema.sql — sessions table + per-message metadata.
002_html_cache.sql — adds the rendered-HTML cache layer.
003_html_pagination.sql / 004_html_pagination_variant.sql — per-page HTML chunks for --page-size.
005_session_team_name.sql — adds team_name to sessions for the teammates feature (PR #125).

Recreating-tables migrations toggle PRAGMA foreign_keys = OFF/ON around the rebuild to avoid losing rows to cascade-deletes during the swap.

2.5 JSON export¶

claude_code_log/json/ is a thin renderer that mirrors HtmlRenderer / MarkdownRenderer: same generate(...) / generate_session(...) / generate_projects_index(...) surface, same --detail and --compact honoring. Output is a structured JSON document — top-level version / title / detail / compact / sessions / messages keys; each node carries index / type / title / timestamp / session_id / content, plus optional parent_uuid / agent_id / pair_first etc. when present. Children are nested directly under their parent's children array — it's the same tree the HTML/Markdown renderers walk, serialized verbatim.

The renderer runs entries through generate_template_messages (the same format-neutral pipeline § 3 describes), so JSON output inherits all post-factory polishing for free: slash-command normalisation (bare <command-name>X</command-name> → /X), command-args hardening, teammate session-color enrichment, etc. There is no JSON-specific cleanup pass — the rule of thumb is: if it shows up right in HTML/Markdown, it shows up right in JSON. This is the operative example of the factory-layer normalisation seam: raw TranscriptEntry data is polished once at factory time into the typed MessageContent models that all three renderers share, so display polish lives in one place rather than being re-implemented per output format.

A few JSON-specific touches:

_json_default unwraps Pydantic models embedded in MessageContent dataclasses (tool inputs/outputs are Pydantic; dataclasses.asdict doesn't recurse into them, so without this hook they'd stringify via __repr__ and lose structure). Also handles Enum and Path.
is_outdated(file_path) reads the version field from existing JSON output and compares against the current library version — same invalidation contract as the HTML cache so re-runs skip unchanged outputs.
combined_transcripts.json per project; session-{id}.json for individual sessions. The naming respects variant_suffix for detail/compact variants.

The projects-index JSON (all-projects-summary.json) is a parallel top-level file — same shape as HTML's index.html but consumable by external tools (dashboards, query scripts, jq pipelines).

2.6 Detail-level filter¶

The --detail flag (and models.DetailLevel) lets users dial down how much of the transcript renders:

full (default) — everything.
high — detailed but cleaned: drops system/hook noise while keeping the full conversation and tool I/O.
low — drops most tool I/O, keeps the conversation plus a curated set of "interaction signal" tools (WebSearch, WebFetch, Task, Agent — the ones that show what the agent did, not what it read). See _LOW_KEEP_TOOLS in renderer.py.
minimal — drops all tool I/O.
user-only — drops everything except user messages and steering (designed for feeding to downstream agents, e.g. building a requirements doc).

Filtering happens in two passes: a pre-render pass on TranscriptEntry that strips content items (e.g., tool_use blocks from assistant turns), and a post-render pass on TemplateMessage that drops whole content types created by factories (BashInputMessage, BashOutputMessage, CommandOutputMessage at low/minimal). The two-pass shape exists because some content is identifiable only after factory dispatch (e.g., distinguishing BashInputMessage from the tool_use that produced it).

Important interaction: _pair_skill_tool_uses runs before _filter_template_by_detail, and each pass that drops messages calls _reindex_filtered_context to remap surviving indices (the skill-fold pass remaps after dropping the slash-command body and the redundant "Launching skill" tool_result; the detail filter remaps after dropping content types below FULL). The reindex pass also has to update cached parent-message references on SessionHeaderMessage (see PR

131 fix). See rendering-architecture.md § 5 ¶

for the full pass order.

2.7 Image export¶

image_export.py is format-agnostic: HTML and Markdown both call into it. Three modes (matching the --image-export-mode CLI choices):

placeholder — drop the image and render a placeholder marker in its place.
embedded — base64-encode the image directly into the output as a data URL.
referenced — write the image to disk next to the output and embed a src= reference.

Default is embedded for HTML (single self-contained file) and referenced for Markdown (keeps the .md text small and lets images live as separate PNGs alongside).

2.8 Performance profiling¶

renderer_timings.py provides log_timing(label, t_start) context managers used throughout renderer.py. Set CLAUDE_CODE_LOG_DEBUG_TIMING=1 to print per-phase times to stderr — useful for spotting which phase regressed when a large transcript suddenly takes seconds longer than before.

2.9 Diagnosing hangs (SIGUSR1 stack dump)¶

When claude-code-log appears stuck (100% CPU, no output), a single SIGUSR1 to the running process dumps the live Python stack of every thread to stderr without killing it:

# In another terminal
kill -USR1 $(pgrep -f claude-code-log | head -1)

The handler is wired in cli.py::_install_stack_dump_signal() via faulthandler.register(SIGUSR1, all_threads=True, chain=False) and installed before any heavy work in the entry point. POSIX-only — Windows lacks SIGUSR1, the install is a silent no-op there. Unlike py-spy, this needs no root and no extra install, since the runtime is already wired to dump itself on demand. Added by PR #135 to make the DAG cyclic-children class of bug diagnosable in the field; useful for any future hang.

3. Data lifecycle¶

                 ┌──────────────────┐
                 │  JSONL file(s)   │
                 │ (~/.claude/...)  │
                 └────────┬─────────┘
                          │
                  parser.py + factories/
                          │
                          ▼
              ┌───────────────────────┐
              │ list[TranscriptEntry] │  (typed Pydantic models)
              └───────────┬───────────┘
                          │
                  factories/ dispatch
                          │
                          ▼
            ┌─────────────────────────┐
            │ list[TemplateMessage]   │  (each carrying a typed
            │  with MessageContent    │   MessageContent variant)
            └─────────────┬───────────┘
                          │
              renderer.py (generate_template_messages):
                build DAG → pair → reorder → relocate
                subagent blocks → build hierarchy →
                cleanup sidechain dups → populate caches
                          │
                          ▼
               ┌──────────────────────┐
               │ Tree of TemplateMsg  │
               │  + RenderingContext  │  (caches: teammate_colors,
               │  + nav data          │   task_subjects, etc.)
               └──────────┬───────────┘
                          │
      ┌────────────┬─────────────┴─────────────┬────────────┐
      ▼            ▼                           ▼            ▼
html/renderer.py   markdown/renderer.py    json/renderer.py
      │                  │                      │
      ▼                  ▼                      ▼
 index.html +        *.md                   combined_transcripts.json
 session-*.html      (single file)          session-*.json
                                            all-projects-summary.json
      │                  │                      │
      └──────────────────┼──────────────────────┘
                         │
              ┌──────────┴────────────┐
              ▼                       ▼
          cache.py              image_export.py
          (SQLite)              (HTML / Markdown only —
                                 JSON serialises paths)

Cache reads/writes happen in parallel with the main pipeline: cache.py is consulted before parsing (cache hit → skip parse), after rendering (write the rendered HTML), and during TUI navigation (the TUI never re-parses).

4. Cross-cutting glossary¶

Terms that appear across multiple subsystems — defined once here.

TranscriptEntry: typed Pydantic model for a single line in the source JSONL. Variants: User, Assistant, Summary, System, Passthrough, QueueOperation. See parser.py and models.py.
MessageContent: render-time content variant produced by the factories from TranscriptEntry. Many flavours (UserTextMessage, ToolUseMessage, TeammateMessage, …). One TranscriptEntry may yield multiple MessageContents (a single assistant turn with N tool_uses produces N+1 messages). See messages.md for the full taxonomy.
TemplateMessage: the render-time wrapper around a MessageContent. Carries message_index, parent/child links, pair_first/pair_middle/pair_last, ancestry, and the renderer-format CSS classes. Defined in renderer.py.
RenderingContext: mutable cache attached to one render pass. Holds the message registry plus nested per-session caches (teammate_colors, task_subjects, task_id_for_tool_use, session_first_message, etc.). Caches are session-scoped because combined-transcripts mode merges multiple sessions and per-session identifiers (teammate_id, task_id) aren't globally unique.
session_id: the JSONL's sessionId field. Often a UUID string. In some renderer paths a synthetic form is used:
{trunk}#agent-{agentId} for sub-agent transcripts (so they form a separate DAG-line attached to their spawning trunk).
{trunk}@{first_uuid_prefix} for branch sessions (rewinds / parallel-tool_use forks). See dag.md.
render_session_id: the session id that should be used when walking ctx.messages to find content for rendering, accounting for synthetic rewrites.
sidechain: a sub-agent's transcript entries are flagged isSidechain: true. The DAG layer integrates them into the parent session's tree under the spawning Task/Agent tool_use anchor. See agents.md, dag.md.
agent_id: identifier copied from a Task/Agent tool_result (either toolUseResult.agentId or parsed from the Markdown metadata tail). Used to stitch sub-agent JSONL files into the trunk DAG. See agents.md.
fork point / branch: when a session has multiple children with the same parent, the parent is the fork point and each child initiates a branch. Real forks come from /exit rewinds; spurious forks (parallel tool_uses, structural-only siblings) are collapsed by _walk_session_with_forks. See dag.md.
SessionHeaderMessage: the synthetic content type produced for every session boundary in the rendered output — the header that appears above each session's first real message. Two flavours: trunk headers for top-level sessions, and branch headers for fork branches (the "branch heading" you'll see referenced in bug reports). Both headers are constructed by _build_trunk_header / _build_branch_header (in renderer.py); the branch header's title is composed by _branch_label in the shape Branch • <uuid8> • <preview>, with the preview computed once by scanning the branch's DAG-line uuids for the first user entry with text (via extract_text_content in parser.py + create_session_preview in utils.py, which calls simplify_command_tags to strip raw <command-name> XML soup down to /cmd). When troubleshooting branch-heading rendering, those are the functions to inspect.
pair_first / pair_middle / pair_last: a pair of messages rendered as one logical unit (tool_use + tool_result, Slash + UserSlash, thinking + assistant). pair_middle exists for triples — currently the slash-command (UserSlash → Slash → CommandOutput) shape.
detail level: see § 2.6.
detail-aware tools: the curated set of tools whose I/O survives --detail low because they convey what the agent did, not what it read (WebSearch, WebFetch, Task, Agent).
passthrough: a PassthroughTranscriptEntry is a non-conversation entry (hook callbacks, progress updates, last-prompt markers). The DAG layer keeps them in the structure but the renderer typically hides them.

5. Where to start reading¶

Common entry questions and their best first stop:

"How does a JSONL line become an HTML row?" → rendering-architecture.md.
"Why are forks rendered weirdly / what is a branch session?" → dag.md.
"What message types exist and what do they look like?" → messages.md plus the samples in messages/.
"I want to add support for a new Claude Code tool." → implementing-a-tool-renderer.md.
"I want to write a third-party plugin (e.g. for an MCP tool we don't ship)." → plugins.md.
"How does folding / collapsible content work?" → message-hierarchy.md.
"What CSS classes does a message div get?" → css-classes.md.
"How are sub-agent transcripts (sync, async, teammates) integrated?" → agents.md, then teammates.md for the teammates-specific machinery.
"I want to extend the cache / change the schema." → § 2.3, § 2.4 here, then read the migration files in order.
"How do I export to JSON for downstream tooling?" → § 2.5 here (and --format json from § 2.1).
"claude-code-log is hung — how do I see what it's doing?" → § 2.9 (SIGUSR1 stack dump).
"What's planned but not implemented?" → work/ — each .md is an in-flight or proposed plan.