# `PtcRunner.TraceLog`
[🔗](https://github.com/andreasronge/ptc_runner/blob/main/lib/ptc_runner/trace_log.ex#L1)

Captures SubAgent execution events to JSONL files for offline analysis.

TraceLog attaches to SubAgent telemetry events and writes them to a JSONL file,
enabling detailed debugging and performance analysis of agent executions.

## Usage

The simplest way to capture a trace is with `with_trace/2`:

    {:ok, {:ok, step}, trace_path} = TraceLog.with_trace(fn ->
      SubAgent.run(agent, llm: my_llm())
    end)

    # Analyze the trace
    events = TraceLog.Analyzer.load(trace_path)
    summary = TraceLog.Analyzer.summary(events)

For more control, use `start/1` and `stop/1`:

    {:ok, collector} = TraceLog.start(path: "my_trace.jsonl")
    {:ok, step} = SubAgent.run(agent, llm: my_llm())
    {:ok, path, errors} = TraceLog.stop(collector)

## Event Format

Each line in the JSONL file is a JSON object with:

    {
      "event": "run.start",           # Event type (run|turn|llm|tool).(start|stop|exception)
      "trace_id": "abc123...",        # Unique trace identifier
      "timestamp": "2024-01-...",     # ISO 8601 timestamp
      "measurements": {...},          # Telemetry measurements
      "metadata": {...},              # Event-specific metadata
      "duration_ms": 123              # Duration (for stop events)
    }

## Process Isolation and Cross-Process Propagation

Traces are isolated by process. Only events from the process that called `start/1`
are captured. This allows multiple concurrent traces without interference.

Nested traces are supported - each `with_trace` call creates its own trace file,
and events are routed to the innermost active collector.

### Cross-Process Tracing

When execution spans multiple processes (e.g., parallel tasks), use
`join/2` to propagate trace context to child processes:

    collectors = TraceLog.active_collectors()
    parent_span = PtcRunner.SubAgent.Telemetry.current_span_id()

    Task.async(fn ->
      TraceLog.join(collectors, parent_span)
      # Events from this process are now captured AND linked to parent
    end)

**Note:** The sandbox process inherits trace collectors via `join/2`, so tool
telemetry events (`tool.start`, `tool.stop`) emitted inside the sandbox are
captured directly by the trace handler.

## See Also

- `PtcRunner.TraceLog.Analyzer` - Load and analyze trace files
- `PtcRunner.TraceLog.Collector` - Low-level file writing
- `PtcRunner.TraceLog.Handler` - Telemetry handler
- [Observability Guide](subagent-observability.md) - How TraceLog relates to `PtcRunner.Tracer`

# `active_collectors`

```elixir
@spec active_collectors() :: [pid()]
```

Returns all active collectors for the current process.

The list is ordered from innermost (most recent) to outermost.

# `active_memory_sinks`

```elixir
@spec active_memory_sinks() :: [pid()]
```

Returns all in-memory sinks active in the current process (innermost first).

# `current_collector`

```elixir
@spec current_collector() :: pid() | nil
```

Returns the collector for the current process, if any.

## Examples

    {:ok, _collector} = TraceLog.start()
    collector = TraceLog.current_collector()
    # collector is a pid

# `join`

```elixir
@spec join([pid()], String.t() | nil) :: :ok
```

Joins the current process to existing trace collectors.

This is used for trace propagation across process boundaries. When spawning
child processes (via Task.async_stream, Process.spawn, etc.), the parent's
trace collectors are not automatically inherited. Call this function at the
start of the child process to re-attach to the parent's trace session.

## Parameters

  * `collectors` - List of collector PIDs to join (from `active_collectors/0`)
  * `parent_span_id` - Optional span ID from parent process for span hierarchy

## Example

    # In parent process
    collectors = TraceLog.active_collectors()
    parent_span = PtcRunner.SubAgent.Telemetry.current_span_id()

    Task.async(fn ->
      TraceLog.join(collectors, parent_span)
      # Now trace events from this process will be captured
      # AND linked to the parent span hierarchy
      SubAgent.run(agent, llm: llm)
    end)

## Notes

- Only joins collectors that are still alive (stale PIDs are filtered out)
- Does not attach telemetry handlers (they are global and already attached)
- Safe to call multiple times or with an empty list
- When `parent_span_id` is provided, sets up span hierarchy so new spans
  in this process have the parent span as their parent_span_id

# `record_turn_event`

```elixir
@spec record_turn_event(map()) :: :ok
```

Records an already-built turn-event map to every active sink — *every* JSONL
collector on the stack and every active in-memory sink.

Unlike `write_to_active/1` (innermost collector only), this fans out to all
active collectors, matching the telemetry handler's routing: under nested
`with_trace/2` scopes both the inner and outer trace files capture the turn,
so cross-session analysis works for either. Each collector stamps its own
`trace_id`/`seq` on its copy of the event.

This is the single emission point shared by both turn drivers
(`PtcRunner.Session` and the `PtcRunner.SubAgent` loop). It never raises and
is a no-op when nothing is recording. Build the event with
`PtcRunner.TraceLog.TurnEvent.build/1`.

# `recording?`

```elixir
@spec recording?() :: boolean()
```

Returns true if any turn-log sink (JSONL collector or in-memory sink) is
active in the current process.

Turn drivers consult this before building a turn event so the (small) build
cost is skipped entirely when nothing is recording.

# `start`

```elixir
@spec start(keyword()) :: {:ok, pid()}
```

Starts trace collection for the current process.

Returns a collector process that will capture all SubAgent telemetry events
from this process until `stop/1` is called.

## Options

  * `:path` - File path for the JSONL output. Defaults to a timestamped file.
  * `:trace_id` - Custom trace identifier. Defaults to a random hex string.
  * `:trace_kind` - Trace type discriminator (e.g., `"benchmark"`, `"analysis"`).
  * `:producer` - Component that created this trace (e.g., `"demo.benchmark"`).
  * `:trace_label` - Human-readable label (e.g., test case name).
  * `:model` - LLM model identifier.
  * `:query` - Input query or question.
  * `:meta` - Producer-specific metadata under `data`.

## Examples

    {:ok, collector} = TraceLog.start()
    {:ok, step} = SubAgent.run(agent, llm: my_llm())
    {:ok, path, errors} = TraceLog.stop(collector)

    # With typed trace header
    {:ok, collector} = TraceLog.start(
      path: "/tmp/debug.jsonl",
      trace_kind: "benchmark",
      producer: "my_app",
      query: "How many products?"
    )

# `start_memory_sink`

```elixir
@spec start_memory_sink(keyword()) :: {:ok, pid()}
```

Starts an in-memory turn-log sink and activates it for the current process.

Returns the sink pid; query it with `PtcRunner.TraceLog.MemorySink.events/1`
(or the cross-session `PtcRunner.TraceLog.Analyzer` functions). Accepts the
same options as `PtcRunner.TraceLog.MemorySink.start_link/1` (`:max_bytes`,
`:name`). Detach with `stop_memory_sink/1`.

# `stop`

```elixir
@spec stop(pid()) :: {:ok, String.t(), non_neg_integer()}
```

Stops trace collection and closes the trace file.

Returns the path to the trace file and the number of write errors (if any).

## Examples

    {:ok, collector} = TraceLog.start()
    # ... run SubAgent ...
    {:ok, path, errors} = TraceLog.stop(collector)

# `stop_memory_sink`

```elixir
@spec stop_memory_sink(pid()) :: :ok
```

Detaches an in-memory sink from the current process so new turn events no
longer route to it. The sink process is left alive so its events stay
queryable; stop it explicitly when done.

# `with_trace`

```elixir
@spec with_trace(
  (-&gt; result),
  keyword()
) :: {:ok, result, String.t()}
when result: term()
```

Executes a function while capturing a trace.

This is the recommended way to capture traces. It ensures the trace is
properly started and stopped, even if the function raises an exception.

## Options

Accepts all options from `start/1`: `:path`, `:trace_id`, `:trace_kind`,
`:producer`, `:trace_label`, `:model`, `:query`, `:meta`.

## Examples

    {:ok, {:ok, step}, trace_path} = TraceLog.with_trace(fn ->
      SubAgent.run(agent, llm: my_llm())
    end)

    # With typed trace header
    {:ok, {:ok, step}, path} = TraceLog.with_trace(
      fn -> SubAgent.run(agent, llm: my_llm()) end,
      trace_kind: "benchmark",
      query: "How many products?"
    )

# `write_to_active`

```elixir
@spec write_to_active(map()) :: :ok | :no_collector
```

Writes a serialized event map to the innermost active collector in the
calling process's collector stack.

Used by call sites that need to emit a custom JSONL line from outside
the SubAgent telemetry path (e.g., the MCP server recording per-call
outcomes). The event map is forwarded as-is to
`PtcRunner.TraceLog.Collector.write_event/2`, which assigns a `seq`,
encodes via `PtcRunner.TraceLog.Event.encode/1`, and appends a JSONL
line.

Returns `:ok` if a collector was active and the event was queued for
writing, or `:no_collector` if no `with_trace/2` / `start/1` scope is
active in this process.

This function never raises. Errors during forwarding are swallowed
(writes are async casts; encoding failures degrade gracefully).

## Examples

    TraceLog.with_trace(fn ->
      TraceLog.write_to_active(%{
        "event" => "mcp.call.stop",
        "trace_id" => "abc",
        "timestamp" => DateTime.utc_now() |> DateTime.to_iso8601(),
        "data" => %{"reason" => "ok"}
      })
    end)

---

*Consult [api-reference.md](api-reference.md) for complete listing*