Open source · TraceML for PyTorch

Find what's slowing
your training run,
while it's happening.

One context manager. Works in 3 minutes.

Dataloader stalling your GPU? You will see it flagged live, not after the run finishes.
DDP straggler slowing your ranks? TraceML shows which rank, how much gap, and whether it is the dataloader or compute.
Step time drifting? Visible before the run finishes so you can stop it, not wait for it to crash.
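The drift check can be illustrated with a minimal sketch in plain Python. This is not TraceML's actual logic, and the function name and thresholds here are ours: compare the median of recent step times against a baseline window and flag when it creeps past a tolerance.

```python
from statistics import median

def step_time_drift(step_times, baseline_n=50, recent_n=50, threshold=1.15):
    """Hypothetical helper, for illustration only: flag drift when the
    recent median step time exceeds the baseline median by more than
    `threshold` (here, 15%)."""
    if len(step_times) < baseline_n + recent_n:
        return False  # not enough data yet
    baseline = median(step_times[:baseline_n])
    recent = median(step_times[-recent_n:])
    return recent > threshold * baseline

# Steady run: no drift flagged.
steady = [23.0] * 100
print(step_time_drift(steady))    # False

# Drifting run: step time creeps up over the last 50 steps.
drifting = [23.0] * 50 + [23.0 + i * 0.2 for i in range(50)]
print(step_time_drift(drifting))  # True
```

Catching this while the run is live is the point: a drift flag at step 1,000 is actionable, the same number in a post-mortem log is not.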
Live view (example):

DATALOADER STALL · step 1,240
Step time — last 100 steps: median 23.1 ms, worst 25.9 ms
  DL fetch    13.4 ms
  Forward      4.4 ms
  Backward     3.2 ms
  Optimizer    1.8 ms
DDP — 4 ranks: straggler 1.00× ✓
Memory: GPU 14.2 / 96 GB, peak 17.1 GB ↑

Simple setup

No agents. No infrastructure. Pip install and one context manager. That's all.

1. pip install traceml-ai
No system dependencies. Works where PyTorch runs.
2. Wrap your training step
One context manager around your existing loop. No other changes to your script.
3. Run your script
Live terminal view opens alongside your logs. Compact summary at run end.
train.py (only change needed)
from traceml.decorators import trace_step

for batch in dataloader:
    with trace_step(model):
        outputs = model(batch["x"])
        loss = criterion(outputs, batch["y"])
        loss.backward()
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)
$ traceml run train.py
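Conceptually, a step-timing context manager is simple. A minimal sketch of the idea, not TraceML's implementation (trace_step additionally splits each step into dataloader, forward, backward, and optimizer phases):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed_step(records):
    """Illustrative only: record the wall-clock duration of the
    enclosed training step, whatever happens inside it."""
    start = time.perf_counter()
    try:
        yield
    finally:
        records.append(time.perf_counter() - start)

step_times = []
for _ in range(3):
    with timed_step(step_times):
        time.sleep(0.01)  # stand-in for forward/backward/optimizer

print(len(step_times))  # 3 recorded step durations
```

Because the timer sits in a `finally`, the duration is recorded even if the step raises, which is what makes the wrap-your-loop pattern safe to leave in production scripts.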
PyTorch training loops: use trace_step(model) around your step. Single GPU and single-node DDP.
HF Trainer: replace Trainer with TraceMLTrainer. One line change.
PyTorch Lightning: add TraceMLCallback() to your trainer callbacks.

Try it or talk to us

TraceML is free and open source. If you are running regular training jobs and want a second set of eyes on what is actually slow, we are easy to reach.

We work with a small number of teams directly, looking at real run data. If that sounds useful, just email us.