Open source · TraceML for PyTorch
Find what's slowing your training run, while it's happening.
One context manager. Set up in 3 minutes.
Dataloader stalling your GPU? You will see it flagged live, not after the run finishes.
DDP straggler slowing your ranks? TraceML shows which rank, how much gap, and whether it is the dataloader or compute.
Step time drifting? Visible before the run finishes so you can stop it, not wait for it to crash.
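The core idea behind the dataloader-stall flag can be shown in plain Python. The sketch below is an illustration, not TraceML's implementation: time the dataloader fetch separately from the compute step, and flag steps where fetch dominates (the `stall_ratio` threshold is an assumption for the example).

```python
import time

def timed_steps(dataloader, step_fn, stall_ratio=0.5):
    """Flag steps where dataloader fetch dominates total step time.

    Illustration only -- not TraceML's implementation.
    """
    flagged = []
    it = iter(dataloader)
    step = 0
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)            # dataloader fetch (can stall)
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)                  # forward/backward/optimizer
        t2 = time.perf_counter()
        total = max(t2 - t0, 1e-9)      # guard against zero-length steps
        if (t1 - t0) / total > stall_ratio:
            flagged.append(step)        # fetch dominated this step
        step += 1
    return flagged
```

TraceML reports this kind of per-phase breakdown live, per step, rather than as an after-the-fact summary.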
Live view (example):

DATALOADER STALL · step 1,240
Step time (last 100 steps): median 23.1 ms · worst 25.9 ms
  DL fetch    13.4 ms
  Forward      4.4 ms
  Backward     3.2 ms
  Optimizer    1.8 ms
DDP (4 ranks): straggler 1.00× ✓
Memory: GPU 14.2 / 96 GB · peak 17.1 GB ↑
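The "Straggler 1.00×" figure is a ratio across ranks. A hedged sketch of how such a ratio can be computed from per-rank step times (slowest rank over median rank is an assumption for illustration, not necessarily TraceML's exact formula):

```python
from statistics import median

def straggler_ratio(rank_step_times):
    """Slowest rank's step time over the median rank's step time.

    1.00x means all ranks keep pace; larger means one rank lags.
    Assumed formula for illustration -- not TraceML's exact definition.
    """
    return max(rank_step_times) / median(rank_step_times)

print(round(straggler_ratio([23.0, 23.1, 23.2, 23.1]), 2))  # -> 1.0, balanced
print(round(straggler_ratio([23.0, 23.1, 34.7, 23.1]), 2))  # -> 1.5, rank 2 lags
```

A ratio near 1.00× means the all-reduce never waits on a slow rank; TraceML additionally attributes a gap to dataloader or compute.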
Simple setup
No agents. No infrastructure. A pip install and one context manager, that's all.
1. Install
   pip install traceml-ai
   No system dependencies. Works where PyTorch runs.
2. Wrap your training step
   One context manager around your existing loop. No other changes to your script.
3. Run your script
   Live terminal view opens alongside your logs. Compact summary at run end.
train.py — the only change needed:

    from traceml.decorators import trace_step

    for batch in dataloader:
        with trace_step(model):
            outputs = model(batch["x"])
            loss = criterion(outputs, batch["y"])
            loss.backward()
            optimizer.step()
            optimizer.zero_grad(set_to_none=True)
$ traceml run train.py
Plain PyTorch training loops: Use trace_step(model) around your step. Single GPU and single-node DDP.

Hugging Face Trainer: Replace Trainer with TraceMLTrainer. One line change.

PyTorch Lightning: Add TraceMLCallback() to your trainer callbacks.

Try it or talk to us
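Each integration is a one-line swap. The sketch below is not TraceML code; it only illustrates the drop-in-subclass pattern such swaps rely on (the `Trainer` stand-in and `TracedTrainer` are hypothetical names for this example): same interface, timing added around the inner step.

```python
import time

class Trainer:                         # stand-in for a framework Trainer
    def training_step(self, batch):
        return sum(batch)

class TracedTrainer(Trainer):          # hypothetical drop-in replacement
    def __init__(self):
        self.step_times = []           # seconds per step

    def training_step(self, batch):
        t0 = time.perf_counter()
        out = super().training_step(batch)   # training logic unchanged
        self.step_times.append(time.perf_counter() - t0)
        return out

trainer = TracedTrainer()              # the only line that changes
trainer.training_step([1, 2, 3])
```

Because the subclass keeps the parent's interface, the rest of the training script is untouched.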
TraceML is free and open source. If you are running regular training jobs and want a second set of eyes on what is actually slow, we are easy to reach.
We work with a small number of teams directly, looking at real run data. If that sounds useful, just email.