Low GPU utilization guide
Find what is making the GPU wait.
Low GPU utilization is a symptom, not a diagnosis. Use TraceML to separate input loading, H2D transfer, compute, wait time, memory behavior, and rank skew before changing model code.
Low GPU utilization can come from different phases.
Start with the System GPU utilization signal, then use Step Time to decide where to look. A low utilization reading alone does not tell you which phase is responsible.
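As a rough illustration of that triage, the sketch below takes per-phase step timings and reports which phase takes the largest share of the step. It is not TraceML's API: the `dominant_phase` helper, the phase names, and the millisecond values are all hypothetical, chosen only to show the reasoning.

```python
from typing import Dict, Tuple


def dominant_phase(phase_ms: Dict[str, float]) -> Tuple[str, float]:
    """Return the phase with the largest share of step time, and that share.

    Hypothetical helper for illustration; not part of TraceML.
    """
    total = sum(phase_ms.values())
    if total <= 0:
        raise ValueError("phase timings must sum to a positive value")
    name = max(phase_ms, key=lambda k: phase_ms[k])
    return name, phase_ms[name] / total


# Illustrative per-step averages in milliseconds (made-up numbers).
example_step = {
    "input_loading": 120.0,
    "h2d_transfer": 5.0,
    "compute": 60.0,
    "wait": 15.0,
}

phase, share = dominant_phase(example_step)
print(f"Largest share of step time: {phase} ({share:.0%})")
```

With these made-up numbers the step is dominated by input loading, so the input pipeline would be the first place to look, not the model code.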
DataLoader stalls look like low GPU utilization.
In the DataLoader bottleneck demo, model compute stayed roughly the same while input loading grew from 1.9 ms to 531.8 ms, and GPU utilization dropped from 67% to 7%. The model did not get slower; the GPU spent most of each step waiting for the next batch.
Step time with 1.9ms input loading and 67% GPU utilization.
Step time with 531.8ms input loading and 7% GPU utilization.
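The same effect can be reproduced without a GPU. The sketch below is a minimal, assumption-laden stand-in: `slow_batches` plays the role of a slow DataLoader, `time.sleep` plays the role of model compute, and the loop times how long each step waits for data versus how long it spends "computing". All names and delays are hypothetical and unrelated to the demo's actual code.

```python
import time
from typing import Iterator, Tuple


def slow_batches(n: int, delay_s: float) -> Iterator[int]:
    """Hypothetical stand-in for a DataLoader that needs delay_s per batch."""
    for i in range(n):
        time.sleep(delay_s)
        yield i


def measure(n: int = 5, load_delay_s: float = 0.02,
            compute_s: float = 0.005) -> Tuple[float, float]:
    """Return (seconds spent waiting for input, seconds spent in compute)."""
    input_total = 0.0
    compute_total = 0.0
    batches = slow_batches(n, load_delay_s)
    while True:
        t0 = time.perf_counter()
        try:
            next(batches)  # time spent here is input wait
        except StopIteration:
            break
        t1 = time.perf_counter()
        time.sleep(compute_s)  # stand-in for forward/backward/optimizer
        t2 = time.perf_counter()
        input_total += t1 - t0
        compute_total += t2 - t1
    return input_total, compute_total


if __name__ == "__main__":
    waited, computed = measure()
    print(f"input wait: {waited * 1e3:.1f} ms, compute: {computed * 1e3:.1f} ms")
```

In a real PyTorch loop, CUDA kernels launch asynchronously, so host-side timers like these only reflect GPU work if the device is synchronized first (for example with `torch.cuda.synchronize()`); that is one reason to rely on a purpose-built tool for the actual measurement.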
Follow the specific signal.
Low utilization tells you the GPU is underused. TraceML's summary tells you which guide or tool to open next.
Find where GPU time is lost.
Install TraceML, wrap your training step, and run your script normally. Use the summary to decide whether low GPU utilization is an input, transfer, wait, memory, rank, or compute problem.