2025-04-12
Phantom Clipping: Why Your RLHF Run Stalls When Trainer Is FP32 and vLLM Is BF16
Hugging Face's TRL team has pinpointed a long-suspected RLHF failure mode. It is not noise: PPO's clipping silently zeroes out the gradient for roughly 18% of tokens because the FP32 trainer and the BF16 inference engine disagree about token log-probabilities at the bit level.
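The mechanism can be illustrated with the per-token clipped PPO objective (a minimal sketch, not TRL's implementation): whenever the clip branch is active, the gradient with respect to the importance ratio is exactly zero, so the token contributes nothing to the update. Because the trainer computes the ratio as exp(logp_trainer − logp_sampler), any precision-induced log-prob gap larger than log(1 + ε) ≈ 0.182 for ε = 0.2 pushes an on-policy token into the clipped region. The `delta` value below is hypothetical, chosen only to show the effect.

```python
import math


def ppo_token_grad(ratio: float, adv: float, eps: float = 0.2) -> float:
    """Derivative w.r.t. `ratio` of the per-token clipped PPO objective
    min(ratio * adv, clip(ratio, 1 - eps, 1 + eps) * adv).
    Returns 0.0 when the clipped branch is active: the token's gradient
    vanishes entirely ("phantom clipping" when the ratio drift is purely
    numerical rather than a real policy update)."""
    unclipped = ratio * adv
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps) * adv
    if unclipped <= clipped:
        return adv  # unclipped branch wins the min: gradient is adv
    return 0.0      # clip is active: gradient is exactly zero

# Hypothetical per-token log-prob gap between the FP32 trainer and the
# BF16 sampler. Sampling was on-policy, so the "true" ratio is 1.0.
delta = 0.25
ratio = math.exp(delta)  # > 1 + eps, purely from numerical disagreement

print(ppo_token_grad(ratio, adv=1.0))   # 0.0 — token silently dropped
print(ppo_token_grad(1.0, adv=1.0))     # 1.0 — token still contributes
```

The asymmetry matters: with a positive advantage the gradient dies when the ratio drifts above 1 + ε, and with a negative advantage when it drifts below 1 − ε, so the precision mismatch does not cancel out on average.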