Temperature와 관련한 부분은 구현하지 않았습니다. "The learnable temperature parameter was clipped to prevent scaling the logits by more than 100 which we found necessary to prevent training instability." ...