=======================
Overall F1: 74.6
Yes/No Accuracy: 18.2
Followup Accuracy: 57.6
Unfiltered F1 (7354 questions): 71.2
Accuracy On Unanswerable Questions: 77.8% (1486 questions)
Human F1: 80.8
Model F1 >= Human F1 (Questions): 4704 / 6573, 71.6%
Model F1 >= Human F1 (Dialogs): 143 / 1000, 14.3%
=======================