I have the following code to calculate the average of the outputs for a DataFrame loaded from an XLSX file. calculate_score() returns a float score, e.g. 5.12.
import pandas as pd
testset = pd.read_excel(xlsx_filename_here)
total_score = 0
num_records = 0
for index, row in testset.iterrows():
    # Skip rows with a missing value. The cells are scalars, which have no
    # .isna() method, so use pd.isna() instead.
    if pd.isna(row['Data1']) or pd.isna(row['Data2']) or pd.isna(row['Data3']):
        continue
    else:
        score = calculate_score([row['Data1'], row['Data2']], row['Data3'])
        total_score += score
        num_records += 1
print("Average score:", round(total_score/num_records, 2))
According to this answer, df.iterrows() is slow and an anti-pattern. How can I change the code above to use either vectorization or a list comprehension?
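For reference, here is a minimal sketch of what a list-comprehension version of the loop might look like. It assumes the three columns are named Data1, Data2 and Data3 as above. The calculate_score() shown here is only a hypothetical stand-in (it returns the length of the translation) so that the snippet runs on its own, and average_score() is a helper name made up for illustration.

```python
import pandas as pd


def calculate_score(refs, translation):
    # Hypothetical stand-in for the real scorer: just returns the
    # length of the translation as a float.
    return float(len(translation))


def average_score(df, score_fn):
    # Drop rows with a missing value in any of the three columns up front,
    # instead of testing each row inside the loop.
    clean = df.dropna(subset=["Data1", "Data2", "Data3"])
    # zip() over the columns avoids building a Series per row,
    # which is what makes iterrows() slow.
    scores = [
        score_fn([d1, d2], d3)
        for d1, d2, d3 in zip(clean["Data1"], clean["Data2"], clean["Data3"])
    ]
    if not scores:
        return float("nan")  # no complete rows to average
    return round(sum(scores) / len(scores), 2)


# Small made-up frame; the second row is skipped because Data1 is missing.
testset = pd.DataFrame({
    "Data1": ["a b", None, "c d"],
    "Data2": ["a b c", "x", "c d e"],
    "Data3": ["ab", "xy", "cdef"],
})
print("Average score:", average_score(testset, calculate_score))
```

This still calls the scoring function once per row, so it is not true vectorization. It only removes the per-row overhead of iterrows().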
UPDATE
I over-simplified calculate_score() in the example above. It actually calculates the BLEU score of some sentences using the SacreBLEU library:
import evaluate
sacrebleu = evaluate.load("sacrebleu")
def calculate_score(ref, translation):
    # compute() returns a dict; the BLEU value itself is under the "score" key
    return sacrebleu.compute(predictions=[translation], references=[ref])["score"]
Note that the original code has been updated slightly as well. How can I modify calculate_score() to use a list comprehension? Thanks.
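As a rough sketch of how this could look with the SacreBLEU metric: the compute() call returns a dict, so the comprehension below pulls out the "score" entry for each row. The helper name average_bleu() is made up for illustration, and the scorer is passed in as a parameter (an assumption on my part) so the snippet can be tried without downloading the metric. In real use one would pass sacrebleu.compute from evaluate.load("sacrebleu").

```python
import pandas as pd


def average_bleu(df, compute):
    # `compute` is expected to behave like sacrebleu.compute: it accepts
    # `predictions` and `references` keyword arguments and returns a dict
    # that contains a "score" key.
    clean = df.dropna(subset=["Data1", "Data2", "Data3"])
    scores = [
        # Data1 and Data2 are the two references for one prediction (Data3).
        compute(predictions=[hyp], references=[[ref1, ref2]])["score"]
        for ref1, ref2, hyp in zip(clean["Data1"], clean["Data2"], clean["Data3"])
    ]
    if not scores:
        return float("nan")  # no complete rows to average
    return round(sum(scores) / len(scores), 2)


# Real use (needs the `evaluate` package and the metric download):
#   import evaluate
#   sacrebleu = evaluate.load("sacrebleu")
#   print("Average score:", average_bleu(testset, sacrebleu.compute))
```

Each row is still scored by a separate Python call, so most of the runtime will likely stay in SacreBLEU itself rather than in the iteration.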