I am working with a CSV file in Python using pandas. The file contains a list of countries, a series of dates, and other metrics. Different countries have different latest dates (some end at 18 Feb 2021, others at 17 Feb 2021). How can I select, for each country, the row with the most recent date? For example, in the data below I would like the US row for 02-13 and the UK row for 02-16. I tried the following code, but it raised a TypeError saying that an integer is required.
import pandas as pd
df = pd.DataFrame({"location":["US","US","US","UK","UK"],
"date":["02-11","02-12","02-13","02-15","02-16"],
"total_vaccinations":["100","200","300","400","500"]})
updated_date = df[["location","date"]].dropna().groupby("location").date.max()
newest_total = df.apply(lambda row: row["date"] == updated_date[row["location"]])
expected_df = pd.DataFrame({"location":["US","UK"],
"date":["02-13","02-16"],
"total_vaccinations":["300","500"]})
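
For reference, here is a minimal sketch of one possible way to get the expected result, not necessarily the only or best one. A likely cause of the error is that `df.apply` defaults to `axis=0`, so the lambda receives whole columns rather than rows. The sketch avoids `apply` entirely: it broadcasts each location's latest date back onto its rows with `groupby(...).transform("max")` and keeps the rows that match. The names `latest_date` and `newest_rows` are illustrative, and the comparison assumes the dates are zero-padded `MM-DD` strings from a single year, so that string order matches chronological order.

```python
import pandas as pd

df = pd.DataFrame({
    "location": ["US", "US", "US", "UK", "UK"],
    "date": ["02-11", "02-12", "02-13", "02-15", "02-16"],
    "total_vaccinations": ["100", "200", "300", "400", "500"],
})

# Assumption: dates are zero-padded "MM-DD" strings within one year,
# so comparing them as strings gives chronological order.
# For real data, converting with pd.to_datetime first would be safer.

# For every row, look up the latest date of that row's location.
latest_date = df.groupby("location")["date"].transform("max")

# Keep only the rows whose date equals their location's latest date.
newest_rows = df[df["date"] == latest_date].reset_index(drop=True)

print(newest_rows)
```

With the sample data above, `newest_rows` should hold the same values as `expected_df`. If a location has several rows sharing its latest date, all of them are kept.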