I am working on a CSV file using Python pandas. The file contains a list of countries, a series of dates, and other metrics. Different countries have different latest dates (some end at 18 Feb 2021, others at 17 Feb 2021). How can I select, for each country, the row with the most recent date? For example, in the data below I would like the US row for 02-13 and the UK row for 02-16. I tried the following code, but it raised a TypeError saying an integer is required.

import pandas as pd

df = pd.DataFrame({"location":["US","US","US","UK","UK"],
                   "date":["02-11","02-12","02-13","02-15","02-16"],
                   "total_vaccinations":["100","200","300","400","500"]})

updated_date = df[["location","date"]].dropna().groupby("location").date.max()

newest_total = df.apply(lambda row: row["date"] == updated_date[row["location"]])

expected_df = pd.DataFrame({"location":["US","UK"],
                   "date":["02-13","02-16"],
                   "total_vaccinations":["300","500"]})
Bruce Chu
  • Welcome to Stackoverflow. Please take the time to read this post on [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) may also be useful. – jezrael Feb 18 '21 at 10:44
  • what does your dataframe look like? give example – Aven Desta Feb 18 '21 at 10:46
  • Thanks for the comments. I have given the example of the dataframe. – Bruce Chu Feb 18 '21 at 10:55
  • Super, can you add expected output? There are no missing values. – jezrael Feb 18 '21 at 10:56
  • Thanks jezrael, I have added the expected df to the question, though I am not sure it is clear. – Bruce Chu Feb 18 '21 at 10:59
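
One possible way to get the expected result, offered as a sketch rather than a definitive answer: `df.apply` works column by column by default (`axis=0`), so the lambda in the question receives each column instead of each row, which is a likely cause of the error. Passing `axis=1` would address that, but the comparison can also be done without `apply` by mapping each location to its latest date. The name `mask` is introduced here for illustration. The sketch assumes the dates are zero-padded "MM-DD" strings from the same year, so that string comparison matches chronological order; with real data, converting the column with `pd.to_datetime` first would be safer.

```python
import pandas as pd

df = pd.DataFrame({"location": ["US", "US", "US", "UK", "UK"],
                   "date": ["02-11", "02-12", "02-13", "02-15", "02-16"],
                   "total_vaccinations": ["100", "200", "300", "400", "500"]})

# Latest date per location. String max is only valid here because the
# dates are zero-padded "MM-DD" values within a single year (assumption).
updated_date = df.dropna(subset=["date"]).groupby("location")["date"].max()

# Look up each row's location in updated_date and keep the rows whose
# date equals that latest date. This is vectorised and needs no apply.
mask = df["date"] == df["location"].map(updated_date)
newest_total = df[mask].reset_index(drop=True)

print(newest_total)
```

If several rows of one location share the latest date, all of them are kept; `drop_duplicates(subset="location")` could be chained on if only one row per location is wanted.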
