I am working on a project for university in which I received one dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame({'City': ['PARIS', 'MADRI', 'RIO', 'RIO',
'PARIS', 'RIO', 'LISBOA', 'RIO'],
'Year': [2018, 2015, 2020, 2020,
2017, 2021, 2022, 2022]})
print(df)
City Year
PARIS 2018
MADRI 2015
RIO 2020
RIO 2020
PARIS 2017
RIO 2021
LISBOA 2022
RIO 2022
I need to build a dataframe that displays how many times each city has appeared over the years. The desired output is as per this example dataframe:
df_out_intended = pd.DataFrame({'Year': [2015,2016,2017,2018,2019,2020,2021,2022],
'PARIS': [0,0,1,1,0,0,0,0],
'MADRI': [1,0,0,0,0,0,0,0],
'RIO': [0,0,0,0,0,2,1,1],
'LISBOA': [0,0,0,0,0,0,0,1]
})
print(df_out_intended)
Year PARIS MADRI RIO LISBOA
2015 0 1 0 0
2016 0 0 0 0
2017 1 0 0 0
2018 1 0 0 0
2019 0 0 0 0
2020 0 0 2 0
2021 0 0 1 0
2022 0 0 1 1
I thought about using groupby to do this task. Implementing as follows:
df_test = df.groupby(['City', 'Year']).sum()
But the output is not the desired one, the output is:
print(df_test)
City Year
LISBOA 2022
MADRI 2015
PARIS 2017
2018
RIO 2020
2021
2022
I also tried using value_counts() as follows:
df_test2 = df(['City', 'Year']).value_counts()
But the error message appears: "TypeError: 'DataFrame' object is not callable"