How to count the number of times a city appears in the dataframe per year?

Question

I am working on a project for university in which I received one dataframe:

      import numpy as np
      import pandas as pd

      df = pd.DataFrame({'City': ['PARIS', 'MADRI', 'RIO', 'RIO',
                                  'PARIS', 'RIO', 'LISBOA', 'RIO'],
                        'Year': [2018, 2015, 2020, 2020, 
                                 2017, 2021, 2022, 2022]})

      print(df)

           City   Year
          PARIS   2018
          MADRI   2015
           RIO    2020
           RIO    2020
          PARIS   2017
           RIO    2021
         LISBOA   2022
           RIO    2022

I need to build a dataframe that displays how many times each city has appeared over the years. The desired output is as per this example dataframe:

      df_out_intended = pd.DataFrame({'Year': [2015,2016,2017,2018,2019,2020,2021,2022],
                                      'PARIS': [0,0,1,1,0,0,0,0],
                                      'MADRI': [1,0,0,0,0,0,0,0],
                                      'RIO': [0,0,0,0,0,2,1,1],
                                      'LISBOA': [0,0,0,0,0,0,0,1]                                                         
                                      })

       print(df_out_intended)

           Year   PARIS  MADRI  RIO LISBOA
           2015     0      1     0    0
           2016     0      0     0    0
           2017     1      0     0    0
           2018     1      0     0    0
           2019     0      0     0    0
           2020     0      0     2    0
           2021     0      0     1    0
           2022     0      0     1    1

I thought about using groupby to do this task. Implementing as follows:

      df_test = df.groupby(['City', 'Year']).sum()

But the output is not the desired one, the output is:

      print(df_test)

       City     Year
      LISBOA    2022
      MADRI     2015
      PARIS     2017
                2018
      RIO       2020
                2021
                2022

I also tried using value_counts() as follows:

      df_test2 = df(['City', 'Year']).value_counts()

But the error message appears: "TypeError: 'DataFrame' object is not callable"

score 1 · Accepted Answer · answered Apr 27 '22 at 12:27

1

Use crosstab

pd.crosstab(df.Year,df.City)

answered Apr 27 '22 at 12:27

wwnde

26,119
6
18
32

1

Worked perfectly! To adjust the columns, I would suggest running in the sequence: df_result = df_result.reset_index() – valentim.kodak Apr 27 '22 at 12:46

How to count the number of times a city appears in the dataframe per year?

1 Answers1