0

I have several dataframes with similar columns:

df1
name, age, state

df2
name, age, state

and so on.

How can I count frequency occurrences of age values from age column across all dataframes? Something like:

Age  Count
42   23
38   20
..   ..

Thank you!

ikechi
  • 329
  • 1
  • 9
  • 1
    concatenate , and than use value count for age columns – qaiser Aug 02 '21 at 11:07
  • 1
    Does this answer your question? [How to count occurrence of a value across multiple data frames](https://stackoverflow.com/questions/56870997/how-to-count-occurrence-of-a-value-across-multiple-data-frames) – Qudus Aug 02 '21 at 11:09
  • 1
    `pd.concat([df1, df1])["Age"].value_counts().reset_index(name="Count")` – Erfan Aug 02 '21 at 11:10
  • @Erfan Thank you! In case of random or large number of dataframes will looping through the dataframes work inside the concat? – ikechi Aug 02 '21 at 11:25
  • 1
    Yes, you can create a list beforehand, or with list comprehension inside the concat like so: `pd.concat([df for df in list_of_dfs if ...])` for example. – Erfan Aug 02 '21 at 11:30
  • @Qudus, thanks but link is about R, not python. – ikechi Aug 02 '21 at 11:52
  • @Erfan yes exactly what I'm trying to do but getting "TypeError: list indices must be integers or slices, not str". Dataframes are generated from csv files into dict and they have dynamically assigned names, like df[0], df[1] and so on. Therefore age column is df[0]["age"]. Putting this into pd.concat() produces TypeError. I'm clearly missing something, just can't wrap my head around it. – ikechi Aug 02 '21 at 12:26
  • 1
    @ikechi So, you have a list of dataframes named in the format `df[0], df[1], df[2], ...` ? If so, have you tried `pd.concat(df)` ? Is it returning the concatenated dataframes by row ? – SeaBean Aug 02 '21 at 14:08
  • @SeaBean Thanks, got it working exactly this way. – ikechi Aug 02 '21 at 16:42

1 Answers1

1

You can concat the dataframes, then use GroupBy.count() to get the counts by age, as follows:

pd.concat([df1, df2], ignore_index=True).groupby('age')['age'].count().reset_index(name='Count')
SeaBean
  • 22,547
  • 3
  • 13
  • 25