0

I have a pandas series that was created by the groupy method, this is how it looks:

release_year  genres  
1960          Drama       13
              Action       8
              Comedy       8
              Horror       7
              Romance      6
                          ..
2015          Music       33
              TV Movie    20
              History     15
              War          9
              Western      6

what I want to do is create a data frame that has the "release_years" as indexes, the "genres" as column labels and the value counts as the rows

Guy
  • 46,488
  • 10
  • 44
  • 88
  • 1
    [Groupby value counts on the dataframe pandas](https://stackoverflow.com/questions/39132742/groupby-value-counts-on-the-dataframe-pandas)? `df.groupby(['release_year', 'genres']).count().unstack(fill_value=0)` or `pd.crosstab(df['release_year'], df['genres'])`? – Henry Ecker Nov 07 '21 at 06:10
  • This gave me columns around 240 columns(instead of just 20,the number of `genres`), I found another solution using `value_counts` : `df.groupby("release_year")["genres"].value_counts().unstack(fill_value=0)` – Ahmed Khalifa Nov 07 '21 at 06:25
  • Given that neither of these approaches can create new column values (only restructure existing ones). Is it possible you have inconsistent naming/whitespaces in your column? – Henry Ecker Nov 07 '21 at 06:27
  • The original dataframe was a `(26000,14)` dataframe ,I used `groupby` to create a series grouped by `release_year` to count the `genres` values (The series in my question),when I used `df.groupby(['release_year', 'genres']).count().unstack(fill_value=0)` it repeated the columns 14 times (number of columns in the original dataframe) for each attribute from the original dataframe – Ahmed Khalifa Nov 07 '21 at 06:34
  • Ah. Then just `df.groupby(['release_year', 'genres'])['genres'].count().unstack(fill_value=0)` You didn't show you had 14 columns in your original dataset. – Henry Ecker Nov 07 '21 at 06:42
  • Though really `pd.crosstab(df['release_year'], df['genres'])` is best overall. Or value_counts with subset `df.value_counts(subset=['release_year', 'genres']).unstack(fill_value=0)` – Henry Ecker Nov 07 '21 at 06:43
  • Thanks, they all work perfectly! One last problem is that the dataframe labels look like this `genres Action Adventure Animation Comedy Crime Documentary Drama Family Fantasy Foreign History Horror Music Mystery Romance Science Fiction TV Movie Thriller War Western release_year ` ,I don't want the `genres` index above – Ahmed Khalifa Nov 07 '21 at 06:55
  • [rename_axis](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rename_axis.html) Add `.rename_axis(columns=None)` to any of them. _e.g._ `pd.crosstab(df['release_year'], df['genres']).rename_axis(columns=None)` – Henry Ecker Nov 07 '21 at 06:59
  • Can also add [reset_index](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.reset_index.html) if we wanted to get `release_year` as a column instead of index: `pd.crosstab(df['release_year'], df['genres']).rename_axis(columns=None).reset_index()` – Henry Ecker Nov 07 '21 at 07:00
  • Works perfectly! Thanks Alot! – Ahmed Khalifa Nov 07 '21 at 07:06

0 Answers0