How to create a pandas dataframe from a pandas groupby series?

Question

I have a pandas series that was created by the groupy method, this is how it looks:

release_year  genres  
1960          Drama       13
              Action       8
              Comedy       8
              Horror       7
              Romance      6
                          ..
2015          Music       33
              TV Movie    20
              History     15
              War          9
              Western      6

what I want to do is create a data frame that has the "release_years" as indexes, the "genres" as column labels and the value counts as the rows

[Groupby value counts on the dataframe pandas](https://stackoverflow.com/questions/39132742/groupby-value-counts-on-the-dataframe-pandas)? `df.groupby(['release_year', 'genres']).count().unstack(fill_value=0)` or `pd.crosstab(df['release_year'], df['genres'])`? — Henry Ecker, Nov 07 '21 at 06:10
This gave me columns around 240 columns(instead of just 20,the number of `genres`), I found another solution using `value_counts` : `df.groupby("release_year")["genres"].value_counts().unstack(fill_value=0)` — Ahmed Khalifa, Nov 07 '21 at 06:25
Given that neither of these approaches can create new column values (only restructure existing ones). Is it possible you have inconsistent naming/whitespaces in your column? — Henry Ecker, Nov 07 '21 at 06:27
The original dataframe was a `(26000,14)` dataframe ,I used `groupby` to create a series grouped by `release_year` to count the `genres` values (The series in my question),when I used `df.groupby(['release_year', 'genres']).count().unstack(fill_value=0)` it repeated the columns 14 times (number of columns in the original dataframe) for each attribute from the original dataframe — Ahmed Khalifa, Nov 07 '21 at 06:34
Ah. Then just `df.groupby(['release_year', 'genres'])['genres'].count().unstack(fill_value=0)` You didn't show you had 14 columns in your original dataset. — Henry Ecker, Nov 07 '21 at 06:42
Though really `pd.crosstab(df['release_year'], df['genres'])` is best overall. Or value_counts with subset `df.value_counts(subset=['release_year', 'genres']).unstack(fill_value=0)` — Henry Ecker, Nov 07 '21 at 06:43
Thanks, they all work perfectly! One last problem is that the dataframe labels look like this `genres Action Adventure Animation Comedy Crime Documentary Drama Family Fantasy Foreign History Horror Music Mystery Romance Science Fiction TV Movie Thriller War Western release_year ` ,I don't want the `genres` index above — Ahmed Khalifa, Nov 07 '21 at 06:55
[rename_axis](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rename_axis.html) Add `.rename_axis(columns=None)` to any of them. _e.g._ `pd.crosstab(df['release_year'], df['genres']).rename_axis(columns=None)` — Henry Ecker, Nov 07 '21 at 06:59
Can also add [reset_index](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.reset_index.html) if we wanted to get `release_year` as a column instead of index: `pd.crosstab(df['release_year'], df['genres']).rename_axis(columns=None).reset_index()` — Henry Ecker, Nov 07 '21 at 07:00

How to create a pandas dataframe from a pandas groupby series?

0 Answers0