
I have a pandas DataFrame that looks like this:

genre1    genre2    genre3   Votes1  votes2  votes3 ...     cnt
Comedy    Animation Drama    8.3     7.0     8.5            1
Adventure Comedy    Mystery  6.4     8.2     3.5            1
Drama     Music     Sci-Fi   3.8     6.2     5.9            1
.
.
.

I want to create 3 new DataFrames, one per genre column, where each one groups by that genre column and sums all the other numerical columns. I have tried different variations of pandas groupby and sum, but I am unable to figure out how to apply them together to get this result. Please share any ideas that you might have. Thanks!

anky
priya
    Please provide a small set of sample data as text that we can copy and paste. Include the corresponding desired result. Check out the guide on [how to make good reproducible pandas examples](https://stackoverflow.com/a/20159305/3620003). – timgeb Jun 15 '20 at 14:09

1 Answer


When you call df.groupby(...).sum(), you get a DataFrame with one column for each column that was summed, and the index holds the distinct groups.

You can also pass a list of column names to groupby(), which groups by each distinct combination of values in those columns. So you could do: df.groupby(["genre1", "genre2", "genre3"]).sum()

Examples:

>>> import pandas as pd
>>> df = pd.DataFrame(
...     {
...         "hello": ["world", "brave", "world", "brave"],
...         "num1": [1, 2, 3, 4],
...         "num2": [1, 2, 3, 4],
...     }
... )
>>> df
   hello  num1  num2
0  world     1     1
1  brave     2     2
2  world     3     3
3  brave     4     4
>>> df.groupby("hello").sum()
       num1  num2
hello
brave     6     6
world     4     4
>>> df.groupby("hello").sum().columns
Index(['num1', 'num2'], dtype='object')
>>> df.groupby("hello").sum().index
Index(['brave', 'world'], dtype='object', name='hello')
>>> df = pd.DataFrame(
...     {
...         "hello1": ["world", "brave", "world", "brave"],
...         "hello2": ["new", "world", "brave", "new"],
...         "num1": [1, 2, 3, 4],
...         "num2": [1, 2, 3, 4],
...     }
... )
>>> df.groupby(["hello1", "hello2"]).sum()
               num1  num2
hello1 hello2
brave  new        4     4
       world      2     2
world  brave      3     3
       new        1     1

That should give you the grouped sums you are looking for. If you want multiple DataFrames, you may have to copy the data from the output DataFrame into a new DataFrame for each column that you want on its own.
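If the goal is one DataFrame per genre column rather than one grouped by combinations of all three, a possible approach is to run a separate groupby for each genre column. The following is only a sketch under assumptions: the column names are taken from the question, the first three rows are the sample rows shown there, the fourth row is made up so that at least one group has more than one member, and the names `genre_cols`, `numeric_cols` and `per_genre` are hypothetical helpers.

```python
import pandas as pd

# First three rows come from the question; the fourth row is invented
# purely so that some groups contain more than one row.
df = pd.DataFrame(
    {
        "genre1": ["Comedy", "Adventure", "Drama", "Comedy"],
        "genre2": ["Animation", "Comedy", "Music", "Drama"],
        "genre3": ["Drama", "Mystery", "Sci-Fi", "Drama"],
        "Votes1": [8.3, 6.4, 3.8, 1.0],
        "votes2": [7.0, 8.2, 6.2, 2.0],
        "votes3": [8.5, 3.5, 5.9, 3.0],
        "cnt": [1, 1, 1, 1],
    }
)

genre_cols = ["genre1", "genre2", "genre3"]
# Everything that is not a genre column is assumed to be numeric.
numeric_cols = [col for col in df.columns if col not in genre_cols]

# One grouped-and-summed DataFrame per genre column, keyed by column name.
per_genre = {col: df.groupby(col)[numeric_cols].sum() for col in genre_cols}

for col, summed in per_genre.items():
    print(f"--- grouped by {col} ---")
    print(summed)
```

Each entry, for example per_genre["genre1"], is indexed by the values of that genre column and contains only the summed numeric columns, so the other two genre columns never enter the sum.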

Evin O'Shea