0

i have a dataframe that looks like that:

import pandas as pd


df = pd.DataFrame({'Name' : ['A', 'A', 'B','C','C','C','D','D'],
                   'ID' : ['1', '1', '2','3','3','4','4','4'],
                   'duration' : ['600', '3000', '3000', '600', '3000', '3000', '600','3000']})

and i would like to get something like this:

Name ID 600 3000
 A    1  1   1
 B    2  0   1
 C    3  1   1
 C    4  0   1
 D    4  1   1

i tried to use groupby put there seems to be missing one step

Taiko
  • 51
  • 7
  • i try to group the data based on Name and ID and then i want to simplify the dataframe by counting if 600 and or 3000 occurs and just summerize it in a grouped df – Taiko Apr 01 '21 at 22:39
  • @G.Anderson Thanks for pointing that answer! Good to know – SKPS Apr 01 '21 at 22:47

1 Answers1

3

You can use pd.crosstab to do this:

counts = pd.crosstab(index=[df["Name"], df["ID"]], columns=df["duration"])

# Remove the name of the column array. It throws some people off to look at
counts = counts.rename_axis(columns=None).reset_index()

print(counts)
         3000  600
Name ID           
A    1      1    1
B    2      1    0
C    3      1    1
     4      1    0
D    4      1    1

You can also use pivot_table as another approach:

counts = df.pivot_table(
    index=["Name", "ID"], columns=["duration"], aggfunc="size", fill_value=0
)

counts = counts.rename_axis(columns=None)

print(counts)
         3000  600
Name ID           
A    1      1    1
B    2      1    0
C    3      1    1
     4      1    0
D    4      1    1
Cameron Riddell
  • 10,942
  • 9
  • 19