
I used value_counts to see how many times an id occurs in my DataFrame. Is it also possible to access the values of a certain column for all rows that share the same id? For example:

   colA    colB  colC
   banana  50    60
   apple   30    70
   apple   20    80
   lemon   30    90
   banana  25    10
   lemon   50    15
   apple   5     85

banana['colB']: [50, 25]
apple['colB']: [30, 20, 5]
etc...

I don't know the ids in colA in advance, so they have to be variable. I managed to do it in a loop using df.loc, but this is very slow because the DataFrame is very large.
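For reference, here is a minimal reconstruction of the sample data, plus a loop of the kind described. The asker's actual loop is not shown, so this loop is a hypothetical illustration: it runs one boolean-mask lookup with df.loc per unique id, which means the whole column is scanned once for every id and explains why it gets slow on large frames.

```python
import pandas as pd

# Sample data reconstructed from the table in the question
df = pd.DataFrame({
    "colA": ["banana", "apple", "apple", "lemon", "banana", "lemon", "apple"],
    "colB": [50, 30, 20, 30, 25, 50, 5],
    "colC": [60, 70, 80, 90, 10, 15, 85],
})

# Hypothetical slow approach: for each unique id, build a boolean mask
# over the entire colA column and pick the matching colB values.
result = {}
for key in df["colA"].unique():
    result[key] = df.loc[df["colA"] == key, "colB"].tolist()

print(result)
# {'banana': [50, 25], 'apple': [30, 20, 5], 'lemon': [30, 50]}
```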

1 Answer


I think in Python creating a separate variable for each id (like banana or apple) is only possible in ugly ways; it is better to use other structures like a Series, a DataFrame or a dictionary.

You're close: convert the column to the index, then select by index and column labels with DataFrame.loc:

df1 = df.set_index('colA')

And then:

a = df1.loc['banana','colB'].tolist()

b = df1.loc['apple','colB'].tolist()
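A self-contained sketch of the set_index + loc approach, assuming the sample data from the question. One caveat worth knowing: if an id occurs only once, df1.loc[label, 'colB'] returns a single scalar rather than a Series, so .tolist() does not produce a list. Wrapping the label in a list (df1.loc[[label], 'colB']) always returns a Series. The single-row frame with the id 'kiwi' below is a hypothetical example added only to show that case.

```python
import pandas as pd

df = pd.DataFrame({
    "colA": ["banana", "apple", "apple", "lemon", "banana", "lemon", "apple"],
    "colB": [50, 30, 20, 30, 25, 50, 5],
    "colC": [60, 70, 80, 90, 10, 15, 85],
})

# Move colA into the index so rows can be selected by id label
df1 = df.set_index("colA")

# banana and apple occur several times, so .loc returns a Series here
a = df1.loc["banana", "colB"].tolist()  # [50, 25]
b = df1.loc["apple", "colB"].tolist()   # [30, 20, 5]

# Hypothetical id that occurs only once: a list of labels keeps the
# result a Series, so .tolist() still returns a list.
single = pd.DataFrame({"colA": ["kiwi"], "colB": [7]}).set_index("colA")
k = single.loc[["kiwi"], "colB"].tolist()  # [7]
```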

To get all ids at once, you can create a Series of lists:

s = df.groupby('colA')['colB'].agg(list)

And then:

a = s['banana']
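A runnable version of the groupby approach, again assuming the sample data from the question. It groups the rows by id in a single pass instead of scanning the column once per id. The resulting Series is indexed by the sorted ids, and within each list the values keep the row order of the original DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    "colA": ["banana", "apple", "apple", "lemon", "banana", "lemon", "apple"],
    "colB": [50, 30, 20, 30, 25, 50, 5],
    "colC": [60, 70, 80, 90, 10, 15, 85],
})

# Group by id and collect the colB values of each group into a list
s = df.groupby("colA")["colB"].agg(list)

a = s["banana"]  # [50, 25]
print(s)
```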

Or, for a dictionary of lists, use:

d = df.groupby('colA')['colB'].agg(list).to_dict()

a = d['banana']
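Because the ids are not known in advance, it may help to iterate over the dictionary rather than look up ids by name. A minimal sketch, assuming the sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "colA": ["banana", "apple", "apple", "lemon", "banana", "lemon", "apple"],
    "colB": [50, 30, 20, 30, 25, 50, 5],
    "colC": [60, 70, 80, 90, 10, 15, 85],
})

# Map each id to the list of its colB values
d = df.groupby("colA")["colB"].agg(list).to_dict()

# Visit every id without hard-coding any of them
for key, values in d.items():
    print(f"{key}['colB']: {values}")
```

This mirrors the banana['colB'] / apple['colB'] output from the question, with d[key] playing the role of a per-id variable.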
jezrael