0

I apologize, I looked through many other StackOverflow threads and did not find one that was what I'm looking for.

I am trying to use pandas to solve this but I don't necessarily need a pandas solution. I have created a pandas dataframe:

df = pd.DataFrame({'Provider' : ['Prov1', 'Prov1', 'Prov1', 'Prov1', 'Prov2', 'Prov2', 'Prov2'],
               'Field' : ['Chemistry', 'Biology', 'Chemistry', 'Biology', 'Biology', 'Engineering', 'Biology'],
               'Downloads' : [10, 12, 10, 8, 4, 20, 14]})

What I want to do is basically filter by multiple columns. The pseudo code for this would be:

#if provider = 'Prov1':
    #for field in provider:
        #sum downloads for these criteria

Ideally my output would be a tuple, something like (Chemistry, 20), (Biology, 20).

I've tried various methods using pandas.loc, pandas.groupby but these all seem to allow one level of filtering. Meaning, I can choose one column to filter by. But I don't know how to filter by multiple levels.

Erich Purpur
  • 1,337
  • 3
  • 14
  • 34

1 Answers1

1

You can use filter by df[df['Provider']=='Prov1'] followed by groupby and sum:

result = df[df['Provider']=='Prov1'].groupby('Field', as_index=False).sum().values.tolist()

The result is list of lists:

[['Biology', 20], ['Chemistry', 20]]
niraj
  • 17,498
  • 4
  • 33
  • 48