Python Pandas - Filter by multiple column names

Question

I apologize, I looked through many other StackOverflow threads and did not find one that was what I'm looking for.

I am trying to use pandas to solve this but I don't necessarily need a pandas solution. I have created a pandas dataframe:

df = pd.DataFrame({'Provider' : ['Prov1', 'Prov1', 'Prov1', 'Prov1', 'Prov2', 'Prov2', 'Prov2'],
               'Field' : ['Chemistry', 'Biology', 'Chemistry', 'Biology', 'Biology', 'Engineering', 'Biology'],
               'Downloads' : [10, 12, 10, 8, 4, 20, 14]})

What I want to do is basically filter by multiple columns. The pseudo code for this would be:

#if provider = 'Prov1':
    #for field in provider:
        #sum downloads for these criteria

Ideally my output would be a tuple, something like (Chemistry, 20), (Biology, 20).

I've tried various methods using pandas.loc, pandas.groupby but these all seem to allow one level of filtering. Meaning, I can choose one column to filter by. But I don't know how to filter by multiple levels.

Did you try `df.groupby(['Provider', 'Field'], as_index=False).sum()`? — cs95, Jun 04 '19 at 18:43

niraj · Accepted Answer · 2019-06-04T18:49:59.580

1

You can use filter by df[df['Provider']=='Prov1'] followed by groupby and sum:

result = df[df['Provider']=='Prov1'].groupby('Field', as_index=False).sum().values.tolist()

The result is list of lists:

[['Biology', 20], ['Chemistry', 20]]

edited Jun 04 '19 at 18:49

answered Jun 04 '19 at 18:44

niraj

17,498
4
33
48

Python Pandas - Filter by multiple column names

1 Answers1