I have a pandas DataFrame that I am grouping with groupby(). The grouping mostly works the way I want, except that pandas seems to skip repeated values and show only unique ones.
Here is a sample DataFrame:
import pandas as pd
data = [
['American Mathematical Society', 'Journal', 2, 'Mathematics & Statistics'],
['American Mathematical Society', 'Journal', 2, 'Mathematics & Statistics'],
['American Mathematical Society', 'Journal', 38, 'Mathematics & Statistics'],
['American Mathematical Society', 'Journal', 4, 'Mathematics & Statistics']]
df = pd.DataFrame(data, columns = ['Provider', 'Type', 'Downloads JR1 2017', 'Field'])
Now I use groupby() on these columns and convert the result to a list:
jr1_provider = df.groupby(['Provider', 'Field', 'Downloads JR1 2017'], as_index=False).sum().values.tolist()
Here is the output:
[['American Mathematical Society', 'Mathematics & Statistics', 2, 'JournalJournal'], ['American Mathematical Society', 'Mathematics & Statistics', 4, 'Journal'], ['American Mathematical Society', 'Mathematics & Statistics', 38, 'Journal']]
However, I expected 4 items in the output and got only 3. It looks like duplicate values were removed from the results, because two of the rows have the value 2 in the 'Downloads JR1 2017' column.
Why? And how can I get all results returned?
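As a quick check, here is a minimal sketch on the sample data above that counts the rows in each group. It suggests the two rows with identical keys are combined into a single group rather than dropped, which would also explain the 'JournalJournal' string in the output. The names group_sizes and sizes are only illustrative.

```python
import pandas as pd

data = [
    ['American Mathematical Society', 'Journal', 2, 'Mathematics & Statistics'],
    ['American Mathematical Society', 'Journal', 2, 'Mathematics & Statistics'],
    ['American Mathematical Society', 'Journal', 38, 'Mathematics & Statistics'],
    ['American Mathematical Society', 'Journal', 4, 'Mathematics & Statistics'],
]
df = pd.DataFrame(data, columns=['Provider', 'Type', 'Downloads JR1 2017', 'Field'])

# Count how many rows fall into each (Provider, Field, Downloads) group.
# Rows that share all three key values end up in the same group.
group_sizes = df.groupby(['Provider', 'Field', 'Downloads JR1 2017']).size()
sizes = group_sizes.tolist()

print(group_sizes)
```

So the group whose key has 2 downloads contains two rows, and every row is still accounted for across the three groups.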
The output I ultimately want is the name of each 'Provider' together with the sum of its 'Downloads JR1 2017' values. Example:
['American Mathematical Society', 46]
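For reference, a minimal sketch of one way that output might be produced, assuming the download counts should be summed per provider only: group by 'Provider' alone and sum just the 'Downloads JR1 2017' column, so the download values are aggregated instead of being used as group keys. The names totals and result are illustrative.

```python
import pandas as pd

data = [
    ['American Mathematical Society', 'Journal', 2, 'Mathematics & Statistics'],
    ['American Mathematical Society', 'Journal', 2, 'Mathematics & Statistics'],
    ['American Mathematical Society', 'Journal', 38, 'Mathematics & Statistics'],
    ['American Mathematical Society', 'Journal', 4, 'Mathematics & Statistics'],
]
df = pd.DataFrame(data, columns=['Provider', 'Type', 'Downloads JR1 2017', 'Field'])

# Group by provider only, then sum the downloads column within each group.
# Selecting the column before .sum() avoids concatenating the string columns.
totals = df.groupby('Provider', as_index=False)['Downloads JR1 2017'].sum()

# One [provider, total] pair per provider; here 2 + 2 + 38 + 4 = 46.
result = totals.values.tolist()
print(result)
```

If 'Field' should also appear in the result, it could be added to the list of grouping columns, as long as the downloads column itself is left out of the keys.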