
I have a dataset that looks like this:

    import pandas as pd

    yg = pd.DataFrame({'grade': ['a','a','b','b','a'],
                       'key2': ['one','two','one','two','one'],
                       'year': (2012,2013,2012,2012,2013),
                       'id': (1101,2212,2331,2432,3464)})

I want to count the number of users in each year by grade:

    # size() returns the number of rows in each (year, grade) group
    yg.groupby(['year','grade']).size()

Here is my attempt so far. I am trying to build a function:

    def User_Grades(data, year):
        # Count users per grade for the given year; reindex so a missing grade shows 0.
        g = data.groupby('year').get_group(year).groupby('grade').size()
        return g.reindex(['a', 'b'], fill_value=0)

    for i in yg.groupby('year').groups.keys():
        counts = User_Grades(yg, i)
        print('{}\na :  {}\nb :  {}'.format(i, counts['a'], counts['b']))

I would like to pass in a year so that I get the information for that year only, not for all years. For example:

    User_Grades(yg, 2012)
    # I would have
    2012
    a :  2
    b :  2

Note: I was advised to use a pivot in pandas. However, the pivot output differs from what I expect; for one thing, it has no ':' separators.

The pivot gives the following output:

    YEAR  GRADE
    2012  a        2
          b        2
    2013  a        1
          b        0

This is not the format I want. Instead, I need this:

    2012
    a :  2
    b :  2
    2013
    a :  1
    b :  0
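
One way to get this layout is to count grades for a single year and reindex the result so that absent grades show up as 0. The following is only a minimal sketch under assumptions: `grade_counts`, `format_year`, and the `grades` parameter are hypothetical names, and the list of grades to report is assumed to be known in advance. With the five-row sample frame it gives a: 1, b: 2 for 2012 and a: 2, b: 0 for 2013; the figures quoted in the question presumably come from a different dataset.

```python
import pandas as pd

# Sample frame copied from the question.
yg = pd.DataFrame({'grade': ['a', 'a', 'b', 'b', 'a'],
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'year': (2012, 2013, 2012, 2012, 2013),
                   'id': (1101, 2212, 2331, 2432, 3464)})


def grade_counts(data, year, grades=('a', 'b')):
    """Per-grade row counts for one year (hypothetical helper).

    Grades that do not occur in that year are reported as 0.
    """
    in_year = data.loc[data['year'] == year, 'grade']
    return in_year.value_counts().reindex(list(grades), fill_value=0)


def format_year(data, year, grades=('a', 'b')):
    """Build the 'year / grade :  count' text block for one year."""
    counts = grade_counts(data, year, grades)
    lines = [str(year)]
    lines += ['{} :  {}'.format(g, counts[g]) for g in grades]
    return '\n'.join(lines)


# Print the report for every year present in the data.
for year in sorted(yg['year'].unique()):
    print(format_year(yg, year))
```

Calling `print(format_year(yg, 2012))` prints the block for that year only, which is the single-year behaviour asked for above.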
user8964444
  • Also see https://stackoverflow.com/questions/47372181/pandas-groupby-how-to-show-zero-counts-in-dataframe, I just answered this question a few minutes ago. – cs95 Nov 19 '17 at 00:09
  • @Brad Solomon, I don't think pivot can solve my question accurately. There are obvious differences between the outputs. I still believe groupby() is a better way to solve it, but I am still puzzled. – user8964444 Nov 19 '17 at 02:35
  • @cᴏʟᴅsᴘᴇᴇᴅ it's not a duplicate question. Pivot cannot solve this question accurately. Groupby() does but I am still puzzled – user8964444 Nov 19 '17 at 02:37
  • @user8964444 can you give a little more info on what exactly the output is? You mention "get a format like this" -- that doesn't really look like a DataFrame format. If your question is about more than just including the 0-counts, then yes that would be a reason to re-open, I think. – Brad Solomon Nov 19 '17 at 02:38
  • In other words -- how do you want your output to be different from `yg.pivot_table(index='grade', columns='year', values='id',fill_value=0, aggfunc='count').unstack()`? Because you can just `.loc[2012]` on that result. – Brad Solomon Nov 19 '17 at 02:40
  • And how can I, or anyone, remove the duplicate tag from this question? I am new to this platform. Sorry! – user8964444 Nov 19 '17 at 02:49
  • You cannot unfortunately, neither can I, only the almighty such as @cᴏʟᴅsᴘᴇᴇᴅ can. But it might help if you can address what I was asking above. – Brad Solomon Nov 19 '17 at 02:52
  • I edited my question again; I hope it's clearer now. I showed my work, I am just stuck on the function. – user8964444 Nov 19 '17 at 02:58
  • @user8964444 don’t see the pivot solution. See the reindex one below. Also see the link I posted. – cs95 Nov 19 '17 at 03:30
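
For comparison, the `pivot_table` expression quoted in the comments can also be turned into the colon-separated layout by selecting one year with `.loc` and formatting the rows by hand. This is a rough sketch rather than a definitive answer: `year_report` is a hypothetical helper name, and the counts are cast to `int` because the filled table's dtype is not assumed here.

```python
import pandas as pd

# Sample frame copied from the question.
yg = pd.DataFrame({'grade': ['a', 'a', 'b', 'b', 'a'],
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'year': (2012, 2013, 2012, 2012, 2013),
                   'id': (1101, 2212, 2331, 2432, 3464)})

# Expression from the comment thread: count ids per grade and year,
# fill absent combinations with 0, then unstack into a Series
# indexed by (year, grade).
counts = yg.pivot_table(index='grade', columns='year', values='id',
                        fill_value=0, aggfunc='count').unstack()


def year_report(counts, year):
    """Format one year's counts as 'grade :  n' lines (hypothetical helper)."""
    lines = [str(year)]
    for grade, n in counts.loc[year].items():
        lines.append('{} :  {}'.format(grade, int(n)))
    return '\n'.join(lines)


print(year_report(counts, 2012))
```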
