Below is a pandas Series (`user` and `item` are the index levels in this case).

It was obtained from the original df with `df.groupby(["user", "item"])["list"].apply(lambda x: x.tolist())`:

user        item        list
2           a           [alpha, alpha]
            b           [alpha, alpha, beta, beta, gamma]
3           c           [alpha, theta]
            d           [alpha, pi, pi]
1           e           [rho, zeta]

My question is: how do I convert this into another Series and/or DataFrame that holds a count of each value in the list? (As an added requirement, I only want the top 2 values in each list.)

user        item        list
2           a           {alpha: 2}
            b           {alpha: 2, beta: 2}
3           c           {alpha: 1, theta: 1}
            d           {alpha: 1, pi: 2}
1           e           {rho: 1, zeta: 1}
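One possible way to get this shape, sketched under assumptions: the Series is rebuilt by hand from the listing above, and `top_counts` is a hypothetical helper name, not something from pandas. It relies on `collections.Counter.most_common`, which breaks ties by first occurrence, so the key order inside a dict may differ from the table above.

```python
from collections import Counter

import pandas as pd

# Rebuild the Series shown in the question (values copied from the listing).
index = pd.MultiIndex.from_tuples(
    [(2, "a"), (2, "b"), (3, "c"), (3, "d"), (1, "e")],
    names=["user", "item"],
)
s = pd.Series(
    [
        ["alpha", "alpha"],
        ["alpha", "alpha", "beta", "beta", "gamma"],
        ["alpha", "theta"],
        ["alpha", "pi", "pi"],
        ["rho", "zeta"],
    ],
    index=index,
    name="list",
)


def top_counts(values, n=2):
    """Return the n most common values and their counts as a dict."""
    return dict(Counter(values).most_common(n))


# Each list becomes a dict holding only its two most frequent values.
top2 = s.apply(top_counts)
```

The result is still a Series of Python objects, so it carries the same drawbacks as the Series of lists it came from.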

I may also want percentages for each value, computed against the ORIGINAL list (before it is cut down to the top 2):

user        item        list
2           a           {alpha: 100}
            b           {alpha: 40, beta: 40}
3           c           {alpha: 50, theta: 50}
            d           {alpha: 33.3, pi: 66.6}
1           e           {rho: 50, zeta: 50}
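A minimal sketch of the percentage variant, assuming the denominator is the length of the full list rather than the sum of the kept counts. `top_percentages` and its `ndigits` parameter are hypothetical names chosen for illustration. Note that `round` gives 66.7 for two out of three, where the table above shows the truncated 66.6.

```python
from collections import Counter


def top_percentages(values, n=2, ndigits=1):
    """Percent share of the n most common values, relative to the full list."""
    total = len(values)  # denominator is the ORIGINAL list length
    return {
        key: round(100 * count / total, ndigits)
        for key, count in Counter(values).most_common(n)
    }


example = top_percentages(["alpha", "alpha", "beta", "beta", "gamma"])
```

With the Series from the question held in a variable, say `s`, this could be applied as `s.apply(top_percentages)`.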
Erfan
hedebyhedge
  • Pandas is not designed to be used like this, with lists and dicts as values. It seems you've fallen into the [XY problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). Take a step back and explain what your actual problem is, starting from the original dataframe, not the grouped one. – Erfan Jul 23 '19 at 00:12
  • `df['column'].apply(collections.Counter)`. Awful, but that's what you want. I'd definitely take @Erfan's suggestion, though. – rafaelc Jul 23 '19 at 00:15
  • Maybe you're looking for something like this: `df.groupby(["user", "item"])["list"].value_counts()` – Nakor Jul 23 '19 at 00:20
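The comments above point toward working from the original dataframe instead of the grouped lists. A rough sketch of that idea follows. The question never shows the original df, so the long-format frame here is an assumption reconstructed from the listed values, and taking `head(2)` per group is just one way to keep the top two.

```python
import pandas as pd

# Hypothetical original data: one row per (user, item, value) observation.
rows = [
    (2, "a", "alpha"), (2, "a", "alpha"),
    (2, "b", "alpha"), (2, "b", "alpha"), (2, "b", "beta"),
    (2, "b", "beta"), (2, "b", "gamma"),
    (3, "c", "alpha"), (3, "c", "theta"),
    (3, "d", "alpha"), (3, "d", "pi"), (3, "d", "pi"),
    (1, "e", "rho"), (1, "e", "zeta"),
]
df = pd.DataFrame(rows, columns=["user", "item", "list"])

# Count each value within its (user, item) group; counts come back
# sorted in descending order inside every group.
counts = df.groupby(["user", "item"])["list"].value_counts()

# Keep only the first two rows (the two largest counts) of each group.
top2 = counts.groupby(level=["user", "item"]).head(2)
```

This keeps the counts as plain numbers in a Series with a three-level index, which avoids storing dicts as values. Passing `normalize=True` to `value_counts` should give fractions of each group's full size instead of raw counts.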

0 Answers