How do I get the list in another format?

Question

I have a data frame from which I keep only the users that have a certain item. I've already implemented that. What I want now is the resulting list grouped by user, but what I get is every single element of the data frame in its own list. How do I get a list of the form [[user1.item1, ..., user1.itemn], ..., [usern.item1, ..., usern.itemn]]?

import pandas as pd

d = {'userid': [0, 0, 0, 1, 2, 2, 3, 3, 4, 4, 4],
     'itemid': [715, 845, 98, 12324, 85, 715, 2112, 85, 2112, 852, 102]}
df = pd.DataFrame(data=d)
print(df)

users = df.loc[df.itemid == 715, "userid"]
df_new = df.loc[df.userid.isin(users)]

list_new = df_new[['itemid']].values.tolist()
# What I get
[[715],[845],[98],[85],[715]]
# What I want
[[715,845,98],[85,715]]

score 3 · Answer 1 · answered Dec 12 '20 at 09:56

You can use a groupby operation:

list_new = df_new.groupby("userid")['itemid'].apply(list).tolist()
print(list_new)  # [[715, 845, 98], [85, 715]]

The intermediate result, before calling tolist(), is a Series of lists indexed by userid:

list_new = df_new.groupby("userid")['itemid'].apply(list)
print(list_new)  

userid
0    [715, 845, 98]
2         [85, 715]
Name: itemid, dtype: object
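For reference, here is a minimal end-to-end sketch that combines the filtering from the question with the groupby above. The names `grouped` and `by_user` are illustrative and not part of the original code; the dictionary form is only a suggestion for cases where you need to know which list belongs to which user.

```python
import pandas as pd

d = {'userid': [0, 0, 0, 1, 2, 2, 3, 3, 4, 4, 4],
     'itemid': [715, 845, 98, 12324, 85, 715, 2112, 85, 2112, 852, 102]}
df = pd.DataFrame(data=d)

# Keep only the rows of users who have item 715 (as in the question).
users = df.loc[df.itemid == 715, "userid"]
df_new = df.loc[df.userid.isin(users)]

# One list of itemids per user; the index of the Series is the userid.
grouped = df_new.groupby("userid")["itemid"].apply(list)

list_new = grouped.tolist()   # plain list of lists, user ids are dropped
by_user = grouped.to_dict()   # illustrative: keeps the userid as the key

print(list_new)
print(by_user)
```

The plain list loses the user ids, so keep the Series or the dictionary if you need them later.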

David Erickson · Accepted Answer · 2020-12-12T10:18:47.803

If you want to do all of your code in one line, you can use a list comprehension:

[x for x in [*df.groupby('userid')['itemid'].apply(list)] if 715 in x]

[[715, 845, 98], [85, 715]]

The code:

[*df.groupby('userid')['itemid'].apply(list)]

is equivalent to

df.groupby('userid')['itemid'].apply(list).tolist()

which is the grouping from the other answer applied to the full df instead of the filtered df_new. The remaining part of the comprehension loops over that list of per-user lists and keeps only the sublists that contain 715, where x is each sublist in the code above.
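As a quick check, the sketch below runs the one-liner on the question's data and compares it with an alternative that drops the unwanted users before grouping, using `DataFrameGroupBy.filter`. The alternative is not from this answer; it is shown only as another possible way to reach the same result, and the names `one_liner`, `kept` and `via_filter` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'userid': [0, 0, 0, 1, 2, 2, 3, 3, 4, 4, 4],
    'itemid': [715, 845, 98, 12324, 85, 715, 2112, 85, 2112, 852, 102],
})

# The one-liner from this answer: group every user, then keep lists with 715.
one_liner = [x for x in [*df.groupby('userid')['itemid'].apply(list)] if 715 in x]

# Alternative sketch: keep only the groups (users) that contain item 715,
# then build the per-user lists from the remaining rows.
kept = df.groupby('userid').filter(lambda g: (g['itemid'] == 715).any())
via_filter = kept.groupby('userid')['itemid'].apply(list).tolist()

print(one_liner)
print(one_liner == via_filter)
```

Both variants build Python lists per user, so on a large frame it may be worth filtering first so that fewer lists are created.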

score 0 · Answer 3 · answered Dec 12 '20 at 12:53

We need to group the data by user id. Grouping is important in many applications, for example in preprocessing for machine learning. Suppose our data is collected from sensors at three stations (Station-1, Station-2 and Station-3) located in different parts of a state, and each station measures pressure and temperature. In many practical scenarios some values will be missing. If we use the entire data set to fill the missing values, we may not get good results. If we fill each gap using only that station's own data, we can get better results, because conditions differ between stations but are similar within a particular station.

For this question, grouping by userid gives:

ans = df.groupby('userid')['itemid'].apply(list)

userid
0      [715, 845, 98]
1             [12324]
2           [85, 715]
3          [2112, 85]
4    [2112, 852, 102]
Name: itemid, dtype: object

Each row holds all of the itemids for one user.
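The station example from the start of this answer can be sketched the same way. The data below is made up for illustration (two stations and a single `temperature` column, with hypothetical column names), and filling with the mean is just one possible choice of fill value.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor readings with one missing value per station.
readings = pd.DataFrame({
    'station': ['Station-1', 'Station-1', 'Station-1',
                'Station-2', 'Station-2', 'Station-2'],
    'temperature': [20.0, np.nan, 22.0, 5.0, 7.0, np.nan],
})

# Filling with the mean of the whole column ignores the station.
global_fill = readings['temperature'].fillna(readings['temperature'].mean())

# Filling with the mean of each station: transform('mean') returns a Series
# aligned with the original rows, holding that row's station mean.
station_mean = readings.groupby('station')['temperature'].transform('mean')
station_fill = readings['temperature'].fillna(station_mean)

print(global_fill.tolist())
print(station_fill.tolist())
```

With the global mean, both gaps get the same value even though the two stations read very differently; the per-station fill stays close to each station's own readings.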
