I have the following data frame:
import pandas as pd
df = pd.DataFrame({'member': ['john', 'john', 'john', 'jake', 'jake', 'jake', 'jim', 'jim', 'jim'],
                   'age': [42, 43, 44, 35, 36, 37, 57, 58, 59],
                   'inpatient_count': [0, 1, 2, 1, 0, 0, 2, 1, 5],
                   'pcp_count': [4, 0, 6, 0, 3, 3, 0, 5, 2]})
df = df.sort_values('member')
print(df)
  member  age  inpatient_count  pcp_count
3   jake   35                1          0
4   jake   36                0          3
5   jake   37                0          3
6    jim   57                2          0
7    jim   58                1          5
8    jim   59                5          2
0   john   42                0          4
1   john   43                1          0
2   john   44                2          6
I would like to transform df into arrays that are grouped/nested by member, as done below, but I need something much faster for running over millions of members. I was hoping df.to_numpy() would accept a grouper argument, but I haven't found a way to do that yet.
import numpy as np
keep = [x for x in df.columns if x != 'member']
np.array(df.groupby('member')[keep].apply(lambda x: x.values.tolist()).tolist())
array([[[35,  1,  0],
        [36,  0,  3],
        [37,  0,  3]],

       [[57,  2,  0],
        [58,  1,  5],
        [59,  5,  2]],

       [[42,  0,  4],
        [43,  1,  0],
        [44,  2,  6]]])
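One direction that might avoid the per-group Python lambda is to sort once and reshape the underlying NumPy array. This is only a minimal sketch, and it assumes every member has exactly the same number of rows (as in the sample data above); with ragged groups a plain reshape would not work. The names n_members, rows_per_member and arr are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'member': ['john', 'john', 'john', 'jake', 'jake', 'jake', 'jim', 'jim', 'jim'],
                   'age': [42, 43, 44, 35, 36, 37, 57, 58, 59],
                   'inpatient_count': [0, 1, 2, 1, 0, 0, 2, 1, 5],
                   'pcp_count': [4, 0, 6, 0, 3, 3, 0, 5, 2]})

# A stable sort keeps the original row order within each member.
df = df.sort_values('member', kind='stable')

keep = [c for c in df.columns if c != 'member']

n_members = df['member'].nunique()
rows_per_member = len(df) // n_members

# Assumption: all groups have the same size; fail loudly if they do not.
if not (df.groupby('member').size() == rows_per_member).all():
    raise ValueError("members have differing row counts; reshape does not apply")

# One vectorized conversion, then a reshape to (members, rows, features).
arr = df[keep].to_numpy().reshape(n_members, rows_per_member, len(keep))
print(arr.shape)
```

Because the rows are already contiguous per member after sorting, the reshape itself does no per-group work, so the cost should be dominated by the sort. Whether this is fast enough at the scale of millions of members would need to be measured.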