
How can I reshape a pandas DataFrame into a NumPy array, i.e. produce one list item per distinct value of foo containing all the bar values that belong to it, without iterating manually (vectorized)?

import pandas as pd
d = pd.DataFrame({'foo':[1,1,1,2,2,2], 'bar':[1,2,3,4,5,6]})
print(d)  # or display(d) inside a Jupyter notebook

into the following structure:

result = [[1,2,3], [4,5,6]]
result


Georg Heiler

2 Answers


Use DataFrame.groupby, then convert the grouped result to whatever structure you need:

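import numpy as np
import pandas as pd

df = pd.DataFrame({'foo': [1, 1, 1, 2, 2, 2, 2], 'bar': [1, 2, 3, 4, 5, 6, 7]})

# One Python list of bar values per foo group
print(df.groupby('foo')['bar'].apply(list).to_list())
# python nested lists : [[1, 2, 3], [4, 5, 6, 7]]

# One NumPy array per group, held in an object-dtype ndarray (groups may differ in length)
print(df.groupby('foo')['bar'].apply(np.array).to_numpy())
# numpy ndarray:  [array([1, 2, 3], dtype=int64) array([4, 5, 6, 7], dtype=int64)]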
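A minimal sketch of a variant that avoids groupby.apply and leans on NumPy instead, assuming foo has no missing values: sort once by foo, count the rows per group, and split the bar values at the group boundaries with np.split. The helper name split_by_group is hypothetical, not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd


def split_by_group(df: pd.DataFrame, key: str, col: str) -> list:
    """Hypothetical helper: return one NumPy array of `col` values per distinct `key`.

    Assumes `key` contains no missing values (groupby drops NaN keys by default,
    which would misalign the split points).
    """
    # Stable sort makes each group contiguous and keeps original order within a group
    ordered = df.sort_values(key, kind="stable")
    # Rows per group, in ascending key order (same order as the sort above)
    counts = ordered.groupby(key, sort=True).size().to_numpy()
    # Split points are the cumulative counts, excluding the final total
    return np.split(ordered[col].to_numpy(), np.cumsum(counts)[:-1])


df = pd.DataFrame({'foo': [1, 1, 1, 2, 2, 2, 2], 'bar': [1, 2, 3, 4, 5, 6, 7]})
parts = split_by_group(df, 'foo', 'bar')
print([p.tolist() for p in parts])
# [[1, 2, 3], [4, 5, 6, 7]]
```

Because the frame is sorted first, this also works when rows of the same foo value are not adjacent in the input.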
azro
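# Only valid when every foo group has the same size and the rows are already sorted by foo
number_of_items_per_group = d.groupby('foo')['bar'].size().max()
d['bar'].to_numpy().reshape(d['foo'].nunique(), number_of_items_per_group)
# array([[1, 2, 3],
#        [4, 5, 6]])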
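The reshape approach silently produces wrong rows if the data is not ordered by foo, and only works when all groups have equal length. A minimal sketch that sorts first and fails loudly when the group sizes differ; the helper name to_2d is hypothetical.

```python
import numpy as np
import pandas as pd


def to_2d(df: pd.DataFrame, key: str, col: str) -> np.ndarray:
    """Hypothetical helper: one row per distinct `key`, one column per item in that group."""
    # Stable sort so each group's rows are contiguous and keep their original order
    ordered = df.sort_values(key, kind="stable")
    sizes = ordered.groupby(key).size()
    # A rectangular 2-D array only makes sense if every group has the same length
    if sizes.nunique() != 1:
        raise ValueError(f"groups have different sizes: {sizes.to_dict()}")
    return ordered[col].to_numpy().reshape(len(sizes), sizes.iloc[0])


d = pd.DataFrame({'foo': [1, 1, 1, 2, 2, 2], 'bar': [1, 2, 3, 4, 5, 6]})
print(to_2d(d, 'foo', 'bar'))
# [[1 2 3]
#  [4 5 6]]
```

Unlike the list-of-arrays answers, the result here is a true 2-D ndarray, so it supports vectorized row and column operations directly.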
Georg Heiler