2

I am trying to built a dataframe from one existing dataframe looking like that, I now know that the pivot method can do the trick but I am wondering why it would not work referencing the 'obs' column for the 'Quantity' column in a new df. Feel like it could be useful to know in some situation in addition to pivot.

       Id  Quantity  obs
1  100121     113.0    0
2  100121     104.0    1
3  100121     -11.0    2
4  100122     220.0    0
5  100122     167.0    1
6  100122     100.0    2

I want to achieve the following result:

         Id  m1   m2    m3
1    100121 -11  104.0  113
4    100122  100 167    220

I tried doing this:

df_histo = pd.DataFrame(data11['Id'].drop_duplicates(keep='first'))
df_histo['m1'] = data11['Quantity'].loc[data11['obs']==2]
df_histo['m2'] = data11['Quantity'].loc[data11['obs']==1]
df_histo['m3'] = data11['Quantity'].loc[data11['obs']==0]

for some reason I keep getting this output:

         Id  m1   m2    m3
1    100121  NaN  NaN   NaN
4    100122  NaN  NaN   NaN

What am I missing out here? I can't see where I am messing up

Murcielago
  • 905
  • 1
  • 8
  • 30

1 Answers1

2

IIUC,

df.pivot('Id','obs','Quantity').rename(columns=lambda x: f'm{x+1}')

Output:

obs        m1     m2     m3
Id                         
100121  113.0  104.0  -11.0
100122  220.0  167.0  100.0

You are missing the concept of intrinsic data alignment, meaning that pandas wants to align data using indexes.

To fix your code you need to strip the indexing from the right side of your statements converting the pd.Series into a 1D numpy array.

df_histo = pd.DataFrame(data11['Id'].drop_duplicates(keep='first'))
df_histo['m1'] = data11['Quantity'].loc[data11['obs']==2].to_numpy()
df_histo['m2'] = data11['Quantity'].loc[data11['obs']==1].to_numpy()
df_histo['m3'] = data11['Quantity'].loc[data11['obs']==0].to_numpy()

Output:

       Id     m1     m2     m3
1  100121  -11.0  104.0  113.0
4  100122  100.0  167.0  220.0

However, this is still not the best way to do this indexing and selection, a better way is like this:

df_histo['m1'] = data11.loc[data11['obs'] == 2, 'Quantity'].to_numpy()
df_histo['m2'] = data11.loc[data11['obs'] == 1, 'Quantity'].to_numpy()
df_histo['m3'] = data11.loc[data11['obs'] == 0, 'Quantity'].to_numpy()
df_histo

Output:

       Id     m1     m2     m3
1  100121  -11.0  104.0  113.0
4  100122  100.0  167.0  220.0
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • that works, thank you I didnt know about the pivot method! I only knew about transpose. Now, could you still enlight me about what is wrong in my code? – Murcielago Aug 11 '20 at 21:01