i have this dataframe that have row of each key*id , i want to explode it to id,key1,key2 and remove duplicate rows and keep data_field , i am working with python2.7 but i would glad to a solution that will work both for python2.7 and python3.7
dataframe i have:
import pandas as pd
d = {'id': [111, 222, 222, 333, 333], 'key': ['key1', 'key2','key1','key2','key1'], 'value':[1,1,2,3,3],'data_field':['dummy1','dummy1','dummy2','dummy3','dummy2']}
df = pd.DataFrame(data=d)
print df[['id','key','value','data_field']].to_string(index=False)
id key value data_field
111 key1 1 dummy1
222 key2 1 dummy1
222 key1 2 dummy2
333 key2 3 dummy3
333 key1 3 dummy2
dataframe i want it to be transformed to:
d = {'id': [111, 222, 333], 'key1': [1, 2, 3],'key2':[pd.np.nan,1,3] , 'data_field': ['dummy1', 'dummy2', 'dummy3']}
df = pd.DataFrame(data=d)
print df[['id', 'key1', 'key2', 'data_field']].to_string(index=False)
id key1 key2 data_field
111 1 NaN dummy1
222 2 1.0 dummy2
333 3 3.0 dummy3
tried as suggested here
df.pivot(index='id', columns='key', values='value').join(df.drop_duplicates('id')['data_field'])
and got :
key1 key2 data_field
id
111 1.0 NaN NaN
222 2.0 1.0 NaN
333 3.0 3.0 NaN
data_field was not kept and id is now index and not column