
I have a dataframe with many key/value columns, where the keys and the values sit in separate columns.

import pandas as pd

values = [['John', 'somekey1', 'somevalue1', 'somekey2', 'somevalue2']]
df = pd.DataFrame(values, columns=['name', 'key1', 'value1', 'key2', 'value2'])

Remark: the original data has more preceding columns, not just the name, and it has more than two key/value columns.

What I want to achieve is having a result like this:

values = [
    ['John', 'somekey1', 'somevalue1'],
    ['John', 'somekey2', 'somevalue2']
]
df = pd.DataFrame(values, columns=['name', 'key', 'value'])

My idea was to join all key/value columns into a list or dictionary and then explode that list/dict. I found a nice post on pd.melt, but my problem is that I don't know the exact id_vars columns upfront. So I tried pd.Series.stack, which gave me the correct result for the key/value columns, but it drops the other columns from the original data. Any idea? Here's what I tried:

# generates: [(somekey1, somevalue1), (somekey2, somevalue2)]
df['pairs'] = df.apply(lambda row: [(row['key1'],row['value1']), (row['key2'], row['value2'])], axis=1)
# unstacks the list, but drops all other columns
df['pairs'].apply(pd.Series).stack().reset_index(drop=True).to_frame('pairs')
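For completeness, the explode idea described above could look roughly like this. It is only a sketch: it relies on DataFrame.explode (pandas 0.25+) and hard-codes the key/value column names.

# Sketch of the explode idea: drop the wide key/value columns, explode the
# list of (key, value) tuples into rows, then split the tuples back out.
other_cols = df.drop(columns=['key1', 'value1', 'key2', 'value2'])  # keeps 'name' and 'pairs'
exploded = other_cols.explode('pairs').reset_index(drop=True)
pairs = pd.DataFrame(exploded.pop('pairs').tolist(), columns=['key', 'value'])
result = pd.concat([exploded, pairs], axis=1)
#    name       key       value
# 0  John  somekey1  somevalue1
# 1  John  somekey2  somevalue2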
– Matthias

2 Answers

IIUC, this is a job for `wide_to_long`:

pd.wide_to_long(df, ['key', 'value'], i='name', j='drop').reset_index().drop(columns='drop')
Out[199]: 
   name       key       value
0  John  somekey1  somevalue1
1  John  somekey2  somevalue2
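If there are several leading id columns rather than just name, i also accepts a list. A quick sketch (not from the original answer) with invented column names:

# Sketch only: 'customer' and 'region' are made-up id columns.
pd.wide_to_long(df, ['key', 'value'], i=['customer', 'region'], j='drop') \
    .reset_index().drop(columns='drop')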
– BENY
  • Never heard of `wide_to_long`. Cool. – DYZ Mar 18 '18 at 23:07
  • 1
    @DyZ yep, this function is not that popular , also another one called `lreshape`:-) – BENY Mar 18 '18 at 23:08
  • Very nice solution. Thanks! Now I understood the difference between wide and long. – Matthias Mar 19 '18 at 16:07
  • @Matthias yep :-) yw~ happy coding – BENY Mar 19 '18 at 16:07
  • @Wen: what if the i parameter for the original dataframe were 4 columns? Then I get an error saying 'the id variables need to uniquely identify each row'. If I set those 4 columns as the index, I instead get an error from melt: '['col1' 'col2' 'col3' 'col4'] not in index'. Remark: those four columns are unique, so I don't know why I get the first error. – Matthias Mar 26 '18 at 13:41
  • @Matthias I guess you have duplicate combinations of col1 to col4 – BENY Mar 26 '18 at 13:54
  • I checked that and it's not :( Anyway, I added an ID column that is basically the same as the index, and now it works. – Matthias Mar 26 '18 at 13:55
  • 1
    @Matthias adding a ID can de done as df.reset_index() – BENY Mar 26 '18 at 13:56
  • That's it. Thanks! – Matthias Mar 26 '18 at 14:00
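A minimal sketch of that reset_index() workaround from the comments, assuming the real id columns do not uniquely identify rows on their own:

# Sketch: use a surrogate row id so wide_to_long gets a unique 'i' column.
tmp = df.reset_index()  # adds a unique 'index' column
out = (pd.wide_to_long(tmp, ['key', 'value'], i='index', j='drop')
         .reset_index()
         .drop(columns=['index', 'drop']))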

Here's what comes to my mind:

import numpy as np

common = ['name'] # Add more columns, if needed
# Alternatively: everything up to and including 'name'
common = df.loc[:, :'name'].columns.tolist()
result = pd.concat([df.loc[:, common + ['key1', 'value1']],
                    df.loc[:, common + ['key2', 'value2']]])

# Each half of the concat is missing the other half's key/value columns,
# so pick whichever side is not null.
result['key'] = np.where(result['key1'].isnull(),
                         result['key2'], result['key1'])
result['value'] = np.where(result['value1'].isnull(),
                           result['value2'], result['value1'])
result.drop(['value1', 'value2', 'key1', 'key2'], axis=1, inplace=True)
#   name       key       value
#0  John  somekey1  somevalue1
#0  John  somekey2  somevalue2
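If there are more than two key/value pairs, the same concat idea can go in a loop over the numeric suffixes. A rough sketch (not part of the original answer) that reuses the common list from above and assumes the columns are literally named key<N>/value<N>:

import re

# Sketch: collect the key<N>/value<N> suffixes, stack one slice per pair.
suffixes = sorted(
    (c[len('key'):] for c in df.columns if re.fullmatch(r'key\d+', c)),
    key=int)
pieces = [df[common + ['key' + n, 'value' + n]]
            .rename(columns={'key' + n: 'key', 'value' + n: 'value'})
          for n in suffixes]
result = pd.concat(pieces, ignore_index=True)
#    name       key       value
# 0  John  somekey1  somevalue1
# 1  John  somekey2  somevalue2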
– DYZ