
I have a dataframe with many key/value columns, where the keys and the values sit in separate columns.

import pandas as pd

values = [['John', 'somekey1', 'somevalue1', 'somekey2', 'somevalue2']]
df = pd.DataFrame(values, columns=['name', 'key1', 'value1', 'key2', 'value2'])

Remark: the original data has more preceding columns, not just the name, and it has more than two key/value columns.

What I want to achieve is having a result like this:

values = [
    ['John', 'somekey1', 'somevalue1'],
    ['John', 'somekey2', 'somevalue2']
]
df = pd.DataFrame(values, columns=['name', 'key', 'value'])

My idea was to join all key/value columns into a list or dictionary and then explode that list/dict. I found a nice post on pd.melt, but my problem is that I don't know the exact id_vars columns upfront. So I tried pd.Series.stack, which gave me the correct result for the key/value columns, but it drops the other columns from the original data. Any idea? Here's what I tried:

# generates: [(somekey1, somevalue1), (somekey2, somevalue2)]
df['pairs'] = df.apply(lambda row: [(row['key1'],row['value1']), (row['key2'], row['value2'])], axis=1)
# unstacks the list, but drops all other columns
df['pairs'].apply(pd.Series).stack().reset_index(drop=True).to_frame('pairs')
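For completeness, the explode idea described above could look roughly like this. It is only a sketch: it relies on DataFrame.explode (pandas 0.25+) and hard-codes the key/value column names.

# Sketch of the explode idea: drop the wide key/value columns, explode the
# list of (key, value) tuples into rows, then split the tuples back out.
other_cols = df.drop(columns=['key1', 'value1', 'key2', 'value2'])  # keeps 'name' and 'pairs'
exploded = other_cols.explode('pairs').reset_index(drop=True)
pairs = pd.DataFrame(exploded.pop('pairs').tolist(), columns=['key', 'value'])
result = pd.concat([exploded, pairs], axis=1)
#    name       key       value
# 0  John  somekey1  somevalue1
# 1  John  somekey2  somevalue2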
– Matthias

2 Answers

IIUC, this is a job for `wide_to_long`:

pd.wide_to_long(df, ['key', 'value'], i='name', j='drop').reset_index().drop(columns='drop')
Out[199]: 
   name       key       value
0  John  somekey1  somevalue1
1  John  somekey2  somevalue2
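If there are several leading id columns rather than just name, i also accepts a list. A quick sketch (not from the original answer) with invented column names:

# Sketch only: 'customer' and 'region' are made-up id columns.
pd.wide_to_long(df, ['key', 'value'], i=['customer', 'region'], j='drop') \
    .reset_index().drop(columns='drop')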
– BENY
  • Never heard of `wide_to_long`. Cool. – DYZ Mar 18 '18 at 23:07
  • 1
    @DyZ yep, this function is not that popular , also another one called `lreshape`:-) – BENY Mar 18 '18 at 23:08
  • Very nice solution. Thanks! Now I understood the difference between wide and long. – Matthias Mar 19 '18 at 16:07
  • @Matthias yep :-) yw~ happy coding – BENY Mar 19 '18 at 16:07
  • @Wen: what if the i parameter for the original dataframe were 4 columns? Then I get an error saying 'the id variables need to uniquely identify each row'. If I set those 4 columns as the index, I instead get an error from melt: '['col1' 'col2' 'col3' 'col4'] not in index'. Remark: those four columns are unique, so I don't know why I get the first error. – Matthias Mar 26 '18 at 13:41
  • @Matthias I guess you have duplicate combinations of col1 to col4 – BENY Mar 26 '18 at 13:54
  • I checked that and it's not :( Anyway, I added an ID column that is basically the same as the index, and now it works. – Matthias Mar 26 '18 at 13:55
  • 1
    @Matthias adding a ID can de done as df.reset_index() – BENY Mar 26 '18 at 13:56
  • That's it. Thanks! – Matthias Mar 26 '18 at 14:00
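A minimal sketch of that reset_index() workaround from the comments, assuming the real id columns do not uniquely identify rows on their own:

# Sketch: use a surrogate row id so wide_to_long gets a unique 'i' column.
tmp = df.reset_index()  # adds a unique 'index' column
out = (pd.wide_to_long(tmp, ['key', 'value'], i='index', j='drop')
         .reset_index()
         .drop(columns=['index', 'drop']))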

Here's what comes to my mind:

import numpy as np

common = ['name'] # Add more columns, if needed
# Alternatively: everything up to and including 'name'
common = df.loc[:, :'name'].columns.tolist()
result = pd.concat([df.loc[:, common + ['key1', 'value1']],
                    df.loc[:, common + ['key2', 'value2']]])

# Each half of the concat is missing the other half's key/value columns,
# so pick whichever side is not null.
result['key'] = np.where(result['key1'].isnull(),
                         result['key2'], result['key1'])
result['value'] = np.where(result['value1'].isnull(),
                           result['value2'], result['value1'])
result.drop(['value1', 'value2', 'key1', 'key2'], axis=1, inplace=True)
#   name       key       value
#0  John  somekey1  somevalue1
#0  John  somekey2  somevalue2
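If there are more than two key/value pairs, the same concat idea can go in a loop over the numeric suffixes. A rough sketch (not part of the original answer) that reuses the common list from above and assumes the columns are literally named key<N>/value<N>:

import re

# Sketch: collect the key<N>/value<N> suffixes, stack one slice per pair.
suffixes = sorted(
    (c[len('key'):] for c in df.columns if re.fullmatch(r'key\d+', c)),
    key=int)
pieces = [df[common + ['key' + n, 'value' + n]]
            .rename(columns={'key' + n: 'key', 'value' + n: 'value'})
          for n in suffixes]
result = pd.concat(pieces, ignore_index=True)
#    name       key       value
# 0  John  somekey1  somevalue1
# 1  John  somekey2  somevalue2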
– DYZ