How to create columns from rows given key:value pair in the column in pandas?

Question

I have the DF of this kind:

pd.DataFrame({'label':['A','test1: A','test2: A','B','test1: B','test3: B'],
             'value': [1,2,3,4,5,6]})

      label  value
0         A      1
1  test1: A      2
2  test2: A      3
3         B      4
4  test1: B      5
5  test3: B      6

And I need to convert to this:

pd.DataFrame({'label':['A','B'],
             'value': [1,4],
             'test1:':[2,5],
             'test2:':[3,None],
             'test3:':[None,6]})

  label  value  test1:  test2:  test3:
0     A      1       2     3.0     NaN
1     B      4       5     NaN     6.0

I need to keep label for unique value and keys are merged to the right if present in the data. Keys may vary and be of different names for one value.

Feel free to share how to rename the question because I could not find the better way to name the problem.

EDIT: Partly this solution contains what I need however there is no decent way to add columns representing key in the label column. Ideally something like a function with df input is needed.

@SurajSubramanian really nothing worked out. Thought about lambda function with apply, however, have no clue how to construct the result:( — sumixam, Jul 12 '20 at 14:40
Does this answer your question? [Pandas long to wide reshape, by two variables](https://stackoverflow.com/questions/22798934/pandas-long-to-wide-reshape-by-two-variables) — NelsonGon, Jul 12 '20 at 14:40
@NelsonGon the link you provided contains mostly the same problem as I have really, but it uses for loop for naming and creating new columns which is not efficient in case of larger data — sumixam, Jul 12 '20 at 14:55
Right, I normally link duplicates to enable one see if it solves their issues. Won't retract. Feel free to add your comment to your question so someone else knows why this may not work for you. — NelsonGon, Jul 12 '20 at 14:57

score 2 · Accepted Answer · answered Jul 12 '20 at 15:29

2

Extract information into two data frames and merge them.

df2 = df[df['label'].str.contains('test')]
df3 = df2['label'].str.split(expand=True).rename(columns={0: "test", 1: "label"})
df3['value'] = df2['value']
df3 = df3.pivot_table(index='label', columns='test', values='value')
df2 = df[~df['label'].str.contains('test')]
df4 = pd.merge(df2, df3, on='label')

Output

  label  value  test1:  test2:  test3:
0     A      1     2.0     3.0     NaN
1     B      4     5.0     NaN     6.0

answered Jul 12 '20 at 15:29

Pramote Kuacharoen

1,496
1
5
6

This makes sense completely and avoids loops. Thanks! – sumixam Jul 12 '20 at 15:49
@sumixam .contains is a loop most likely, so is pivot_table method – nexla Jul 13 '20 at 12:44

Roy2012 · Answer 2 · 2020-07-12T15:05:55.190

1

Here's a way to do that:

df.loc[~df.label.str.contains(":"), "label"] = df.loc[~df.label.str.contains(":"), "label"].str.replace(r"(^.*$)", r"value:\1")
labels = df.label.str.split(":", expand = True).rename(columns = {0: "label1", 1:"label2"})
df = pd.concat([df, labels], axis=1)
df = pd.pivot_table(df, index="label2", columns="label1", dropna=False)
df.columns = [c[1] for c in df.columns]
df.index.name = "label"

The output is:

       test1  test2  test3  value
label                            
A        2.0    3.0    NaN    1.0
B        5.0    NaN    6.0    4.0

edited Jul 12 '20 at 15:05

answered Jul 12 '20 at 14:49

Roy2012

11,755
2
22
35

I have a bit different out with val columns concatenated below. Also, do you know how to remove the label1,label2? There is some truth in your solution:) – sumixam Jul 12 '20 at 15:02
Slightly improved - changed the column names to drop the extra level. – Roy2012 Jul 12 '20 at 15:06
See the edits. This all makes sense, however, produces a bit different output in pivoting i think – sumixam Jul 12 '20 at 15:53

How to create columns from rows given key:value pair in the column in pandas?

2 Answers2