Fill different columns based on other columns (first one with column header, second one with the value to obtain)

Question

I'm working with Python 3.6.9.

I'm stuck on a dataframe like that :

import pandas as pd
import numpy as np

dict_ = {'col1': [3.14, 28, -0.618, 1.159], 'col2': ['a_002_u', 'a_003_u', 'a_001_u', 'a_003_u'], 'a_001_u': [np.nan] * 4, 'a_002_u': [np.nan] * 4, 'a_003_u': [np.nan] * 4}

df = pd.DataFrame(dict_)

     col1     col2  a_001_u  a_002_u  a_003_u
0   3.140  a_002_u      NaN      NaN      NaN
1  28.000  a_003_u      NaN      NaN      NaN
2  -0.618  a_001_u      NaN      NaN      NaN
3   1.159  a_003_u      NaN      NaN      NaN

And I would like to get this result :

     col1     col2  a_001_u  a_002_u  a_003_u
0   3.140  a_002_u      NaN     3.14      NaN
1  28.000  a_003_u      NaN      NaN   28.000
2  -0.618  a_001_u   -0.618      NaN      NaN
3   1.159  a_003_u      NaN      NaN    1.159

In other words, I would like to fill columns 'a_001_u', 'a_002_u', and 'a_003_u' with 'col1' values based on column header in 'col2'.

It is quite easy to explain, but I have the impression that it is less obvious to set up. Does anyone have an idea to help me?

Ben.T · Accepted Answer · 2020-05-03T18:55:09.400

2

you can use fillna after change the shape of the first 2 column with set_index and unstack like:

df = df.fillna(df.set_index('col2', append=True)['col1'].unstack())
print (df)
     col1     col2  a_001_u  a_002_u  a_003_u
0   3.140  a_002_u      NaN     3.14      NaN
1  28.000  a_003_u      NaN      NaN   28.000
2  -0.618  a_001_u   -0.618      NaN      NaN
3   1.159  a_003_u      NaN      NaN    1.159

because actually doing the set_index and unstack does create the other columns you want, and the fillna will do a (row,column) filling of missing value

print(df.set_index('col2', append=True)['col1'].unstack())
col2  a_001_u  a_002_u  a_003_u
0         NaN     3.14      NaN
1         NaN      NaN   28.000
2      -0.618      NaN      NaN
3         NaN      NaN    1.159

NB: Same result could be obtain with pivot like df.pivot(columns='col2', values='col1')

edited May 03 '20 at 18:55

answered May 03 '20 at 18:39

Ben.T

29,160
6
32
54

1

This solution works perfectly. And that's exactly what I was looking for. I did not know the association "set_index" and "unstack". Thanks a lot ! @Ben.T – David May 03 '20 at 18:49
2

@David it is equivalent to a pivot, you could get the same result with `df.pivot(columns='col2', values='col1')`, which could be faster actually ;) – Ben.T May 03 '20 at 18:54
2

Totally agree with @Ben.T , pivot is a good tool to use in this case given you have a a large number of rows. – May 03 '20 at 19:01

score 0 · Answer 2 · 2020-05-03T19:00:11.160

0

You can write it by iterating through the rows.

for index, row in df.iterrows():
    row[row['col2']]=row['col1']

edited May 03 '20 at 19:00

answered May 03 '20 at 18:27

I have tens of thousands of data frames, and each one contains several thousand rows. So I'm looking for (if possible) a solution by vector calculation to be as fast as possible. I'm afraid that with a loop for it would take too long... Thanks for your help @overflow95 – David May 03 '20 at 18:43
1

I get an error 'TypeError: cannot do label indexing on with these indexers [3.14] of ' – Sowmya May 03 '20 at 18:51
Hi @Sowmya, thanks for noticing. It should be `row['col1']` instead of `row[row['col1']]` – May 03 '20 at 19:00
@overflow95 You are welcome. I was wondering why the loop should take long? – Sowmya May 03 '20 at 19:13
1

@Sowmya for more information about `iterrows` vs `vectorization`, read this [answer](https://stackoverflow.com/a/55557758/9274732), it is really good :) – Ben.T May 03 '20 at 19:31

Sowmya · Answer 3 · 2020-05-03T19:11:55.717

0

import pandas as pd
import numpy as np

dict_ = {'col1': [3.14, 28, -0.618, 1.159], 'col2': ['a_002_u', 'a_003_u', 'a_001_u', 'a_003_u'], 'a_001_u': [np.nan] * 4, 'a_002_u': [np.nan] * 4, 'a_003_u': [np.nan] * 4}

df = pd.DataFrame(dict_)
count = 0
for key in df['col2']:
    df[key][count] = df['col1'][count]
    count += 1

df

edited May 03 '20 at 19:11

answered May 03 '20 at 18:45

Sowmya

91
1
9

Fill different columns based on other columns (first one with column header, second one with the value to obtain)

3 Answers3