Fill in empty space by merging columns using Pandas Python

Question

I am a Python noob. I have an unstructured text file that I'm trying to capture to a dataframe and export to excel. I need to merge 38 to 36, 45 to 43, and 79 to 78 filling in the empty space with the data on the merging column.

Dummy Dataset

	0	5	36	38	43	45	78	79
1	A	01JUN2022		1.2		B		1.2
2	C	01JUN2022		1.4		D		1.4
3	E	01JUN2022	1.5		F		1.6
4	G	01JUN2022	1.7		H		1.7
5	I	01JUN2022	1.4		J		1.8
6	K	01JUN2022	1.7		L		1.3
1	A	01JUN2022		1.2		B		1.2
2	C	01JUN2022		1.4	D		1.4
3	E	01JUN2022		1.5	F		1.6
4	G	01JUN2022	1.7		H		1.7
5	I	01JUN2022	1.4			J		1.8
6	K	01JUN2022	1.7		L			1.3

Required output

	0	5	36	43	79
1	A	01JUN2022	1.2	B	1.2
2	C	01JUN2022	1.4	D	1.4
3	E	01JUN2022	1.5	F	1.6
4	G	01JUN2022	1.7	H	1.7
5	I	01JUN2022	1.4	J	1.8
6	K	01JUN2022	1.7	L	1.3
1	A	01JUN2022	1.2	B	1.2
2	C	01JUN2022	1.4	D	1.4
3	E	01JUN2022	1.5	F	1.6
4	G	01JUN2022	1.7	H	1.7
5	I	01JUN2022	1.4	J	1.8
6	K	01JUN2022	1.7	L	1.3

Does this answer your question? [How to remove nan value while combining two column in Panda Data frame?](https://stackoverflow.com/questions/34989341/how-to-remove-nan-value-while-combining-two-column-in-panda-data-frame) — T C Molenaar, Oct 05 '22 at 09:19
I tried all the answers in that but I did not get the required output — Dhakshika, Oct 05 '22 at 09:42
Is it guaranteed that only 1 of the 2 columns being merged will have a value present? i.e is it possible for both columns to hold a value for the same row? — Abirbhav G., Oct 05 '22 at 10:45

score 0 · Accepted Answer · answered Oct 05 '22 at 10:59

Would start by converting '' to NaN as follows

df = df.replace(r'^\s*$', np.nan, regex=True)

Then one can use pandas.Series.combine_first

df['36'] = df['36'].combine_first(df['38'])
df['43'] = df['43'].combine_first(df['45'])
df['79'] = df['79'].combine_first(df['78'])

[Out]:
    id  0          5   36   38 43   45   78   79
0    1  A  01JUN2022  1.2  1.2  B    B  NaN  1.2
1    2  C  01JUN2022  1.4  1.4  D    D  NaN  1.4
2    3  E  01JUN2022  1.5  NaN  F  NaN  1.6  1.6
3    4  G  01JUN2022  1.7  NaN  H  NaN  1.7  1.7
4    5  I  01JUN2022  1.4  NaN  J  NaN  1.8  1.8
5    6  K  01JUN2022  1.7  NaN  L  NaN  1.3  1.3
6    1  A  01JUN2022  1.2  1.2  B    B  NaN  1.2
7    2  C  01JUN2022  1.4  1.4  D  NaN  1.4  1.4
8    3  E  01JUN2022  1.5  1.5  F  NaN  1.6  1.6
9    4  G  01JUN2022  1.7  NaN  H  NaN  1.7  1.7
10   5  I  01JUN2022  1.4  NaN  J    J  NaN  1.8
11   6  K  01JUN2022  1.7  NaN  L  NaN  NaN  1.3

Finally, one can drop the columns that one doesn't want or select the one's to display as follows

df = df[['0', '5', '36', '43', '79']]

[Out]:

    0          5   36 43   79
0   A  01JUN2022  1.2  B  1.2
1   C  01JUN2022  1.4  D  1.4
2   E  01JUN2022  1.5  F  1.6
3   G  01JUN2022  1.7  H  1.7
4   I  01JUN2022  1.4  J  1.8
5   K  01JUN2022  1.7  L  1.3
6   A  01JUN2022  1.2  B  1.2
7   C  01JUN2022  1.4  D  1.4
8   E  01JUN2022  1.5  F  1.6
9   G  01JUN2022  1.7  H  1.7
10  I  01JUN2022  1.4  J  1.8
11  K  01JUN2022  1.7  L  1.3

and this gives the desired output.

Notes:

There are additional ways to remove NaN and combine the columns. For that, see: How to remove nan value while combining two column in Panda Data frame?
There are also additional ways to delete columns in Python. If one wants to learn more about it, see: Delete a column from a Pandas DataFrame

Fill in empty space by merging columns using Pandas Python

1 Answers1