I have a pandas DataFrame containing a string column which needs splitting into two separate columns. The answer using tolist
that I found on SO from this question works like a charm, except when my column contains NaNs. The excerpt below describes the difficulty:
import pandas as pd
import numpy as np
# Example DataFrame
df = pd.DataFrame([[25.0, '34.2/ 18.1', 'one'],
[32.6, '28.6/ 17.9', 'two'],
[12.5, '30.1/ 17.6', 'three']], columns=['A', 'B', 'C'])
df2 = df.copy()
# This method works when all data are present
df['D'] = pd.DataFrame(df['B'].str.split('/').tolist())[1]
# However, when there are NaNs:
df2['B'][0] = np.nan
# This line fails
df2['D'] = pd.DataFrame(df2['B'].str.split('/').tolist())[1]
It gives me a KeyError
, because the intermediate DataFrame only has one column, indicating that the bother of going to a list and back doesn't accomplish anything anymore:
0
0 NaN
1 [28.6, 17.9]
2 [30.1, 17.6]
I've tried dropping the NaN first via pd.DataFrame(df2['B'].str.split('/').dropna().tolist())
, but then I lose my index ... I need to keep the NaN at index 0. I've also thought of somehow duplicating the NaN in the creation of the intermediate DataFrame to force the two columns, but am having no luck.
This is what I would need my data to look like for df2:
A B C D
0 25.0 NaN one NaN
1 32.6 28.6/ 17.9 two 17.9
2 12.5 30.1/ 17.6 three 17.6
Is there a way to do this without using a list as an intermediary? Or somehow deal with the NaN?