
I have a dataframe formatted like this in pandas.

(df)
School ID      Column 1 
School 1       AD6000         
School 2       3000TO4000      
School 3       5000TO6000      
School 4       AC2000         
School 5       BB3300        
School 6       9000TO9900      
....

I want to split the rows of Column 1 that contain 'TO', using it as the delimiter, into two new columns while keeping the original column. The result would look like this.

(df)
School ID      Column 1          Column 2     Column 3
School 1       AD6000            NaN          NaN
School 2       3000TO4000        3000         4000
School 3       5000TO6000        5000         6000
School 4       AC2000            NaN          NaN
School 5       BB3300            NaN          NaN
School 6       9000TO9900        9000         9900
....

Here's the code I thought would work, but it leaves columns 2 and 3 blank instead of putting the numbers to the left and right of TO into their respective columns.

df[['Column 2','Column 3']] = df['Column 1'].str.extract(r'(\d+)TO(\d+)')

Thanks for the help.

2 Answers

2

That's because the right-hand side is a DataFrame whose columns are named 0 and 1, so pandas can't find Column 2 or Column 3 in it.

You can pass the underlying numpy array instead of the dataframe:

df[['Column 2','Column 3']] = df['Column 1'].str.extract(r'(\d+)TO(\d+)').values

Output:

  School ID    Column 1 Column 2 Column 3
0  School 1      AD6000      NaN      NaN
1  School 2  3000TO4000     3000     4000
2  School 3  5000TO6000     5000     6000
3  School 4      AC2000      NaN      NaN
4  School 5      BB3300      NaN      NaN
5  School 6  9000TO9900     9000     9900
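For reference, here is a self-contained sketch of the above that can be run as-is. The sample frame is rebuilt from the question's data, and `extracted` is just an illustrative variable name; `.to_numpy()` is used in place of `.values`, which should behave the same here.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "School ID": [f"School {i}" for i in range(1, 7)],
    "Column 1": ["AD6000", "3000TO4000", "5000TO6000",
                 "AC2000", "BB3300", "9000TO9900"],
})

# str.extract returns a DataFrame with one column per capture group,
# labelled 0 and 1; rows that do not match the pattern are NaN.
extracted = df["Column 1"].str.extract(r"(\d+)TO(\d+)")

# Converting to a NumPy array drops those labels, so the values are
# assigned to the new columns by position.
df[["Column 2", "Column 3"]] = extracted.to_numpy()

print(df)
```

Note that the extracted values are strings; converting them with something like `pd.to_numeric` would be a separate step if numbers are needed.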
Quang Hoang
  • With this I get an error saying "None of [Index(['Column 1', 'Column 2], dtype='object', name=3)] are in the [columns]" – Derek Fisher Oct 27 '20 at 20:05
  • This works on my system with Pandas 1.1.5. Also related to [this question](https://stackoverflow.com/questions/39050539/how-to-add-multiple-columns-to-pandas-dataframe-in-one-assignment). – Quang Hoang Oct 27 '20 at 20:13
0

Use

new = df["Column 1"].str.split("TO", n=1, expand=True)

And give the resulting columns new names

df["Col1"] = new[0]
df["Col2"] = new[1]
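One thing to watch with this approach: for rows that don't contain "TO", `str.split` leaves the whole original string in `new[0]` rather than NaN, which differs from the output the question asks for. A possible way to handle that, sketched below, is to blank out `new[0]` wherever the second part is missing. The `has_to` name and the "Column 2"/"Column 3" targets are my own choices to match the question, and the sketch assumes at least one row contains "TO" (otherwise `new` has no column 1).

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "School ID": [f"School {i}" for i in range(1, 7)],
    "Column 1": ["AD6000", "3000TO4000", "5000TO6000",
                 "AC2000", "BB3300", "9000TO9900"],
})

# Split on the first "TO" only; expand=True returns a DataFrame
# with columns 0 and 1.
new = df["Column 1"].str.split("TO", n=1, expand=True)

# Rows without "TO" have a missing second part.
has_to = new[1].notna()

# Keep the left part only where a split actually happened.
df["Column 2"] = new[0].where(has_to)
df["Column 3"] = new[1]

print(df)
```

Unlike the regex in the question, this does not check that the parts on either side of "TO" are digits.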