>e = {0: pd.Series(['NHL_toronto_maple-leafs_Canada', 'NHL_boston_bruins_US', 'NHL_detroit_red-wings', 'NHL_montreal'])}
>df = pd.DataFrame(e)
>df
0
0 NHL_toronto_maple-leafs_Canada
1 NHL_boston_bruins_US
2 NHL_detroit_red-wings
3 NHL_montreal
I want to:
1) split the above dataframe (Series) by '_'
2) drop the 'NHL' string
3) recombine the remaining text by '_'
4) attach the result in #3 to the original dataframe as the second column
To do this I tried the following:
>df2 = df.icol(0).str.split('_').apply(pd.Series).iloc[:,1:]
>df2
1 2 3
0 toronto maple-leafs Canada
1 boston bruins US
2 detroit red-wings NaN
3 montreal NaN NaN
I tried to follow the suggestion in combine columns in Pandas by doing something like:
>df2['4'] = df2.iloc[:,0] + "_" + df2.iloc[:,1] + "_" + df2.iloc[:,2]
>df2
1 2 3 4
0 toronto maple-leafs Canada toronto_maple-leafs_Canada
1 boston bruins US boston_bruins_US
2 detroit red-wings NaN NaN
3 montreal NaN NaN NaN
However, you can see that in situations where a combine involves a cell that is NaN the end result is NaN as well. This is not what I want.
Column 4 should look like:
toronto_maple-leafs_Canada
boston_bruins_US
detroit_red-wings_US
montreal
Also is there an efficient way to do this type of operation as my real data set is quite large.