Suppose the following dataframe:
   c1  c2
0  0   "phrase_x 12"
1  0   "phrase_y 34"
2  1   "phrase_x 56"
3  1   "phrase_y 78"
I want to make four new columns, c3-c6, based on conditions from c1 and c2, such that:
- if c1 == 0 and c2 contains phrase_x, c3 is filled with data from c2, split with a separator string
- if c1 == 0 and c2 contains phrase_y, c4 is filled with data from c2, split with a separator string
- if c1 == 1 and c2 contains phrase_x, c5 is filled with data from c2, split with a separator string
- if c1 == 1 and c2 contains phrase_y, c6 is filled with data from c2, split with a separator string
- The empty cells may be filled with NaN
Something like this:
   c1  c2             c3   c4   c5   c6
0  0   "phrase_x 12"  12   NaN  NaN  NaN
1  0   "phrase_y 34"  NaN  34   NaN  NaN
2  1   "phrase_x 56"  NaN  NaN  56   NaN
3  1   "phrase_y 78"  NaN  NaN  NaN  78
I have looked at this question, which helped me a bit, but only when I want to use a single condition, like this:
df.loc[df['c2'].str.contains("phrase_x") , 'c3'] = df['c2'].str.split('-> ').str[1]
df.loc[df['c2'].str.contains("phrase_y") , 'c4'] = df['c2'].str.split('-> ').str[1]
This code produces
   c1  c2             c3   c4
0  0   "phrase_x 12"  12   NaN
1  0   "phrase_y 34"  NaN  34
2  1   "phrase_x 56"  56   NaN
3  1   "phrase_y 78"  NaN  78
Is there a way to use loc with multiple conditions? It seems to me that a possibility is to first use the above 2 statements, and then copy the data over based on if c1 is 0 or 1, like this:
df.loc[df['c1'] == 1, 'c5'] = df['c3']
df.loc[df['c1'] == 1, 'c6'] = df['c4']
However, this seems tedious, and you would also need to clear the copied data from the previous columns (c3 and c4) afterwards.
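For illustration, here is a minimal sketch of what `loc` with a combined condition could look like on the sample frame. It assumes the separator is a single space, as in the sample data (the snippets above use '-> '), and the `rules` dictionary is a hypothetical helper introduced only for this example. Each comparison is wrapped in parentheses and the two masks are joined with `&`.

```python
import pandas as pd

df = pd.DataFrame({
    "c1": [0, 0, 1, 1],
    "c2": ["phrase_x 12", "phrase_y 34", "phrase_x 56", "phrase_y 78"],
})

# Assumption: the separator is a single space, as in the sample data.
values = df["c2"].str.split(" ").str[1]

# Hypothetical mapping: target column -> (required c1 value, required phrase).
rules = {
    "c3": (0, "phrase_x"),
    "c4": (0, "phrase_y"),
    "c5": (1, "phrase_x"),
    "c6": (1, "phrase_y"),
}

for col, (c1_value, phrase) in rules.items():
    # Parentheses around the comparison are needed because & binds
    # more tightly than ==.
    mask = (df["c1"] == c1_value) & df["c2"].str.contains(phrase, regex=False)
    # Rows outside the mask are left as NaN in the newly created column.
    df.loc[mask, col] = values[mask]
```

The extracted values stay strings here ("12", not 12); converting them to numbers would be a separate step.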
I also tried the answer below the previously mentioned one, which uses np.where():
df['c3'] = df.where(df['c1'] == 0 & df['c2'].str.contains("phrase_x"), df['c2'].str.split('-> ').str[1], axis = 0)
However, this only leads to more errors, e.g.
ValueError: Cannot set a DataFrame with multiple columns to the single column Budget Outlet 1
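Two things in the attempt above probably contribute to the error. First, it calls `df.where`, which returns a whole DataFrame, rather than `np.where`, so the result cannot be assigned to a single column. Second, `&` binds more tightly than `==`, so `df['c1'] == 0 & mask` is parsed as `df['c1'] == (0 & mask)`. A minimal sketch of the `np.where` variant with the condition parenthesized, again assuming a single-space separator as in the sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "c1": [0, 0, 1, 1],
    "c2": ["phrase_x 12", "phrase_y 34", "phrase_x 56", "phrase_y 78"],
})

# Assumption: the separator is a single space, as in the sample data.
values = df["c2"].str.split(" ").str[1]

# Parenthesize the comparison before combining the masks with &.
cond = (df["c1"] == 0) & df["c2"].str.contains("phrase_x", regex=False)

# np.where picks from `values` where cond is True and NaN elsewhere.
df["c3"] = np.where(cond, values, np.nan)
```

The same pattern would be repeated with a different condition for c4, c5 and c6.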