I have a dataframe called df
that looks similar to this (except the number of 'mat_deliv' columns goes up to mat_deliv_8 and there are several hundred clients - I have simplified it here).
Client_ID mat_deliv_1 mat_deliv_2 mat_deliv_3 mat_deliv_4
C1019876 xxx,yyy,zzz aaa,bbb,xxx ccc ddd
C1018765 yyy,zzz xxx bbb None
C1017654 yyy,xxx aaa,bbb ccc ddd
C1016543 aaa,bbb ccc None None
C1019876 yyy None None None
I want to create a new column called xxx_deliv
with two values, 0
and 1
. I want to set xxx_deliv
to equal 1
if any one of mat_deliv_1
, mat_deliv_2
, mat_deliv_3
or mat_deliv_4
contains xxx
, and 0
if they do not.
So, I want to add a column that, with the example df
, would look like this:
Client_ID mat_deliv_1 mat_deliv_2 mat_deliv_3 mat_deliv_4 xxx_deliv
C1019876 xxx,yyy,zzz aaa,bbb,xxx ccc ddd 1
C1018765 yyy,zzz xxx bbb None 1
C1017654 yyy,xxx aaa,bbb ccc ddd 1
C1016543 aaa,bbb ccc None None 0
C1019876 yyy None None None 0
I know that the following code performs the desired task:
df['xxx_deliv'] = 0
df.loc[(df.Mat_deliv_1.str.contains("xxx", na=False)) |
(df.Mat_deliv_2.str.contains("xxx", na=False)) |
(df.Mat_deliv_3.str.contains("xxx", na=False)) |
(df.Mat_deliv_4.str.contains("xxx", na=False)),
'xxx_deliv'] = 1
But I would like to be able to do this without going through each individual column - I need to be able to search through multiple columns at once.