For example, I have a DataFrame like the one below.
>>> df
CATX CATY CATZ
0 A G AAA
1 B H BBB
2 C I AAA
3 B J CCC
4 A G BBB
5 B H DDD
6 D K EEE
7 E L FFF
I want to add new columns to df based on values provided in a list. For example, for CATZ I have the list ['AAA', 'BBB'],
and for each item in it I need a column that indicates with 1 or 0 whether the observation matches, e.g.
>>> df
CATX CATY CATZ AAA BBB
0 A G AAA 1 0
1 B H BBB 0 1
2 C I AAA 1 0
3 B J CCC 0 0
4 A G BBB 0 1
5 B H DDD 0 0
6 D K EEE 0 0
7 E L FFF 0 0
This is a bit different from pd.get_dummies, since get_dummies creates a column for every distinct value (or k-1 of them) in the DataFrame/column, whereas I only want columns for the values in my list. Currently, I loop through the list and call apply on every row:
catz_list = ['AAA', 'BBB']
for catz_item in catz_list:
    df[catz_item] = df.apply(lambda x: 1 if x.CATZ == catz_item else 0, axis=1)
Is there another way to do this besides iterating through the list? The loop is a bit slow. To make it more complicated, I am also doing this for combinations of CATX and CATY given in another list, for example [['A', 'G'], ['A', 'H'], ['B', 'H']].
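For the single-column CATZ case, one possible direction is sketched below. It is only a sketch under assumptions: the frame is rebuilt from the sample data shown above, and `dummies` is a hypothetical helper name. It builds all indicator columns with pd.get_dummies and then keeps only the listed values with reindex, so no row-wise apply is needed.

```python
import pandas as pd

# Sample frame rebuilt from the data shown in the question.
df = pd.DataFrame({
    "CATX": list("ABCBABDE"),
    "CATY": list("GHIJGHKL"),
    "CATZ": ["AAA", "BBB", "AAA", "CCC", "BBB", "DDD", "EEE", "FFF"],
})
catz_list = ["AAA", "BBB"]

# get_dummies creates one indicator column per distinct CATZ value;
# reindex keeps only the listed values (a listed value that never occurs
# becomes an all-zero column), and astype(int) gives plain 0/1 integers.
dummies = (
    pd.get_dummies(df["CATZ"])
    .reindex(columns=catz_list, fill_value=0)
    .astype(int)
)
df = df.join(dummies)
print(df)
```

Whether this is actually faster than the loop depends on the data, so it would need to be timed on the real DataFrame.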
--- edit ---
Expected output with the CATX / CATY combinations:
>>> df
CATX CATY CATZ AAA BBB AG AH BH
0 A G AAA 1 0 1 0 0
1 B H BBB 0 1 0 0 1
2 C I AAA 1 0 0 0 0
3 B J CCC 0 0 0 0 0
4 A G BBB 0 1 1 0 0
5 B H DDD 0 0 0 0 1
6 D K EEE 0 0 0 0 0
7 E L FFF 0 0 0 0 0
The code I am using right now is below:
catxy_list = [['A', 'G'], ['A', 'H'], ['B', 'H']]
for catxy_item in catxy_list:
    df[catxy_item[0] + catxy_item[1]] = df.apply(lambda x: 1 if x.CATX == catxy_item[0] and x.CATY == catxy_item[1] else 0, axis=1)
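For the CATX / CATY combinations, a minimal sketch of a column-wise alternative is shown below, again assuming the sample data from this question. It still loops over the (short) list of pairs, but each iteration compares whole columns at once instead of calling apply on every row. The loop variables `x_val` and `y_val` are hypothetical names.

```python
import pandas as pd

# Sample frame rebuilt from the data shown in the question.
df = pd.DataFrame({
    "CATX": list("ABCBABDE"),
    "CATY": list("GHIJGHKL"),
    "CATZ": ["AAA", "BBB", "AAA", "CCC", "BBB", "DDD", "EEE", "FFF"],
})
catxy_list = [["A", "G"], ["A", "H"], ["B", "H"]]

for x_val, y_val in catxy_list:
    # Two vectorized comparisons combined with a boolean AND,
    # then converted from True/False to 1/0.
    match = df["CATX"].eq(x_val) & df["CATY"].eq(y_val)
    df[x_val + y_val] = match.astype(int)

print(df)
```

The column name is built by concatenating the two values, as in the loop above, so two different pairs that concatenate to the same string would overwrite each other.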