I have a data frame like this:
id | w1 | w2 | w3 | w4 | w5 | w6 |
---|---|---|---|---|---|---|
11 | light | light | light | light | light | light |
22 | light | light | light | light | medium | medium |
33 | light | light | medium | medium | medium | heavy |
44 | light | light | medium | NA | NA | NA |
55 | light | light | medium | medium | NA | NA |
66 | medium | medium | medium | NA | NA | NA |
I would like to get the frequency count of light, medium, heavy for each id across w1-w6. And I would to get the mode of w1-w6 as a new column.
The target df should look like this:
id | w1 | w2 | w3 | w4 | w5 | w6 | N_light | N_medium | N_heavy | final |
---|---|---|---|---|---|---|---|---|---|---|
11 | light | light | light | light | light | light | 6 | 0 | 0 | light |
22 | light | light | light | light | medium | medium | 4 | 2 | 0 | light |
33 | light | light | medium | medium | medium | heavy | 2 | 3 | 1 | medium |
44 | light | light | medium | NA | NA | NA | 2 | 1 | 0 | light |
55 | light | light | medium | medium | NA | NA | 2 | 2 | 0 | light |
66 | medium | medium | medium | NA | NA | NA | 0 | 3 | 0 | medium |
The real data frame has millions of rows. I struggle to find an efficient way to do this. Any ideas?
I tried the Mode function from DescTools library, that worked with a limited number of rows in a for loop. But it is too slow to run.