My question is simple - I have the following table:
+----------+-------+------------+--------+
| industry | class | occupation | value |
+----------+-------+------------+--------+
| 170 | 4 | 1000 | 123.3 |
| 180 | 7 | 3600 | 4543.8 |
| 570 | 5 | 990 | 657.4 |
+----------+-------+------------+--------+
I'd like to create a new column called "type". The value of this column is based on these multiple conditions
- Class = 7 : QWE
- Class = 8 : ASD
- Class = 1 or 2 : ZXC
- Class = 4, 5, or 6 AND Industry = 170-490 or 570-690 AND Occupation >=1000 : IOP
- Class = 4, 5, or 6 AND Industry = 170-490 or 570-690 AND Occupation between 10-3540 : JKL
- Anything else : BNM
The resulting table will look like this:
+----------+-------+------------+--------+------+
| industry | class | occupation | value | type |
+----------+-------+------------+--------+------+
| 170 | 4 | 1000 | 123.3 | IOP |
| 180 | 7 | 3600 | 4543.8 | QWE |
| 570 | 5 | 990 | 657.4 | JKL |
+----------+-------+------------+--------+------+
My first approach to this was basically creating multiple dataframes of each type using the dataframe query method. However, I found out about the numpy "where" method and I am currently using a nested version of that to create the "type" column in one step. However, I feel like it's un-readable and I can imagine a situation where there are even more conditions that would make this process look really messy. Is there a cleaner way of doing this? Maybe with a dictionary or something?