I have a dataframe
Unnamed geo_zones_pickups Delstat region
4 4 ['52102' '53101'] ['52102' '53101'] [52102, 53101]
5 5 ['52102'] ['52102'] [52102]
and those there lists:
north = ['51101', '51105', '51100']
midt = ['52103', '52102', '52104', '52101', '52150']
south = ['53101', '53102', '53106', '53103', '53105', '53104']
I now need to create a new column 'geo_region' with string value taken from one of the list names variables. Desired output will look like this:
Unnamed geo_zones_pickups Delstat region geo_region
4 4 ['52102' '53101'] ['52102' '53101'] [52102, 53101] midt,south
5 5 ['52102'] ['52102'] [52102] midt
So basically, whenever any of the regions
is present in any of the declared list append the list name. All geographical regions have 3 first letter representing the appropriate geo_region. For north
they will always start with 511 and then followed by two other digits. For midt
they will always start with 521 and south
531. I tried several option the last one with intersection which works great but I can only match on one of the lists. How can I make it more robust to handle all three lists at the same time and assign a proper value to the new column? I greatly appreciate any help.
Edit:
geo_zones_pickups
and Delstat
are an object types with elements in it stored as string.region
also an object with elements in it stored as list of strings