I need code that does two things. Update: I added a requirement that I forgot (point 3).
1. If a substring from a list is found in species_id, the row should be tagged with a name in another column. For example, if "0325" is found, the new column (named 'specie' in this example) should say "animal_customer", and if no match is found, it should say "unknown".
2. If the substring is found, then from another column that holds a long string (named 'story' in this example), it should extract 1 space-separated word to the left and 3 space-separated words to the right of the matched name (for example "human"). Once the extract has been added to a new column, the story column is no longer necessary. Note that if there was no match in step 1, the new column should say "I.D. not found".
3. There is a chance that more than one tag will match the same row. In that case I need all the matching tags to show up as a list in the column.
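The extraction in step 2 could be sketched roughly as below. This is only an illustration under some assumptions: `extract_window` is a hypothetical helper name, words are split on whitespace, trailing punctuation is ignored when comparing a word to the keyword, and only the first occurrence of the keyword is used.

```python
def extract_window(text, keyword, left=1, right=3):
    """Return `left` words before and `right` words after `keyword`, or None.

    Hypothetical helper: words are whitespace-separated, and surrounding
    punctuation is ignored when comparing a word to the keyword.
    """
    tokens = text.split()
    for i, token in enumerate(tokens):
        if token.strip(".,;:!?") == keyword:
            start = max(0, i - left)  # do not run past the start of the text
            return " ".join(tokens[start:i + right + 1])
    return None  # keyword does not occur in the text


story = "based on research, human is from U.S. or nearby"
print(extract_window(story, "human"))  # research, human is from U.S.
print(extract_window(story, "animal") or "I.D. not found")  # I.D. not found
```

Returning `None` for a missing keyword leaves the choice of fallback text ("I.D. not found" here) to the caller.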
Here is an example of the lists and how the initial table and the result table should look:
animal = ["0325", "9985"]
human = ["9984", "1859"]
Original Table ->
name | species_id | story |
---|---|---|
Bob | 010199840101 | based on research, human is from U.S. or nearby |
Fido | 010199850101 | based on research, animal is from taiwan or nearby |
E.T. | 010145660101 | based on research, E.T. is from mars or nearby |
ManBearPig | 03259984010101 | based on research, human is from mars or nearby and animal is probably alien too |
Resulting table ->
name | species_id | specie | origin_list | extract |
---|---|---|---|---|
Bob | 010199840101 | human_customer | human | research, human is from U.S. |
Fido | 010199850101 | animal_customer | animal | research, animal is from taiwan |
E.T. | 010145660101 | unknown | none | I.D. not found |
ManBearPig | 03259984010101 | "human_customer", "animal_customer" | "animal", "human" | "research, human is from mars", "and animal is probably alien" |
My attempt to fix this:
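import re
import pandas as pd
# Rows whose species_id contains any of the animal substrings
animal_df = df[df["species_id"].str.contains('|'.join(map(re.escape, animal)))].copy()
animal_df["specie"] = "animal_customer"
# Rows whose species_id contains any of the human substrings
human_df = df[df["species_id"].str.contains('|'.join(map(re.escape, human)))].copy()
human_df["specie"] = "human_customer"
df_append = pd.concat([human_df, animal_df])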
Problem:
As you can see, my attempt identifies and tags the rows that contain an animal or a human ID, but it does not flag the rows that match nothing, and it adds duplicate rows when a row matches both lists.
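One possible way around both issues is to tag every row in place instead of filtering and concatenating. The following is a minimal sketch, not a definitive solution: `groups` and `find_tags` are hypothetical names, and it assumes matched rows should always hold a list (even for a single tag) while unmatched rows hold the plain string "unknown".

```python
import pandas as pd

animal = ["0325", "9985"]
human = ["9984", "1859"]

# Hypothetical mapping from a tag's short name to its ID substrings
groups = {"animal": animal, "human": human}


def find_tags(species_id):
    """Return the name of every group with a substring in species_id."""
    return [name for name, ids in groups.items()
            if any(sub in species_id for sub in ids)]


df = pd.DataFrame({
    "name": ["Bob", "Fido", "E.T.", "ManBearPig"],
    "species_id": ["010199840101", "010199850101",
                   "010145660101", "03259984010101"],
})

# One pass over the original rows: nothing is filtered out or duplicated
df["origin_list"] = df["species_id"].apply(find_tags)
df["specie"] = df["origin_list"].apply(
    lambda tags: [f"{t}_customer" for t in tags] if tags else "unknown"
)
print(df)
```

Because each row is processed exactly once, a row that matches both lists stays a single row with two tags, and a row that matches nothing stays in the table as "unknown".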