I created an informational table, which is very long (think nearly 100 observations or just under). I have a main table (around 70K onservations) where I need to create a new column and fill it based on matching values between my two dataframes but I need to fill the new column the cells with data from my information table.
I created a small dataset but my real dataset (which I can't share because my prof signed a non-diclosure) has like 70K observations.
data_1 (info table)
Animal Food
Dog Stake
Cat Fish
Rabbit Carrot
data 2 (original table)
Name Animal Age
Binxy Dog 1
Al Rabbit 4
Sam Dog 11
Dexter Cat 9
Dory Hamster 6
Chloe Cat 5
desired data_frame
Name Animal Age Fed [new column]
Binxy Dog 1 Stake
Al Rabbit 4 Carrot
Sam Dog 11 Stake
Dexter Cat 9 Fish
Dory Hamster 6 NaN
Chloe Cat 5 Fish
my gut tells me it probably has to do with iloc
or loc
and using some bloonean values. Verbally I would say:
- If data_2["Animal"] == data_2["Animal"]
- Then fill new column data_2["Fed"] with corresponding food found in data_1["Food"]
I think merge
might work but I'm not certain if it would fill it for every matching value. I'm not very good at merges because I struggle to understand the join feature but I don't think values would be inserted where I need them because my dataframes are not the same length.
edit: I've don't manually before but I had like only two or three values in fill in and I don't want to manually do this for 100. But this is as far as my basic understanding goes.
New_Categorized_Full.loc[
(New_Categorized["Produce"] == "Apple"), "Fruit"] = "Fuji"