My question is the following: I don't know all the pandas methods very well, and I suspect there is a more efficient way to do this. I have to load two tables from .csv files into a Postgres database. These tables are related to each other by an id that serves as a foreign key and comes from the source data; however, I must relate them through a different id controlled by my own logic.
I illustrate this graphically in the following image:
I'm trying to create a new Series based on the "another_id" column I have, applying a function that loops through another DataFrame to find the row with the same code and return its id:
```python
def check_foreign_key(id, df_ppal):
    # Linear scan: walk every row of df_ppal until another_id matches.
    if id:
        for i in df_ppal.index:
            # .loc is used here because iterating df_ppal.index yields labels,
            # not positions, so .iloc[i] would be wrong on a non-default index.
            if id == df_ppal.loc[i, 'another_id']:
                return df_ppal.loc[i, 'id']

dfs['id_fk'] = dfs['another_id'].apply(lambda id: check_foreign_key(id, df_ppal))
```
At this point I think this is not efficient, because for every value I have to loop over the whole column to match another_id and get the correct id that I need (in yellow in the picture).
So I could look into search algorithms to make the task more efficient, but I wonder whether pandas already has a method that does this faster in case there are many records.
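One thing I have been experimenting with is building a lookup Series and using `Series.map`, which replaces the per-row scan with a single hash-based lookup. This is only a sketch with made-up sample data (the `df_ppal` and `dfs` contents here are invented to mirror the picture):

```python
import pandas as pd

# Hypothetical sample data standing in for the two CSV tables.
df_ppal = pd.DataFrame({"id": [12, 13, 14],
                        "another_id": [54, 55, 56]})
dfs = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
                    "another_id": [54, 54, 55, 56, 56, 56]})

# Build a lookup Series keyed by another_id, then map it in one pass.
lookup = df_ppal.set_index("another_id")["id"]
dfs["id_fk"] = dfs["another_id"].map(lookup)
```

Unmatched codes come back as NaN instead of None, which may matter when writing to Postgres.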
I need a DataFrame like the table below, with a new column "ID Principal" filled in by matching Another_code against a column of the other DataFrame:
| ID | ID Principal | Another_code |
|----|--------------|--------------|
| 1  | 12           | 54           |
| 2  | 12           | 54           |
| 3  | 13           | 55           |
| 4  | 14           | 56           |
| 5  | 14           | 56           |
| 6  | 14           | 56           |
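I have also tried reproducing this table with a left join via `DataFrame.merge`, again with invented sample data and an assumed `id_principal` column name for the renamed key:

```python
import pandas as pd

# Hypothetical sample frames mirroring the desired table above.
df_ppal = pd.DataFrame({"id": [12, 13, 14],
                        "another_code": [54, 55, 56]})
dfs = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
                    "another_code": [54, 54, 55, 56, 56, 56]})

# Rename the principal id so the two "id" columns don't collide, then
# left-join on the shared code so every row of dfs is kept.
result = dfs.merge(df_ppal.rename(columns={"id": "id_principal"}),
                   on="another_code", how="left")
```

Would `merge` (or `map`) be the idiomatic pandas way to do this, or is there something better suited for large tables?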