How to create a pandas Series (column), based in a match with a value in another Dataframe?

Question

my question is the following: I do not know very well all the pandas methods and I think that there is surely a more efficient way to do this: I have to load two tables from .csv files to a postgres database; These tables are related to each other with an id, which serves as a foreign key, and comes from the source data, however I must relate them to a different id controlled by my logic.

I explain graphically in the following image:

Im trying to create a new Series based on the "another_id" that i have and apply a function that loop through a dataframe Series to compare if have the another code and get their id

def check_foreign_key(id, df_ppal):
  if id:
    for i in df_ppal.index:
      if id == df_ppal.iloc[i]['another_id']:
        return df_ppal.iloc[i]['id']

dfs['id_fk'] = dfs['another_id'].apply(lambda id : check_foreign_key(id, df_ppal))

In this point i think that it is not efficient because I have to loop in all column to match the another_id and get and get its the correct id that I need is in yellow in the picture.

So I should think about search algorithms to make the task more efficient, but I wonder if pandas does not have a method that allows me to do this faster, in case there are many records.

I need a dataframe like a this table that have a new column "ID Principal" based on matching Another_code, with another dataframe column.

ID	ID Principal	Another_code
1	12	54
2	12	54
3	13	55
4	14	56
5	14	56
6	14	56

I'm not quite following the goal. But, in function check_foreign_key: 1) you're using df_exp rather than df_ppal that was passed in, and 2) cod_exp is not defined. — DarrylG, Oct 01 '21 at 06:37
Please can you edit the question to show a sample of the output table you desire? In any event, I think your solution might eventually involve `pd.DataFrame.join()`. — TMBailey, Oct 01 '21 at 06:39
Checkout [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101). It explains how to merge dataframes (with join being a special case of merge) (merge is very fast). — DarrylG, Oct 01 '21 at 06:46
Sure, I edited the question, I also corrected my function, I put the variables correctly, I added a table with the output I need, I think you can have more context, let me know if it was clearer. — Camilo Bayona, Oct 01 '21 at 07:05
See [`pd.merge`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html) and [Database-style DataFrame or named Series joining/merging](https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html#database-style-dataframe-or-named-series-joining-merging). — Timus, Oct 01 '21 at 08:41
@CamiloBayona--just checking on whether you were going to understand how merge applied to your problem. — DarrylG, Oct 01 '21 at 20:07

score 0 · Answer 1 · answered Oct 04 '21 at 14:01

Well indeed, I was not understanding very well all the pandas functions, I could solve my problem using merge, I did not know that pandas had a good implementation of the typical Join in SQL.

This documentation helped me a lot:

Finally my answer:

new_df = principal.merge(secondary, on='another_id')

I thank you all!

How to create a pandas Series (column), based in a match with a value in another Dataframe?

1 Answers1