0

my question is the following: I do not know very well all the pandas methods and I think that there is surely a more efficient way to do this: I have to load two tables from .csv files to a postgres database; These tables are related to each other with an id, which serves as a foreign key, and comes from the source data, however I must relate them to a different id controlled by my logic.

I explain graphically in the following image:

enter image description here

Im trying to create a new Series based on the "another_id" that i have and apply a function that loop through a dataframe Series to compare if have the another code and get their id

def check_foreign_key(id, df_ppal):
  if id:
    for i in df_ppal.index:
      if id == df_ppal.iloc[i]['another_id']:
        return df_ppal.iloc[i]['id']

dfs['id_fk'] = dfs['another_id'].apply(lambda id : check_foreign_key(id, df_ppal))

In this point i think that it is not efficient because I have to loop in all column to match the another_id and get and get its the correct id that I need is in yellow in the picture.

So I should think about search algorithms to make the task more efficient, but I wonder if pandas does not have a method that allows me to do this faster, in case there are many records.

I need a dataframe like a this table that have a new column "ID Principal" based on matching Another_code, with another dataframe column.

ID ID Principal Another_code
1 12 54
2 12 54
3 13 55
4 14 56
5 14 56
6 14 56
  • I'm not quite following the goal. But, in function check_foreign_key: 1) you're using df_exp rather than df_ppal that was passed in, and 2) cod_exp is not defined. – DarrylG Oct 01 '21 at 06:37
  • Please can you edit the question to show a sample of the output table you desire? In any event, I think your solution might eventually involve `pd.DataFrame.join()`. – TMBailey Oct 01 '21 at 06:39
  • 1
    Checkout [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101). It explains how to merge dataframes (with join being a special case of merge) (merge is very fast). – DarrylG Oct 01 '21 at 06:46
  • Sure, I edited the question, I also corrected my function, I put the variables correctly, I added a table with the output I need, I think you can have more context, let me know if it was clearer. – Camilo Bayona Oct 01 '21 at 07:05
  • See [`pd.merge`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html) and [Database-style DataFrame or named Series joining/merging](https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html#database-style-dataframe-or-named-series-joining-merging). – Timus Oct 01 '21 at 08:41
  • @CamiloBayona--just checking on whether you were going to understand how merge applied to your problem. – DarrylG Oct 01 '21 at 20:07

1 Answers1

0

Well indeed, I was not understanding very well all the pandas functions, I could solve my problem using merge, I did not know that pandas had a good implementation of the typical Join in SQL.

This documentation helped me a lot:

  1. https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html#database-style-dataframe-or-named-series-joining-merging

  2. Pandas Merging 101

Finally my answer:

new_df = principal.merge(secondary, on='another_id')

I thank you all!