
I'm having trouble copying some data between 2 dataframes. I have the main_df:

import pandas as pd

main_df = pd.DataFrame({"id": [123, 456, 789, 357, 159], "date": [None, "2022-10-10", "2022-09-15", None, "2022-09-15"], "stuff": [3, 6, 2, 9, 3]})


id             date  stuff 
123             NaN      3
456      2022-10-10      6
789      2022-09-15      2
357             NaN      9
159      2022-09-15      3

and second_df:

second_df = pd.DataFrame({"id": [321, 456, 789, 351], "stuff": [3, 6, 2, 4]})


id   stuff 
321      3
456      6
789      2
351      4

I want to check whether each id in second_df is also in main_df and, if it is, copy its date from main_df into second_df. This would be the result:

id   stuff         date
321      3          NaN
456      6   2022-10-10
789      2   2022-09-15
351      4          NaN  

I know that second_df["id"].isin(main_df["id"]) gives me a boolean Series indicating whether each id exists in main_df, but I don't know how to copy the date value.
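For reference, here is what that isin call returns on the sample data. This sketch assumes second_df holds exactly the four ids shown in its table (321, 456, 789, 351). The mask says which rows match, but it carries no dates, so a separate lookup step is still needed.

```python
import pandas as pd

main_df = pd.DataFrame({
    "id": [123, 456, 789, 357, 159],
    "date": [None, "2022-10-10", "2022-09-15", None, "2022-09-15"],
    "stuff": [3, 6, 2, 9, 3],
})
# Assumed to match the four-row table shown for second_df
second_df = pd.DataFrame({"id": [321, 456, 789, 351], "stuff": [3, 6, 2, 4]})

# Boolean Series, one entry per row of second_df:
# True where that id also appears in main_df["id"]
mask = second_df["id"].isin(main_df["id"])
print(mask.tolist())  # [False, True, True, False]
```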

Hope someone can help me, thanks.

Alex Turner

1 Answer


You can use map to bring over the date value (here df is main_df and df2 is second_df):

# build an id -> date lookup Series from df, then map df2's ids through it
df2['date'] = df2['id'].map(df.set_index('id')['date'])
df2
    id  stuff        date
0  321      3         NaN
1  456      6  2022-10-10
2  789      2  2022-09-15
3  351      4         NaN
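A self-contained sketch of the same map approach, written with the question's variable names and assuming second_df holds the four ids shown in its table. Indexing main_df's date column by id turns it into a lookup table, and Series.map replaces each id with its date, leaving NaN for ids that main_df does not contain.

```python
import pandas as pd

main_df = pd.DataFrame({
    "id": [123, 456, 789, 357, 159],
    "date": [None, "2022-10-10", "2022-09-15", None, "2022-09-15"],
    "stuff": [3, 6, 2, 9, 3],
})
second_df = pd.DataFrame({"id": [321, 456, 789, 351], "stuff": [3, 6, 2, 4]})

# Series of dates indexed by id, i.e. an id -> date lookup table.
# map needs the ids in this index to be unique.
date_by_id = main_df.set_index("id")["date"]

# Each id is replaced by its looked-up date; ids absent from main_df become NaN
second_df["date"] = second_df["id"].map(date_by_id)
print(second_df)
```

Because map returns exactly one value per row, second_df keeps its original length and row order.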
Naveed
  • Thanks, I believe it works. I still have to test it properly, then I'll let you know. Right now I'm filtering the data with dropna based on one column. The real data has more columns and more NaN/NaT values. – Alex Turner Oct 24 '22 at 15:46
  • Any idea how I could do this if I wanted to copy id and stuff from main_df to second_df? I was trying `df2['id2', 'stuff2']=df2['id'].map(df.set_index('id')['id', 'stuff'])`, but it's not working. Thanks – Alex Turner Oct 24 '22 at 16:07
  • `map` returns only a single value per id. If you need multiple columns, `pd.merge` is an option – Naveed Oct 24 '22 at 16:12
  • Thanks, I'll check on that – Alex Turner Oct 24 '22 at 16:14
  • Sorry to bother you again. I'm reading about `pd.merge`, but I wanted to ask about the case where **df** has repeated ids, for example `"id": [123, 456, 789, 357, 456]`. The code you sent returns this error: `pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects` because the index has duplicates. Hope you can help me, thanks. – Alex Turner Oct 24 '22 at 18:50
  • How would you like to handle the duplicates? Can these duplicate rows be removed? If you merge while DF contains duplicate ids, you end up with multiple rows in your second DF – Naveed Oct 24 '22 at 18:53
  • Hmm, right now I would say I don't mind the duplicate rows, but only because I know I can use `df.duplicated` or `df.drop_duplicates` to remove them. I'm not 100% sure which case I need yet. Thanks – Alex Turner Oct 24 '22 at 19:15
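A sketch of the two options discussed in the comments, using the repeated-id variant of main_df from the comment (ids [123, 456, 789, 357, 456], other values as in the question). A left `pd.merge` copies several columns at once and keeps every match, so the repeated 456 yields two rows. Dropping duplicate ids first makes both merge and map give one row per id; keeping the *first* occurrence is an assumption about which row should win. The id column does not need to be copied separately because it is the join key.

```python
import pandas as pd

# main_df with the repeated id 456 from the comment; other values as in the question
main_df = pd.DataFrame({
    "id": [123, 456, 789, 357, 456],
    "date": [None, "2022-10-10", "2022-09-15", None, "2022-09-15"],
    "stuff": [3, 6, 2, 9, 3],
})
second_df = pd.DataFrame({"id": [321, 456, 789, 351], "stuff": [3, 6, 2, 4]})

# Left merge copies several columns at once. Clashing names get a suffix,
# so main_df's "stuff" arrives as "stuff_main".
merged = second_df.merge(
    main_df[["id", "date", "stuff"]],
    on="id",
    how="left",
    suffixes=("", "_main"),
)
print(merged)  # id 456 matches twice, so it appears in two rows

# Keep one row per id (assumption: the first occurrence wins)
unique_main = main_df.drop_duplicates(subset="id", keep="first")

merged_unique = second_df.merge(
    unique_main[["id", "date", "stuff"]],
    on="id",
    how="left",
    suffixes=("", "_main"),
)
print(merged_unique)  # one row per id in second_df

# map works again as well, because the lookup index is now unique
mapped = second_df.assign(
    date=second_df["id"].map(unique_main.set_index("id")["date"])
)
print(mapped)
```

If the ids on the main_df side are supposed to be unique, passing `validate="many_to_one"` to `merge` makes pandas raise a `MergeError` instead of silently adding rows.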