How to merge a column from a dataframe to another based on a condition?

Question

I have the following CSV files:

file1.csv #dataframe is named dfFile1
Id,name,pos_neg,line
1,abc,pos,cas
2,cde,neg,work
3,efg,pos,cat
4,abc,pos,job

file2.csv #dataframe is named dfFile2
Id,ref,names,other
c10,n1,www,10.5
c11,m4,efg,5.4
c12,m5,cde,9.8
c13,m9,hhh,6.7
c14,n4,abc,12.5
c15,n9,kkk,3.4

which I converted into dataframes using pandas. I would like to obtain a third data frame that matches the rows of dfFile2 according to the unique values presented in the name field of dfFile1, and also add the pos_neg row from file 1, so I will end up with:

dfNew
Id,ref,names,other,pos_neg
c11,m4,efg,5.4,pos
c12,m5,cde,9.8,neg
c14,n4,abc,12.5,pos

So far, I have done the following:

list=[]
list=dfFile1["name"].unique()    #contains [abc,cde,efg]
dfFile2=dfFile2[dfFile2.names.isin(list)]

However, I just do not know how can I merge the column pos_neg from dfFile1, I tried the following:

dfNew=dfFile2.merge(dfFile2,dfFil1[["pos_neg"]],on=dfFile2)

Unfortunately, it does not work.

Does this answer your question? [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) — Umar.H, Jun 27 '20 at 20:46
Please provide a [mcve], as well as the current and expected output. — AMC, Jun 27 '20 at 20:59

score 1 · Answer 1 · answered Jun 27 '20 at 20:49

1

You were almost there, just some tweaking with the DataFrame.merge method, furthermore you need drop_duplicates here, since abc appears twice in the dfFile1.

dfNew = (
    dfFile2.merge(dfFile1[['name', 'pos_neg']], 
                  left_on='names', 
                  right_on='name')
    .drop_duplicates()
    .drop(columns='name')
)

    Id ref names  other pos_neg
0  c11  m4   efg    5.4     pos
1  c12  m5   cde    9.8     neg
2  c14  n4   abc   12.5     pos

Sidenote: in Python we don't use camelCase for variable names but lowercase with underscore camel_case. See PEP8 style guide:

Function names should be lowercase, with words separated by underscores as necessary to improve readability.

answered Jun 27 '20 at 20:49

Erfan

40,971
8
66
78

the problem is that I need to validate that the names that appear on File1 also appear on File2, for that reason I need the unique() instruction – Little Jun 27 '20 at 20:52
Can you edit your expected output, else add more details to your input, because right now my output matches yours. – Erfan Jun 27 '20 at 20:53
thank you so much for your help, I found that in my original dataset file 1 can have duplicates like 1,abc,pos,cas and also 10,abc,neg,cow; and for that reason the merge was giving me a strange result based on your answer. This was happening because for a same name it could have two different values of pos and neg based on other factors.I will try to fix that case of file 1 first. – Little Jun 27 '20 at 22:32

score 1 · Answer 2 · answered Jun 27 '20 at 21:42

1

You can iterate through your dataframe with iterrows

df3 = df2[df2.names.isin(names)]

for index, row in df3.iterrows():
    row = df[row['names'] == df['name']]['pos_neg']
    df3.loc[index,'pos_neg'] = row.iloc[0]

row.loc[0] stands for rows that has same 'name' field. Gets first of same named rows

answered Jun 27 '20 at 21:42

Stannum

36
4

Thanks for welcoming :) – Stannum Jun 27 '20 at 23:05

score 0 · Answer 3 · answered Jun 27 '20 at 20:41

0

Try:

dfNew = dfFile2.merge(dfFile1[["name", "pos_neg"]], how="inner", left_on="names", right_on="name")

Rearranging the columns and/or renaming them shouldn't be difficult if above works.

answered Jun 27 '20 at 20:41

WGS

13,969
4
48
51

Thanks @NullDev, but I need to preserve the unique condition because on File1 there are repeated values on the name field – Little Jun 27 '20 at 20:49

How to merge a column from a dataframe to another based on a condition?

3 Answers3