1

I have two data frames F1 and F2 containing both the column id1, id2.

F1 contains two columns F1[id1,id2].
F2 contains three column [id1,id2,Description] I wante to test if F2['id1']exists in F1['id1'] OR F2['id2']exists inF1['id2'] then i must addd a colmun in F1 with Description of this id1 or id2 in F2` . The contens of F1 and F2 are are HERE. The Output That im attending on F1 is also HERE I created F1 and F2 like This

     F1 = {'id1': ['x22', 'x13','NaN','x421'],'id2':['NaN','223','788','NaN']}
     F1 = pd.DataFrame(data=F1)
     F2 = {'id1': ['x22', 'NaN','NaN','x413','x421'],'id2':['NaN','223','788','NaN','233'],'Description':['California','LA','NY','Havnover','Munich']}
     F2 = pd.DataFrame(data=F2)

Actually i tried several solutions . But nothing helps me to do it . Help please

jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
anayisse2
  • 103
  • 1
  • 7
  • Do you need `df = F1.merge(F2, how='left')` ? Or `df = F1.merge(F2, how='left', on=['id1','id2'])` ? – jezrael Mar 30 '19 at 22:01
  • i want from the two dataframe F1 and F2. Add a column in F1 Filled Like shown in the picture shown in the post. – anayisse2 Mar 30 '19 at 22:05
  • @jezrael it is not my case!! – anayisse2 Mar 30 '19 at 23:41
  • Can you create `F3` with expected output? – jezrael Mar 30 '19 at 23:43
  • The output is in the File F1 `F3 = {'id1': ['x22', 'x13','NaN','x421'],'id2':['NaN','223','788','NaN'],'Description': ['California','LA','NY','Munich']} F3 = pd.DataFrame(data=F3)` – anayisse2 Mar 30 '19 at 23:56
  • the output will be in the File F1 . With adding a new column of description – anayisse2 Mar 30 '19 at 23:58
  • Thank you for your response. But the solution is in F1 . F1 have only 4 lines . To explain more first we begin with the first line in F1 . If it has id1 not null or id2 notnull, x22 exist in F1 so we have to put in the first line of the added line california. In second line we do the samething with searching 223 in F1 and we find it so 2nd will takes LA . Next we will search for 788 that exists so we print 'NY' in the related line. The last line corresponds to id2=421 we search it and we write Munich in the last line .. – anayisse2 Mar 31 '19 at 00:14
  • Lets Know that F1 can have more then 4 lines .. In this case the value will be Null cause don"t have an id in F2. Hope that i clear – anayisse2 Mar 31 '19 at 00:14
  • I got solution, working on answer. – jezrael Mar 31 '19 at 00:14
  • 1
    do you see what i mean.? – anayisse2 Mar 31 '19 at 00:15

1 Answers1

0

Use:

#if necessary replace string NaN to missing values
F1 = F1.replace('NaN', np.nan)
F2 = F2.replace('NaN', np.nan)

For each column remove duplicates by DataFrame.drop_duplicates, missing valeus by DataFrame.dropna and create index by id column by DataFrame.set_index:

s1 = F2.drop_duplicates('id1').dropna(subset=['id1']).set_index('id1')['Description']
s2 = F2.drop_duplicates('id2').dropna(subset=['id2']).set_index('id2')['Description']

Then use Series.map by columns from F1 with Series.combine_first:

F1['Description'] = F1['id1'].map(s1).combine_first(F1['id2'].map(s2))
print (F1)
    id1  id2 Description
0   x22  NaN  California
1   x13  223          LA
2   NaN  788          NY
3  x421  NaN      Munich
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252