0

have two data frames F1 and F2 containing both the column id1, id2.

F1 contains 5 columns. F2 contains three column [id1,id2,Description] I want to test if F1['id1']exists in F2['id1'] OR F1['id2']exists inF2['id2'] then i must addd a colmun in F1 with Description of this id1 or id2 in F2` . The contens of F1 and F2 are are HERE. The Output That im attending on F1 is also HERE I created F1 and F2 like This enter image description here

 F1 = {'id1': ['x22', 'x13','NaN','x421'],'id2':['NaN',223,788,'NaN']}
 F1 = pd.DataFrame(data=F1)
 F2 = {'id1': ['x22', 'NaN','NaN','x413','x421'],'id2':['NaN','223','788','NaN','233'],'Description':['California','LA','NY','Havnover','Munich']}
 F2 = pd.DataFrame(data=F2)

This is what i did:

s1 = F2.drop_duplicates('id1').dropna(subset=['id1']).set_index('id1')['Description']
s2 = F2.drop_duplicates('id2').dropna(subset=['id2']).set_index('id2')['Description']
F1['Description'] = F1['id1'].map(s1).combine_first(F1['id2'].map(s2))

How can i correct my code to get This result

Result of F1:

  F1 = {'id1': ['x22', 'x13','NaN','x421'],'id2':['NaN',223,788,'NaN'],'Name':['NNNN','AAAA','XXXX','OOO'],'V1':['oo','li','la','lo'],'Description':['Clafiornia','LA','NY','Munich']}
  F1 = pd.DataFrame(data=F1)
Miss
  • 69
  • 1
  • 8
  • What exactly do you mean by "testing if F2['id1']exists in F1['id1']" etc. ? Check if the columns are identical? Contain some common elements? If one is a subset of the other? It is not clear... – noamgot Jun 06 '19 at 12:54
  • i added caption – Miss Jun 06 '19 at 12:56
  • Maybe a short example of the desired result might help – noamgot Jun 06 '19 at 12:56
  • the result is shown by the figure above. I added the result @noamgot – Miss Jun 06 '19 at 12:56
  • 1
    Sorry, I tried to understand, but it is not clear what you are trying to do. It seems that either the example data or your output must be wrong. Please provide a [reproducible example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and I will try to help. – pnovotnyq Jun 06 '19 at 13:53

2 Answers2

1

You can use the isin() function to check wether the ids are in both df's :

F1 = {'id1': ['x22', 'x13','NaN','x421'],'id2':['NaN', 223, 788,'NaN']}
F1['id2'] = [str(x) if ~isinstance(x, str) else x for x in F1['id2']]
F1 = pd.DataFrame(data=F1)
F2 = {'id1': ['x22', 'NaN','NaN','x413','x421'],'id2':['NaN','223','788','NaN','233'],'Description':['California','LA','NY','Havnover','Munich']}
F2 = pd.DataFrame(data=F2)
F1['Description'] = ''

F1['Description'] = ''

id1_F1 = (F1[F1['id1']!='NaN']['id1'].isin(F2['id1']))
id1_F2 = (F2[F2['id1']!='NaN']['id1'].isin(F1['id1']))
id2_F1 = (F1[F1['id2']!='NaN']['id2'].isin(F2['id2']))
id2_F2 = (F2[F2['id2']!='NaN']['id2'].isin(F1['id2']))


F1.loc[id1_F1[id1_F1].index.values, 'Description'] = F2.loc[id1_F2[id1_F2].index.values, 'Description'].values
F1.loc[id2_F1[id2_F1].index.values, 'Description'] = F2.loc[id2_F2[id2_F2].index.values, 'Description'].values

Output :

id1 id2 Description
0   x22 NaN California
1   x13 223 LA
2   NaN 788 NY
3   x421    NaN Munich
vlemaistre
  • 3,301
  • 13
  • 30
0

Solution working nice, but problem in data - first NaN are not missing values but strings, so necessary replace and then second problem in F2['id2'] is numeric values are strings repr of numbers, so added to_numeric with errors='coerce':

F1 = {'id1': ['x22', 'x13','NaN','x421'],'id2':['NaN',223,788,'NaN']}
F1 = pd.DataFrame(data=F1)
F2 = {'id1': ['x22', 'NaN','NaN','x413','x421'],'id2':['NaN','223','788','NaN','233'],
      'Description':['California','LA','NY','Havnover','Munich']}
F2 = pd.DataFrame(data=F2)

#solution for sample data
F1 = F1.replace('NaN', np.nan)
F2 = F2.replace('NaN', np.nan)
F1['id2'] = pd.to_numeric(F1['id2'], errors='coerce').fillna(F1['id2'])
F2['id2'] = pd.to_numeric(F2['id2'], errors='coerce').fillna(F2['id2'])

General solution for replace only id columns in both DataFrames with convert values to numeric in both DataFrames for both columns:

cols = ['id1','id2']
F1[cols] = F1[cols].replace('NaN', np.nan)
F1[cols] = F1[cols].apply(lambda x: pd.to_numeric(x, errors='coerce')).fillna(F1[cols])
F2[cols] = F2[cols].replace('NaN', np.nan)
F2[cols] = F2[cols].apply(lambda x: pd.to_numeric(x, errors='coerce')).fillna(F2[cols])

Another solution with custom function:

def func(x):
    try:
        return float(x)
    except Exception:
        return x

cols = ['id1','id2']
F1[cols] = F1[cols].applymap(func)
F2[cols] = F2[cols].applymap(func)

print (F1)
    id1    id2
0   x22    NaN
1   x13  223.0
2   NaN  788.0
3  x421    NaN

print (F2)
    id1    id2 Description
0   x22    NaN  California
1   NaN  223.0          LA
2   NaN  788.0          NY
3  x413    NaN    Havnover
4  x421  233.0      Munich

s1 = F2.drop_duplicates('id1').dropna(subset=['id1']).set_index('id1')['Description']
s2 = F2.drop_duplicates('id2').dropna(subset=['id2']).set_index('id2')['Description']

F1['Description1'] = F1['id1'].map(s1).combine_first(F1['id2'].map(s2))
print (F1)
    id1    id2 Description1
0   x22    NaN   California
1   x13  223.0           LA
2   NaN  788.0           NY
3  x421    NaN       Munich
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252