check the existence of value pandas

Question

have two data frames F1 and F2 containing both the column id1, id2.

F1 contains 5 columns. F2 contains three column [id1,id2,Description] I want to test if F1['id1']exists in F2['id1'] OR F1['id2']exists inF2['id2'] then i must addd a colmun in F1 with Description of this id1 or id2 in F2` . The contens of F1 and F2 are are HERE. The Output That im attending on F1 is also HERE I created F1 and F2 like This

 F1 = {'id1': ['x22', 'x13','NaN','x421'],'id2':['NaN',223,788,'NaN']}
 F1 = pd.DataFrame(data=F1)
 F2 = {'id1': ['x22', 'NaN','NaN','x413','x421'],'id2':['NaN','223','788','NaN','233'],'Description':['California','LA','NY','Havnover','Munich']}
 F2 = pd.DataFrame(data=F2)

This is what i did:

s1 = F2.drop_duplicates('id1').dropna(subset=['id1']).set_index('id1')['Description']
s2 = F2.drop_duplicates('id2').dropna(subset=['id2']).set_index('id2')['Description']
F1['Description'] = F1['id1'].map(s1).combine_first(F1['id2'].map(s2))

How can i correct my code to get This result

Result of F1:

  F1 = {'id1': ['x22', 'x13','NaN','x421'],'id2':['NaN',223,788,'NaN'],'Name':['NNNN','AAAA','XXXX','OOO'],'V1':['oo','li','la','lo'],'Description':['Clafiornia','LA','NY','Munich']}
  F1 = pd.DataFrame(data=F1)

What exactly do you mean by "testing if F2['id1']exists in F1['id1']" etc. ? Check if the columns are identical? Contain some common elements? If one is a subset of the other? It is not clear... — noamgot, Jun 06 '19 at 12:54
the result is shown by the figure above. I added the result @noamgot — Miss, Jun 06 '19 at 12:56
Sorry, I tried to understand, but it is not clear what you are trying to do. It seems that either the example data or your output must be wrong. Please provide a [reproducible example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and I will try to help. — pnovotnyq, Jun 06 '19 at 13:53

vlemaistre · Answer 1 · 2019-06-06T13:40:49.157

1

You can use the isin() function to check wether the ids are in both df's :

F1 = {'id1': ['x22', 'x13','NaN','x421'],'id2':['NaN', 223, 788,'NaN']}
F1['id2'] = [str(x) if ~isinstance(x, str) else x for x in F1['id2']]
F1 = pd.DataFrame(data=F1)
F2 = {'id1': ['x22', 'NaN','NaN','x413','x421'],'id2':['NaN','223','788','NaN','233'],'Description':['California','LA','NY','Havnover','Munich']}
F2 = pd.DataFrame(data=F2)
F1['Description'] = ''

F1['Description'] = ''

id1_F1 = (F1[F1['id1']!='NaN']['id1'].isin(F2['id1']))
id1_F2 = (F2[F2['id1']!='NaN']['id1'].isin(F1['id1']))
id2_F1 = (F1[F1['id2']!='NaN']['id2'].isin(F2['id2']))
id2_F2 = (F2[F2['id2']!='NaN']['id2'].isin(F1['id2']))


F1.loc[id1_F1[id1_F1].index.values, 'Description'] = F2.loc[id1_F2[id1_F2].index.values, 'Description'].values
F1.loc[id2_F1[id2_F1].index.values, 'Description'] = F2.loc[id2_F2[id2_F2].index.values, 'Description'].values

Output :

id1 id2 Description
0   x22 NaN California
1   x13 223 LA
2   NaN 788 NY
3   x421    NaN Munich

edited Jun 06 '19 at 13:40

answered Jun 06 '19 at 13:02

vlemaistre

3,301
13
30

THere is NAn Second row – Miss Jun 06 '19 at 13:06
You're right @jezrael, I had forgotten to check the Nan's, I edited it. Thanks :) – vlemaistre Jun 06 '19 at 13:31
Yes but isn't it a typo since all the others are in strings ? – vlemaistre Jun 06 '19 at 13:33
1

@vlemaistre - hmmm, try change it and then solution of OP working nice. So no typo. My all answer is about it. – jezrael Jun 06 '19 at 13:38

jezrael · Accepted Answer · 2019-06-06T13:17:08.103

Solution working nice, but problem in data - first NaN are not missing values but strings, so necessary replace and then second problem in F2['id2'] is numeric values are strings repr of numbers, so added to_numeric with errors='coerce':

F1 = {'id1': ['x22', 'x13','NaN','x421'],'id2':['NaN',223,788,'NaN']}
F1 = pd.DataFrame(data=F1)
F2 = {'id1': ['x22', 'NaN','NaN','x413','x421'],'id2':['NaN','223','788','NaN','233'],
      'Description':['California','LA','NY','Havnover','Munich']}
F2 = pd.DataFrame(data=F2)

#solution for sample data
F1 = F1.replace('NaN', np.nan)
F2 = F2.replace('NaN', np.nan)
F1['id2'] = pd.to_numeric(F1['id2'], errors='coerce').fillna(F1['id2'])
F2['id2'] = pd.to_numeric(F2['id2'], errors='coerce').fillna(F2['id2'])

General solution for replace only id columns in both DataFrames with convert values to numeric in both DataFrames for both columns:

cols = ['id1','id2']
F1[cols] = F1[cols].replace('NaN', np.nan)
F1[cols] = F1[cols].apply(lambda x: pd.to_numeric(x, errors='coerce')).fillna(F1[cols])
F2[cols] = F2[cols].replace('NaN', np.nan)
F2[cols] = F2[cols].apply(lambda x: pd.to_numeric(x, errors='coerce')).fillna(F2[cols])

Another solution with custom function:

def func(x):
    try:
        return float(x)
    except Exception:
        return x

cols = ['id1','id2']
F1[cols] = F1[cols].applymap(func)
F2[cols] = F2[cols].applymap(func)

print (F1)
    id1    id2
0   x22    NaN
1   x13  223.0
2   NaN  788.0
3  x421    NaN

print (F2)
    id1    id2 Description
0   x22    NaN  California
1   NaN  223.0          LA
2   NaN  788.0          NY
3  x413    NaN    Havnover
4  x421  233.0      Munich

s1 = F2.drop_duplicates('id1').dropna(subset=['id1']).set_index('id1')['Description']
s2 = F2.drop_duplicates('id2').dropna(subset=['id2']).set_index('id2')['Description']

F1['Description1'] = F1['id1'].map(s1).combine_first(F1['id2'].map(s2))
print (F1)
    id1    id2 Description1
0   x22    NaN   California
1   x13  223.0           LA
2   NaN  788.0           NY
3  x421    NaN       Munich

@Miss - super, then only remove `F1 = F1.replace('NaN', np.nan) F2 = F2.replace('NaN', np.nan)` — jezrael, Jun 06 '19 at 13:22
But NaN i wrote it to explain my solutionn essentially we have empty field — Miss, Jun 06 '19 at 13:23
@Miss - yes, then change `NaN` to `np.nan` in sample data for missing values. But question is now still not working? — jezrael, Jun 06 '19 at 13:24
@Miss - hmmm, for me it working nice, so I guess some data related problem here. — jezrael, Jun 06 '19 at 13:27
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/194546/discussion-between-jezrael-and-miss). — jezrael, Jun 06 '19 at 13:29

check the existence of value pandas

2 Answers2