0

I trying to do a match with the following 2 dataframe:

df_co:

Cntr No        Labour   Material    Amount  
BHCU 2604370    0.0      82.5       82.5    
BHCU 2604370    24.0     22.0       46.0    

df:

Cntr No        Total    
BHCU 2604370    82.0    
BHCU 2604370    46.0

code:

    df['Tally'] = ((df_co['Cntr No'].isin(df['Cntr No'])) & 
                  ((df_co['Labour'].isin(df['Total'])) | 
                  (df_co['Material'].isin(df['Total'])) |
                  (df_co['Amount'].isin(df['Total'])))).map({True:'Yes',False:'No'})

It should not give me a match as df_co 'Amount' is 82.5 and df 'Total' amount is 82.00.

But my result give me both matched.

Result:

Cntr No        Total     Tally
BHCU 2604370    82.0      Yes
BHCU 2604370    46.0      Yes

Suspect the code comparison is wrong.

convert to float:

 a = df.iloc[:, :5]
 b = df.iloc[:,5:29].apply(lambda x : 
 x.str.extract('(\d+)',expand=False).astype(float))
 c = df.iloc[:, 29:]
 df = pd.concat([a,b,c], axis=1)
okl
  • 317
  • 3
  • 12

1 Answers1

1

Use:

df = df.merge(df_co.melt('Cntr No', value_name='Total', var_name='Tally'), on=['Cntr No', 'Total'], how='left')
df['Tally'] = df['Tally'].notnull().map({True:'Yes',False:'No'})
print (df)
        Cntr No  Total Tally
0  BHCU 2604370   82.0    No
1  BHCU 2604370   46.0   Yes
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • further checking it seem the number is originally 82.50 ,it only changed to 82.0 after converted to float type. pls see my post for the updated code. – okl May 02 '18 at 12:39
  • @okl - I think need `x.str.extract('(\d+\.\d+)'` - from [this](https://stackoverflow.com/questions/4703390/how-to-extract-a-floating-number-from-a-string) – jezrael May 02 '18 at 12:48
  • @okl - is possible share data if not confidental? – jezrael May 02 '18 at 12:51
  • oh my mistake, the value is still there, only 0 changed to nan. is it possible to change nan back to 0? i tried df = df.fillna('0') but for error ...AttributeError: ('Can only use .str accessor with string values, which use np.object_ dtype in pandas', 'occurred at index SERVICE INT / EX WASH40 FEET') – okl May 02 '18 at 12:56
  • maybe need `b = df.iloc[:,5:29].fillna('0').astype(str)` – jezrael May 02 '18 at 12:58
  • how about remove the column with value is nan ? df = df.loc[:, (df != 0).any(axis=0)] – okl May 02 '18 at 13:01
  • Do you think `df.loc[:, df.notnull().all()]` ? Tested with `df = pd.DataFrame({'A':list('abcdef'), 'B':[4,5,4,5,5,4], 'BirthRate':[np.nan,8,9,4,2,3], 'InternetUsers':[1,3,5,7,1,np.nan]})` – jezrael May 02 '18 at 13:03
  • doesnt seem work, just want to drop the column if all value = nan – okl May 02 '18 at 13:19
  • @okl - There is used `notnull` - so filter all columns if no nans there. Do you test my sample data from comment? – jezrael May 02 '18 at 13:21
  • your sample data : NameError: name 'np' is not defined – okl May 02 '18 at 13:23
  • use `import numpy as np` or change `np.nan` to `pd.np.nan` – jezrael May 02 '18 at 13:24
  • df = pd.DataFrame({'A':list('abcdef'), 'B':[4,5,4,5,5,4], 'BirthRate':[np.nan,np.nan,np.nan,np.nan,np.nan,np.nan], 'InternetUsers':[1,3,5,7,1,np.nan]}) what i need to remove is the BirthRate column since all value are NaN. Internetusers to keep because not all value are NaN – okl May 02 '18 at 13:27
  • @okl - Nice sample, need `df = df.loc[:, ~df.isnull().all()]` – jezrael May 02 '18 at 13:29
  • Bingo! Thanks a lot for the help! – okl May 02 '18 at 13:32