Compare the columns of two DataFrames on the basis of the IDs in the DataFrames

Question

I am totally new to Python. I have two DataFrames from the same dataset: one is the input and one is the output.

Here is my input DataFrame:

Document_ID  OFFSET  PredictedFeature
0            0       2000
0            8       2000
0            16      2200
0            23      2200
0            30      2200
1            0       2100
1            5       2100
1            7       2100

I am giving this as input to my ML model, which gives me output in the same format.

My output looks like this:

Document_ID  OFFSET  PredictedFeature
0            0       2000
0            8       2000
0            16      2100
0            23      2100
0            30      2200
1            0       2000
1            5       2000
1            7       2100

With these two DataFrames, I want to check, for each Document_ID and each OFFSET, whether the input feature is the same as the output feature. If it is, I want to add True as the value in a new column; if it is not, I want to add False.

For example, in the sample data, for ID 0 and offset 16 the input feature is 2200 and the output feature is 2100, so the value is False.

Can anyone please help me with this? Anything will be helpful.

score 1 · Accepted Answer · answered Oct 16 '19 at 06:37

If both DataFrames have the same index values and also the same values in the first 2 columns, use:

inputdf['new'] = inputdf['PredictedFeature'] == outputdf['PredictedFeature']
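A minimal sketch of this comparison on the sample data from the question. The second half is an assumption that goes beyond the answer: if the two frames do not share the same index or row order, merging on Document_ID and OFFSET with DataFrame.merge aligns the rows by key before comparing. The names `keys`, `merged`, `match`, `_in` and `_out` are illustrative only.

```python
import pandas as pd

# Sample data copied from the question
inputdf = pd.DataFrame({
    "Document_ID": [0, 0, 0, 0, 0, 1, 1, 1],
    "OFFSET": [0, 8, 16, 23, 30, 0, 5, 7],
    "PredictedFeature": [2000, 2000, 2200, 2200, 2200, 2100, 2100, 2100],
})
outputdf = pd.DataFrame({
    "Document_ID": [0, 0, 0, 0, 0, 1, 1, 1],
    "OFFSET": [0, 8, 16, 23, 30, 0, 5, 7],
    "PredictedFeature": [2000, 2000, 2100, 2100, 2200, 2000, 2000, 2100],
})

# Direct comparison: relies on both frames having identical index labels
inputdf["new"] = inputdf["PredictedFeature"] == outputdf["PredictedFeature"]

# Key-based comparison (assumption: rows may be ordered differently).
# Merge on the two key columns so each input row is paired with its output row.
keys = ["Document_ID", "OFFSET"]
merged = inputdf[keys + ["PredictedFeature"]].merge(
    outputdf[keys + ["PredictedFeature"]],
    on=keys,
    how="left",
    suffixes=("_in", "_out"),
)
merged["match"] = merged["PredictedFeature_in"] == merged["PredictedFeature_out"]

print(inputdf)
print(merged)
```

For the sample data both approaches mark the rows at offsets 16 and 23 of document 0, and offsets 0 and 5 of document 1, as False.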

answered Oct 16 '19 at 06:37

jezrael

I am getting this error: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead. – ganesh kaspate Oct 16 '19 at 06:40
@ganeshkaspate - Can you check [this](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas) ? – jezrael Oct 16 '19 at 06:58
Is there any way to find out, for example, that if I have 50 records of 2100 in the input CSV, 25 of them matched? – ganesh kaspate Oct 16 '19 at 09:53
@ganeshkaspate - Hmm, so the solution failed? Or are you asking for something else? – jezrael Oct 16 '19 at 11:39
Could you please let me know if there is any way? – ganesh kaspate Oct 17 '19 at 06:29
@ganeshkaspate - Can you explain more? Maybe the best option is to create some sample data with the expected output - [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – jezrael Oct 17 '19 at 06:31
Or else I will post another question and add it in the comments. – ganesh kaspate Oct 17 '19 at 06:33
I have added this question, please check https://stackoverflow.com/questions/58426376/get-the-count-of-matching-and-not-matching-columns-data-in-a-dataframe – ganesh kaspate Oct 17 '19 at 06:43
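One possible way to get the counts asked about in the comments above, sketched on the sample data from the question. This is not taken from the thread: the column names `matched`, `total` and `not_matched` are made up for illustration, and the `.copy()` call assumes that the warning quoted above appeared because `inputdf` had been created as a slice of a larger DataFrame.

```python
import pandas as pd

# Sample data copied from the question
inputdf = pd.DataFrame({
    "Document_ID": [0, 0, 0, 0, 0, 1, 1, 1],
    "OFFSET": [0, 8, 16, 23, 30, 0, 5, 7],
    "PredictedFeature": [2000, 2000, 2200, 2200, 2200, 2100, 2100, 2100],
})
outputdf = pd.DataFrame({
    "Document_ID": [0, 0, 0, 0, 0, 1, 1, 1],
    "OFFSET": [0, 8, 16, 23, 30, 0, 5, 7],
    "PredictedFeature": [2000, 2000, 2100, 2100, 2200, 2000, 2000, 2100],
})

# Assumption: if inputdf was sliced from another frame, working on an
# explicit copy is one common way to avoid SettingWithCopyWarning.
inputdf = inputdf.copy()
inputdf["new"] = inputdf["PredictedFeature"] == outputdf["PredictedFeature"]

# For each input feature value: how many rows matched, and how many rows exist.
counts = inputdf.groupby("PredictedFeature")["new"].agg(matched="sum", total="count")
counts["not_matched"] = counts["total"] - counts["matched"]

print(counts)
```

Summing a boolean column counts its True values, so `matched` is the number of rows per input feature value whose output feature agreed.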

soundaraj · Answer 2 · 2019-10-16T06:51:47.803

Concatenate the two DataFrames:

>>> import pandas as pd
>>> df = pd.concat([df1, df2])
>>> df = df.reset_index(drop=True)

Group by all columns:

>>> df_gpby = df.groupby(list(df.columns))

Get the index of the unique records:

>>> idx = [x[0] for x in df_gpby.groups.values() if len(x) == 1]

Filter:

>>> df.reindex(idx)
         Date   Fruit   Num   Color
9  2013-11-25  Orange   8.6  Orange
8  2013-11-25   Apple  22.1     Red

Using this method, you can find the differing rows by their index values. You can then add a new column that is False for these index values and True for all others.
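A rough sketch of how the steps above might be applied to the sample data from the question. It assumes `df1` is the input frame and `df2` is the output frame with the same row order; the column name `new` and the use of `Index.isin` for the last step are illustrative choices, not part of the answer.

```python
import pandas as pd

# Sample data copied from the question: df1 is the input, df2 is the output
df1 = pd.DataFrame({
    "Document_ID": [0, 0, 0, 0, 0, 1, 1, 1],
    "OFFSET": [0, 8, 16, 23, 30, 0, 5, 7],
    "PredictedFeature": [2000, 2000, 2200, 2200, 2200, 2100, 2100, 2100],
})
df2 = pd.DataFrame({
    "Document_ID": [0, 0, 0, 0, 0, 1, 1, 1],
    "OFFSET": [0, 8, 16, 23, 30, 0, 5, 7],
    "PredictedFeature": [2000, 2000, 2100, 2100, 2200, 2000, 2000, 2100],
})

# Stack both frames; input rows keep positions 0..len(df1)-1 after the reset
df = pd.concat([df1, df2]).reset_index(drop=True)

# Rows that are identical in both frames fall into a group of size 2;
# rows that differ end up alone in their group.
df_gpby = df.groupby(list(df.columns))
idx = [x[0] for x in df_gpby.groups.values() if len(x) == 1]

# Input rows whose position appears in idx have no identical output row
df1["new"] = ~df1.index.isin(idx)

print(df.reindex(sorted(idx)))
print(df1)
```

Note that this compares whole rows, so it only works as a per-row check when each (Document_ID, OFFSET) pair occurs once per frame, as in the sample data.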

edited Oct 16 '19 at 06:51

answered Oct 16 '19 at 06:40

soundaraj

Yes, it says true. – ganesh kaspate Oct 16 '19 at 06:41
Sorry, I made a mistake when I did the comparison. With the earlier solution you gave, it says false. – ganesh kaspate Oct 16 '19 at 07:20
