I am totally new to Python. I have two DataFrames built from the same dataset: one is the input and the other is the output.

Here is my input DataFrame:

Document_ID OFFSET  PredictedFeature
    0         0            2000
    0         8            2000
    0         16           2200
    0         23           2200
    0         30           2200
    1          0            2100
    1          5            2100
    1          7            2100

I give this as input to my ML model, which returns its output in the same format.

My output looks like this:

  Document_ID    OFFSET   PredictedFeature
        0         0            2000
        0         8            2000
        0         16           2100
        0         23           2100
        0         30           2200
        1          0           2000
        1          5           2000
        1          7           2100

For these two DataFrames, I want to check, for each Document_ID and OFFSET, whether the input feature is the same as the output feature. If it is, I want to add True as the value in a new column; if it is not, I want to add False.

For example, in the sample data, for Document_ID 0 and OFFSET 16 the input feature is 2200 and the output feature is 2100, so the value is False.

Can anyone please help me with this? Anything will be helpful.

ganesh kaspate

2 Answers

1

If both DataFrames have the same index values and the same values in the first two columns, use:

inputdf['new'] = inputdf['PredictedFeature'] == outputdf['PredictedFeature']
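
If the two DataFrames are not guaranteed to share the same index or row order, one possible alternative is to align them on the key columns first. The following is only a sketch, not part of the original answer: it rebuilds the question's sample data, and the `_in` / `_out` suffixes and the name `merged` are assumptions chosen for illustration.

```python
import pandas as pd

# Sample data copied from the question.
inputdf = pd.DataFrame({
    "Document_ID": [0, 0, 0, 0, 0, 1, 1, 1],
    "OFFSET": [0, 8, 16, 23, 30, 0, 5, 7],
    "PredictedFeature": [2000, 2000, 2200, 2200, 2200, 2100, 2100, 2100],
})
outputdf = pd.DataFrame({
    "Document_ID": [0, 0, 0, 0, 0, 1, 1, 1],
    "OFFSET": [0, 8, 16, 23, 30, 0, 5, 7],
    "PredictedFeature": [2000, 2000, 2100, 2100, 2200, 2000, 2000, 2100],
})

# Align rows on (Document_ID, OFFSET) instead of relying on the index.
# The suffixes are illustrative names, not something from the question.
merged = inputdf.merge(
    outputdf,
    on=["Document_ID", "OFFSET"],
    how="left",
    suffixes=("_in", "_out"),
)

# True where the input and output features agree for that ID and offset.
merged["new"] = merged["PredictedFeature_in"] == merged["PredictedFeature_out"]
print(merged)
```

With this data, the row for Document_ID 0 and OFFSET 16 gets False, which matches the example in the question. A key present only in the input would get a missing output value and therefore False.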
jezrael
  • I am getting this error: "A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead." – ganesh kaspate Oct 16 '19 at 06:40
  • @ganeshkaspate - Can you check [this](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas) ? – jezrael Oct 16 '19 at 06:58
  • Is there any way to find out, if I have 50 records with 2100 in the input CSV, how many of them (say 25) matched? – ganesh kaspate Oct 16 '19 at 09:53
  • @ganeshkaspate - hmm, so the solution failed? Or are you asking for something else? – jezrael Oct 16 '19 at 11:39
  • Could you please let me know if there is any way? – ganesh kaspate Oct 17 '19 at 06:29
  • @ganeshkaspate - Can you explain more? Maybe the best is to create some sample data with expected output - [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – jezrael Oct 17 '19 at 06:31
  • Or else I will post another question and add it in the comments – ganesh kaspate Oct 17 '19 at 06:33
  • I added this question, please check https://stackoverflow.com/questions/58426376/get-the-count-of-matching-and-not-matching-columns-data-in-a-dataframe – ganesh kaspate Oct 17 '19 at 06:43
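
The message quoted in the comments above typically appears when the DataFrame being assigned to was itself produced by slicing or filtering another DataFrame. A minimal sketch of one common way to avoid it, assuming that is what happened here: take an explicit copy before adding the column. The name `raw` and the filter condition are hypothetical, and the data is a shortened version of the question's sample.

```python
import pandas as pd

# Hypothetical parent frame that inputdf was sliced from.
raw = pd.DataFrame({
    "Document_ID": [0, 0, 1],
    "OFFSET": [0, 8, 0],
    "PredictedFeature": [2000, 2000, 2100],
})
outputdf = pd.DataFrame({"PredictedFeature": [2000, 2100, 2100]})

# Illustrative filter; .copy() makes inputdf an independent DataFrame,
# so adding a column does not touch (or warn about) the parent frame.
inputdf = raw[raw["Document_ID"] >= 0].copy()
inputdf["new"] = inputdf["PredictedFeature"] == outputdf["PredictedFeature"]
print(inputdf)
```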
0

concat

>>> df = pd.concat([df1, df2])
>>> df = df.reset_index(drop=True)

group by

>>> df_gpby = df.groupby(list(df.columns))

get index of unique records

>>> idx = [x[0] for x in df_gpby.groups.values() if len(x) == 1]

filter

>>> df.reindex(idx)
         Date   Fruit   Num   Color
9  2013-11-25  Orange   8.6  Orange
8  2013-11-25   Apple  22.1     Red

With this method you can find the rows that differ by their index values. You can then add a new column that is False for those index values and True for all the others.
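
Applied to the question's data, the same idea (stack both frames, then look for rows that occur only once) might look like the sketch below. It swaps the groupby step for `DataFrame.duplicated`, which is a substitution rather than the method shown above, and it assumes that no row is repeated within either frame and that the new column is called `new`.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "Document_ID": [0, 0, 0, 0, 0, 1, 1, 1],
    "OFFSET": [0, 8, 16, 23, 30, 0, 5, 7],
    "PredictedFeature": [2000, 2000, 2200, 2200, 2200, 2100, 2100, 2100],
})
df2 = pd.DataFrame({
    "Document_ID": [0, 0, 0, 0, 0, 1, 1, 1],
    "OFFSET": [0, 8, 16, 23, 30, 0, 5, 7],
    "PredictedFeature": [2000, 2000, 2100, 2100, 2200, 2000, 2000, 2100],
})

# Stack input and output; ignore_index gives a fresh 0..n-1 index.
both = pd.concat([df1, df2], ignore_index=True)

# keep=False flags every copy of a repeated row, so a row present in
# both frames is True and a row present in only one frame is False.
# Assumption: neither frame contains repeated rows on its own.
matched = both.duplicated(keep=False)

# The first len(df1) entries correspond to the input rows, in order.
df1["new"] = matched.iloc[:len(df1)].to_numpy()
print(df1)
```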

soundaraj