
I have a dataframe with multiple columns holding float values.

import pandas as pd

df = pd.DataFrame({
    "v0": [0.493864, 0.378362, 0.342887, 0.308959, 0.746347],
    "v1": [0.018915, 0.018535, 0.019587, 0.035702, 0.008325],
    "v2": [0.252000, 0.066746, 0.092421, 0.036694, 0.036506],
    "v3": [0.091409, 0.103887, 0.098669, 0.112207, 0.043911],
    "v4": [0.058429, 0.312115, 0.342887, 0.305678, 0.103065],
    "v5": [0.493864, 0.378362, 0.338524, 0.304545, 0.746347]})

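I need to create another column, Result, in df by comparing the value in each row of df['v0'] with the values in the same row of the subsequent columns v1-v5. Result should be 1 if the v0 value appears in any of v1-v5 for that row, and 0 otherwise.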

What I need is as below:

         v0        v1        v2        v3        v4        v5  Result
0  0.493864  0.018915  0.252000  0.091409  0.058429  0.493864       1
1  0.378362  0.018535  0.066746  0.103887  0.312115  0.378362       1
2  0.342887  0.019587  0.092421  0.098669  0.342887  0.338524       1
3  0.308959  0.035702  0.036694  0.112207  0.305678  0.304545       0
4  0.746347  0.008325  0.036506  0.043911  0.103065  0.746347       1

I have tried many approaches including This link and This link

But it seems the task I need is not doable. I have been struggling with this for the last couple of days. The original dataset has more than 60000 rows. Please suggest the best and fastest way.

vikrant rana
  • I just deleted my answer as it was too similar to [this](https://stackoverflow.com/a/52393822/4819376). Anyway, you should consider changing your example, otherwise any solution from your first [link](https://stackoverflow.com/questions/52393659/pandas-dataframe-check-if-column-value-exists-in-a-group-of-columns) is a working one. – rpanai Dec 24 '18 at 11:17
  • @user32185 Not a good idea to suggest equality comparisons for floating point columns. – cs95 Dec 24 '18 at 15:13
  • @coldspeed I do agree. That's why I was suggesting that the OP give an example where this is highlighted. – rpanai Dec 24 '18 at 17:06

2 Answers


A better solution for dealing with floating point comparisons is to use np.isclose with broadcasting:

import numpy as np
v = df.to_numpy()  # underlying float array; column 0 is v0
df['Result'] = np.isclose(v[:, 1:], v[:, [0]]).any(axis=1).astype(int)
df
         v0        v1        v2        v3        v4        v5  Result
0  0.493864  0.018915  0.252000  0.091409  0.058429  0.493864       1
1  0.378362  0.018535  0.066746  0.103887  0.312115  0.378362       1
2  0.342887  0.019587  0.092421  0.098669  0.342887  0.338524       1
3  0.308959  0.035702  0.036694  0.112207  0.305678  0.304545       0
4  0.746347  0.008325  0.036506  0.043911  0.103065  0.746347       1

Do NOT use equality-based comparisons when dealing with floats, because floating-point inaccuracies can make values that should be equal compare as unequal. See "Is floating point math broken?"
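
To illustrate the point, here is a minimal sketch of how `==` and `np.isclose` can disagree, and how the tolerance decides what counts as a match. The sample numbers and the `atol=5e-3` value are chosen only for illustration; the three values in `row` are taken from row 3 of the question's data (v0, v4, v5).

```python
import numpy as np

# Classic case: the sum is not exactly representable, so == fails.
a = 0.1 + 0.2
b = 0.3
exact = (a == b)                 # exact equality
close = bool(np.isclose(a, b))   # equality within a tolerance

# np.isclose(x, y) tests |x - y| <= atol + rtol * |y|
# (defaults: rtol=1e-05, atol=1e-08).
row = np.array([0.308959, 0.305678, 0.304545])  # v0, v4, v5 of row 3

# With the default tolerances these values are too far apart to match.
near_default = np.isclose(row[1:], row[0]).any()

# With a deliberately loose absolute tolerance (illustrative only) they match.
near_loose = np.isclose(row[1:], row[0], atol=5e-3).any()

print(exact, close, near_default, near_loose)
```

The default tolerances are tight enough that row 3 still gets 0, as in the expected output. If your real data should treat slightly different values as equal, `rtol` and `atol` are the parameters to adjust.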

cs95
  • Hi @coldspeed, it worked. This was a minor point in my overall project, but I somehow got hooked on it. Having never done a float comparison before, it was really difficult for me to grasp, and no solution was working. Thanks for the reply; with a little modification it worked. – itsme 16may Dec 24 '18 at 18:07

For everyone: the issue has been sorted. Thanks to all those who took the time to reply. Being an amateur programmer, it was really heartening to see my issue getting answered. The final solution, arrived at thanks to @coldspeed, is below.

JUPYTER NOTEBOOK SCREENSHOT
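
The screenshot is not reproduced here. As a rough sketch of the np.isclose approach applied to the data from the question (not necessarily the exact code in the screenshot), something like the following should work. The helper name `flag_first_column_match` and its parameters are hypothetical, introduced only for this example.

```python
import numpy as np
import pandas as pd


def flag_first_column_match(frame, ref="v0", rtol=1e-05, atol=1e-08):
    """Return 1 for each row where `ref` is close to any other column, else 0.

    Hypothetical helper for illustration; rtol/atol are passed to np.isclose.
    """
    ref_vals = frame[[ref]].to_numpy()            # shape (n, 1), broadcasts per row
    others = frame.drop(columns=ref).to_numpy()   # shape (n, k)
    matches = np.isclose(others, ref_vals, rtol=rtol, atol=atol)
    return matches.any(axis=1).astype(int)


df = pd.DataFrame({
    "v0": [0.493864, 0.378362, 0.342887, 0.308959, 0.746347],
    "v1": [0.018915, 0.018535, 0.019587, 0.035702, 0.008325],
    "v2": [0.252000, 0.066746, 0.092421, 0.036694, 0.036506],
    "v3": [0.091409, 0.103887, 0.098669, 0.112207, 0.043911],
    "v4": [0.058429, 0.312115, 0.342887, 0.305678, 0.103065],
    "v5": [0.493864, 0.378362, 0.338524, 0.304545, 0.746347]})

df["Result"] = flag_first_column_match(df)
result = df["Result"].tolist()
print(df)
```

Because the comparison is a single vectorized NumPy operation with no Python-level loop over rows, it should scale comfortably to the 60000+ rows mentioned in the question.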