match() unable to find indexes of values from a dataframe in R

Question

I have a function with three arguments (state, outcome and num) that works on a data frame called df. Here is the snippet with the issue:

    else if (num == "worst") {
      da <- as.numeric(unlist(df[, outcome]))
      da <- na.omit(da)
      da <- sort(da, decreasing = TRUE)
      dm <- match(da, df[, outcome])
      print(unique(df[dm, "hospital"]))
    }

The values in da, printed separately, are as follows:

[1] 19.0 18.4 17.6 17.3 17.3 17.1 17.1 16.8 16.8 16.7 16.6 16.5 16.5 16.4 16.3 16.3 16.3 16.2 16.2 16.2 16.2 16.2 16.2
[24] 16.2 16.2 16.0 16.0 15.9 15.8 15.8 15.8 15.8 15.8 15.7 15.6 15.6 15.6 15.5 15.4 15.4 15.4 15.3 15.3 15.3 15.3 15.2
[47] 15.2 15.2 15.1 15.1 15.0 15.0 15.0 14.9 14.9 14.9 14.9 14.8 14.7 14.7 14.7 14.7 14.7 14.6 14.5 14.5 14.3 14.3 14.3
[70] 14.2 14.2 14.1 14.1 14.1 14.0 14.0 13.8 13.8 13.6 13.5 13.5 13.3 13.2

These values are derived directly from the original df data frame.

When I try to match the values back to df to get their indexes (the dm variable), this is what is returned:

[1] NA 41 88 14 14 52 52 45 45 58 34 26 26 16  8  8  8  3  3  3  3  3  3  3  3 NA NA 12 36 36 36 36 36 33  9  9  9 60 13
[40] 13 13 53 53 53 53 44 44 44  4  4 NA NA NA  5  5  5  5 85  7  7  7  7  7 89  2  2 38 38 38 18 18 30 30 30 NA NA 71 71
[79] 48 23 23 86  1

The issue: the first value in da, 19.0, for example, comes back as NA in dm, even though that value is derived from the same data frame df. Can someone explain what the issue is, and how I can find the unmatched values so that I can retrieve their indexes and print the corresponding rows?

NOTE: I have already tried switching to the which(x %in% y) approach:

    da <- sort(da, decreasing = TRUE)
    dm <- which(df[, outcome] %in% da)

but that just gives this as output:

[1]   1   2   3   4   5   7   8   9  10  12  13  14  15  16  17  18  20  22  23  24  25  26  27  30  31  32  33  34  35
[30]  36  37  38  39  40  41  43  44  45  46  47  48  49  52  53  55  56  57  58  59  60  61  62  63  64  65  66  68  70
[59]  71  72  73  74  75  77  79  81  83  84  85  86  88  89  91 105 111

Thanks in advance for your help.

dput link https://pastelink.net/2e8tm

Almost certainly this is a floating point accuracy issue. Equality testing on non-integers is problematic. See the FAQ [Why are these numbers not equal?](https://stackoverflow.com/q/9508518/903061) for discussion and work-arounds. — Gregor Thomas, Dec 18 '20 at 14:18
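To illustrate the floating-point concern raised in this comment, here is a minimal sketch showing how two numbers that print the same can fail an exact comparison, and how `match()` then misses them. The helper `match_tol()` is hypothetical (it is not part of base R) and the tolerance is an arbitrary assumption.

```r
# Two values that look equal when printed but differ in binary representation
x <- 0.1 + 0.2
y <- 0.3
print(x == y)                    # exact comparison
print(isTRUE(all.equal(x, y)))   # comparison within a small tolerance

# match() compares doubles exactly, so a computed value may not be found
lookup <- c(0.1, 0.3, 0.5)
exact_idx <- match(x, lookup)

# Hypothetical helper: index of the first element within `tol` of `value`
match_tol <- function(value, table, tol = 1e-8) {
  hits <- which(abs(table - value) < tol)
  if (length(hits) == 0) NA_integer_ else hits[1]
}
tol_idx <- match_tol(x, lookup)

print(exact_idx)
print(tol_idx)
```

This only matters when the values being compared come out of arithmetic; values that are read from the same source and converted the same way normally compare equal.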
However, it's also possible you've got weird class things going on... it's not clear why your code needs `as.numeric()` and `unlist()` to make `da`, but not when matching back to `df[,outcome]`. If you share a small reproducible example using `dput()` to share data (which preserves class and structure information), we can get a better idea if that is an issue or not. — Gregor Thomas, Dec 18 '20 at 14:20
The `as.numeric()` and `unlist()` calls were used to make the NA values removable. There might be a better way to handle this, and I'd be grateful if you could elaborate on it if there is one. For now, applying those two functions to `df[,outcome]` as well has solved the issue at hand. Thanks for that. — beansbeans, Dec 18 '20 at 14:34
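The fix described in this comment is consistent with a type mismatch rather than rounding: the NA entries in `dm` line up with the values of `da` that end in `.0` (19.0, 16.0, 15.0, 14.0). A minimal sketch of that effect follows; the data frame is made up, and the assumption is that the outcome column is stored as character, with `"Not Available"` as a hypothetical missing-value code.

```r
# Hypothetical data: the outcome column is stored as character
df <- data.frame(
  hospital = c("A", "B", "C", "D"),
  outcome  = c("19.0", "18.4", "Not Available", "16.0"),
  stringsAsFactors = FALSE
)

# Numeric copy of the column; non-numeric strings become NA
num <- suppressWarnings(as.numeric(df$outcome))
da  <- sort(num, decreasing = TRUE)   # sort() drops NA by default

# Numeric vs. character: match() coerces both sides to character,
# and as.character(19) is "19", which is not the string "19.0"
dm_bad <- match(da, df$outcome)

# Numeric vs. numeric: both sides converted the same way
dm_good <- match(da, num)

print(dm_bad)
print(dm_good)
print(df[dm_good, "hospital"])

# Alternative that skips match() entirely: order the rows directly
ranked <- df[order(num, decreasing = TRUE, na.last = NA), "hospital"]
print(ranked)
```

Using `order()` on the numeric column also avoids the duplicate indexes that `match()` returns for tied values, since `match()` always reports the first matching position.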
I'd be happy to advise on removing NA values without `as.numeric` and `unlist`, but you'd need to share some raw input so we can understand why you think they're needed in the first place... if `outcome` is a numeric column you should be able to skip those steps entirely. If it's not a numeric column, I don't know what it is. — Gregor Thomas, Dec 18 '20 at 14:40
The input is a CSV file of hospitals and their outcomes for three separate medical conditions (heart failure, heart attack and pneumonia), and the function is supposed to rank the hospitals in a specific state by mortality rate for a specific condition. Outcome is not a numeric column. Further, `read.csv()` reads the file in such a way that the values are not numeric, so I had to make them numeric manually. Is that what was meant by input? In case you want the full code, here it is: https://pastelink.net/2e8p0 — beansbeans, Dec 18 '20 at 14:56
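One common reason `read.csv()` returns a non-numeric column is a text placeholder for missing values. The sketch below assumes that is the case here; the CSV content, the column names, and the `"Not Available"` code are all hypothetical. Declaring the placeholder through `na.strings` lets the column be parsed as numeric, so no manual conversion is needed.

```r
# Hypothetical CSV content; "Not Available" as the missing-value code is an assumption
csv_text <- "hospital,state,rate
A,TX,19.0
B,TX,Not Available
C,TX,16.0"

# Declare the missing-value code so the column is parsed as numeric
hosp <- read.csv(text = csv_text, na.strings = "Not Available",
                 stringsAsFactors = FALSE)

str(hosp$rate)   # inspect the parsed column type

# Drop rows with a missing rate without as.numeric() or unlist()
complete <- hosp[!is.na(hosp$rate), ]
print(complete)
```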
What would help is `dput(df[1:10, ])`, however glancing at your code you are using `as.data.frame(cbind(...))` which is an anti-pattern and might be causing your numeric issues. `cbind()` creates a `matrix` if you don't start with a data frame already, which if there are any non-numeric columns will turn all columns non-numeric. If you add the full code and `dput` to your question, I can take a look. — Gregor Thomas, Dec 18 '20 at 15:07
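The `cbind()` coercion described in this comment can be reproduced with a small sketch; the vectors below are made-up data. Because a matrix holds a single type, mixing character and numeric vectors turns everything into character before `as.data.frame()` ever sees it.

```r
hospital <- c("A", "B", "C")
rate     <- c(19.0, 18.4, 16.0)

# cbind() on plain vectors builds a matrix, so the rates are coerced to character
via_cbind <- as.data.frame(cbind(hospital, rate), stringsAsFactors = FALSE)

# data.frame() keeps each column's own type
direct <- data.frame(hospital = hospital, rate = rate, stringsAsFactors = FALSE)

print(sapply(via_cbind, class))
print(sapply(direct, class))
print(via_cbind$rate)   # the rates are now strings
```

Building the data frame with `data.frame()`, or subsetting an existing data frame by column, keeps numeric columns numeric.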
Here is the `dput()` of the fixed function: https://pastelink.net/2e8ta — beansbeans, Dec 18 '20 at 15:18
