
I'm trying to create a function that checks specific values in my large data frame. The dataset is imported from SAS into R, which seems to produce some unusual NA-like values. My plan is to use grep to find these odd values and fix them manually, so I want a function that simply outputs them. The function is as follows:

    not.numeric <- function(mydata, column) {
      unique(mydata[grep('[0-9]', mydata$column, invert = TRUE),
                    c('SeriesID', 'Reference', column)])
    }

Simple enough: it looks for values in the column that contain no digits. For it to work properly, the column name has to be passed as a quoted string. However, when I run the function:
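To make the pattern concrete, here is a small sketch on a made-up character vector (the values are hypothetical, standing in for SAS-style missing codes such as `.`). `grep(..., invert = TRUE)` returns the positions of elements that contain no digit at all:

```r
# Hypothetical values: numeric strings mixed with SAS-style placeholders
depth <- c("12.5", ".", "7", "", "NA", "3e2")

# Positions of elements that contain no digit 0-9
idx <- grep("[0-9]", depth, invert = TRUE)
idx          # → 2 4 5
depth[idx]   # → "." "" "NA"
```

Note that this only flags values with no digit anywhere; a value like `"12a"` contains a digit and would not be flagged.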

    not.numeric(df, 'Depth')

No rows are returned and I get this warning message: Unknown or uninitialised column: `column`. But when I substitute the values in directly, like this:

    unique(df[grep('[0-9]', df$'Depth', invert = TRUE), c('SeriesID', 'Reference', 'Depth')])

It works fine. I want a function because I run this check often and there are more than 100 columns to check. Why does the function fail when the same code works outside it?
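A minimal reproduction shows what happens inside the function. The data frame below is hypothetical; only its column names come from the question. With `column <- "Depth"`, `df$column` looks for a column literally named `column`. On a base `data.frame` that gives `NULL` (a tibble also returns `NULL`, with the warning quoted above), so `grep()` has nothing to search and zero rows come back. `df[[column]]` evaluates `column` first:

```r
# Hypothetical data; only the column names come from the question
df <- data.frame(
  SeriesID  = c("A", "A", "B"),
  Reference = c("r1", "r2", "r3"),
  Depth     = c("10", ".", "25"),
  stringsAsFactors = FALSE
)

column <- "Depth"

# `$` does not evaluate `column`: it looks up a column named "column"
is.null(df$column)   # TRUE

# `[[` evaluates `column`, so this is df[["Depth"]]
df[[column]]         # "10" "." "25"

# grep() on NULL matches nothing, so the subset has zero rows
nrow(df[grep("[0-9]", df$column, invert = TRUE),
        c("SeriesID", "Reference", column)])   # 0
```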

M Doster
  • `$` doesn't work with column names stored in strings; use `[[`, that is, `mydata[[column]]`. – Gregor Thomas Oct 23 '20 at 16:48
  • `unique(mydata[grep('[0-9]', mydata[[column]], invert=TRUE), c('SeriesID','Reference',column)])`. Find `mydata$column` and replace it with `mydata[[column]]`. – Gregor Thomas Oct 23 '20 at 16:49
  • Thank you, that fixes the function. It still seems weird to me that it works outside a function, but I don't care as long as it works. Thank you! – M Doster Oct 23 '20 at 16:50
  • Read the dupe for details, but it boils down to `$` being special. It doesn't matter whether it's in a function or not: `df$Depth` works, `df$"Depth"` works (surprisingly to me), but with `column <- "Depth"`, `df$column` does not work; you need `df[[column]]` or `df[, column]`. It just happens that function arguments essentially use the `column <- "Depth"` approach. – Gregor Thomas Oct 23 '20 at 16:54
  • I've used the `df[,'column']` approach before, but for this particular function that wouldn't work because I needed the values, not the column. I'm mad that when I took my R class, the use of `$`, `[]`, and `[[]]` was covered quickly and not thoroughly, which has hurt me many times. Thank you for your answer, Gregor; I believe I now have a better understanding of these for subsetting. – M Doster Oct 23 '20 at 17:02
  • Yeah, there's another twist, which is the difference between `df[[col]]` and `df[, col]`. For regular data frames, they're the same if `col` has length 1: they both return vectors. But if `col` has length > 1, then `[[` will error and `[` will return a data frame. In an effort to be more consistent, `tibble` changed that so that `[` always returns a data frame, regardless of whether it is one column or more. `[[`, like `$`, always returns a vector in all cases, which is why I recommended it as the solution here. – Gregor Thomas Oct 23 '20 at 17:28
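Putting the comments together, here is a sketch of the corrected function using `[[`, one way to run it over many columns, and the base `data.frame` behaviour of `[` versus `[[` described above. The sample data and the `cols_to_check` vector are hypothetical; `lapply()` over a named vector is just one convenient choice, and a plain `for` loop would work equally well.

```r
# Corrected version: `[[` evaluates the string held in `column`
not.numeric <- function(mydata, column) {
  unique(mydata[grep("[0-9]", mydata[[column]], invert = TRUE),
                c("SeriesID", "Reference", column)])
}

# Hypothetical sample data standing in for the SAS import
df <- data.frame(
  SeriesID  = c("A", "A", "B", "B"),
  Reference = c("r1", "r2", "r3", "r4"),
  Depth     = c("10", ".", "25", "."),
  Temp      = c("4.1", "3.9", "", "5.0"),
  stringsAsFactors = FALSE
)

not.numeric(df, "Depth")   # rows r2 and r4, where Depth is "."

# Hypothetical list of columns to check; setNames() labels each result
cols_to_check <- c("Depth", "Temp")
problems <- lapply(setNames(cols_to_check, cols_to_check),
                   function(col) not.numeric(df, col))
problems$Temp              # row r3, where Temp is ""

# `[` vs `[[` on a base data.frame
is.vector(df[, "Depth"])                  # TRUE: one name drops to a vector
is.data.frame(df[, c("Depth", "Temp")])   # TRUE: two names keep a data frame
is.vector(df[["Depth"]])                  # TRUE: `[[` returns one column
```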
