I am running some checks on my data frame using the R validate package.

I want to check if values in my column Protocol Number are actually numbers. When I write the expression using is.numeric I get the following output:

[Image: screenshot of the validation output]

The documentation linked above says "We see that each rule checks a single item, namely one column of data". But this is not what I want: I don't want to check the column as a whole, but each individual value in it.

As a result, when I call violating() to show the rows that don't hold numbers, I get an error:

Error in violating(assaydat, out) : Not all rules have record-wise output

The column Protocol Number is actually a character vector, and some possible values (among many) are "1", "2", "3" or a comment like "Not Done" or "Pending". I want to flag each row in the data frame if there is a comment and not a number.

How do I do this correctly?

mandmeier
  • It's easier to help you if you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick May 17 '22 at 21:35

1 Answer

You can use:

grepl("^[0-9]+$", `Protocol Number`)

It returns TRUE for each value that consists only of digits, and FALSE otherwise, so the rule produces one result per row.
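As a minimal sketch of how this rule could be used with validate, assuming a small made-up data frame: the names assaydat and out are taken from the error message in the question, while the Sample column, the sample values, and the rule name is_number are hypothetical.

```r
library(validate)

# Hypothetical sample data. check.names = FALSE keeps the space in
# the column name "Protocol Number".
assaydat <- data.frame(
  Sample = 1:5,
  `Protocol Number` = c("1", "2", "Not Done", "3", "Pending"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# grepl() is vectorized, so this rule yields one TRUE/FALSE per row.
rules <- validator(is_number = grepl("^[0-9]+$", `Protocol Number`))

# Confront the data with the rule and inspect the result.
out <- confront(assaydat, rules)
summary(out)

# Because the rule is record-wise, violating() can return the failing rows.
bad_rows <- violating(assaydat, out)
print(bad_rows)
```

With this data, bad_rows should contain the two rows holding "Not Done" and "Pending". Note that the pattern only accepts non-negative whole numbers; values such as "1.5" or "-2" would also be flagged.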

user2332849
  • Works like a charm. Thanks! Although I still don't understand why validate applies is.numeric to the entire column whereas other functions such as is.na are applied to individual values. – mandmeier May 18 '22 at 14:36
  • @mandmeier is.numeric() checks whether the variable itself is of type numeric, so it returns a single TRUE or FALSE for the whole vector, whereas is.na() is vectorized and returns one value per element. There are also functions like is.double(), is.logical() and is.character(). In this case, your variable is of type character, and its values happen to contain sequences of digits. That's why is.numeric() will always return FALSE for you in this case. – user2332849 May 18 '22 at 16:26
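
The difference described in the comments can be seen in base R alone. This is a small sketch on a made-up character vector x (a hypothetical stand-in for the Protocol Number column); the as.numeric() variant at the end is an alternative not mentioned in the answer, shown here in case decimal or negative numbers should also count as numbers.

```r
# Hypothetical stand-in for the character column.
x <- c("1", "2", "Not Done", NA)

# Type check: one result for the whole vector.
type_check <- is.numeric(x)
print(type_check)          # FALSE

# Vectorized checks: one result per element.
missing_check <- is.na(x)
print(missing_check)       # FALSE FALSE FALSE  TRUE

digits_only <- grepl("^[0-9]+$", x)
print(digits_only)         #  TRUE  TRUE FALSE FALSE

# Alternative: a value counts as a number if as.numeric() can parse it.
# suppressWarnings() hides the "NAs introduced by coercion" warning.
parses <- !is.na(suppressWarnings(as.numeric(x)))
print(parses)              #  TRUE  TRUE FALSE FALSE
```

Only the per-element checks can be used as record-wise rules; the type check collapses the whole column into a single value.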