I know this might be a duplicate, but I couldn't apply or completely understand the similar questions I read.

I have a column of grades that is supposed to contain only numeric entries. However, during the manual data-entry process some rows of that column ended up with non-numeric entries. These consist of text or a combination of text and numbers. Is there any way to find every entry that does not consist only of numbers? I suspect I need regular expressions, but I am not sure.

My column looks like:

grades <- c(12, "missing", 20, 10, "accommodated-18", 13, "accommodated-20", 20, "sick", 17)

I know that some rows contain the word "missing" and some contain the word "accommodated", so I can locate them using grep.

# Indices of the entries that contain the word "missing"
grades_missing_index <- grep(pattern = "missing", x = grades)
# grades is a vector, so it is indexed without a comma
missing <- grades[grades_missing_index]

This returns all the entries that contain the word "missing", and I do the same for "accommodated". But if there are other entries that are not entirely numeric and that I am not aware of, how can I find them? For example, I need something that tells me that rows 2, 5, 7, and 9 have non-numeric entries. Using those indices I could then inspect the entries, as I did above.

Any ideas?

Cyrus
Iniciador
  • Dupe of [Finding non-numeric data in an R data frame or vector](https://stackoverflow.com/questions/21196106). See [that code with your vector](http://rextester.com/PVPB50438). – Wiktor Stribiżew Apr 18 '18 at 17:11
  • Thanks, I tried your function and Florian's suggestion. It works up to the `which` output, which is an integer vector, and I see all the indices of the non-numeric entries in the console. But when I try to convert it to a vector and do something similar to `missing <- grades_missing[isbn_missing_index,]` to see what the non-numeric entries are, I get an error. – Iniciador Apr 18 '18 at 17:29
  • @WiktorStribiżew When I apply the function you suggested I get 9,341 non-numeric entries (which I haven't found a way to see, other than calling `View()` on the `which` output), and when I apply Florian's suggestion I get 313 non-numeric entries. So even though both approaches work on my naive grades vector, they give different results on my data set. If I could see the entries I would be able to tell which approach works. – Iniciador Apr 18 '18 at 17:40
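
One possible reason two checks can disagree like this, sketched with made-up values (the `isbn` vector and the variable names below are hypothetical, not the asker's data): a pattern such as `^[0-9]` only tests the first character, whereas a conversion with `as.numeric()` rejects anything that is not a complete number. Whether this is what happened in the real data set is an assumption.

```r
# Made-up values for illustration; not the asker's data
isbn <- c("9780131103627", "0-306-40615-2", "978 3 16 148410 0", "missing")

# Flags only entries whose FIRST character is not a digit
first_char_check <- which(!grepl("^[0-9]", isbn))

# Flags every entry that cannot be converted to a number
# (suppressWarnings() hides the "NAs introduced by coercion" warning)
conversion_check <- which(is.na(suppressWarnings(as.numeric(isbn))))

first_char_check
conversion_check

# Values flagged by the conversion check but not by the first-character check
isbn[setdiff(conversion_check, first_char_check)]
```

Printing the values flagged by only one check, as in the last line, is a quick way to see which check matches the intended meaning of "non-numeric".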

1 Answer

You could try

which(!grepl('^[0-9]+$', grades))

to check which entries do not consist solely of digits. It outputs

2 5 7 9
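
A minimal sketch of how to go from the indices to the offending values, using the `grades` vector from the question. The names `bad_idx` and `bad_idx_num` are illustrative only. The second check, a conversion with `as.numeric()`, is an alternative that is not part of the suggestion above; it also accepts decimals and negative numbers, and it flags entries that are already `NA`.

```r
grades <- c(12, "missing", 20, 10, "accommodated-18", 13,
            "accommodated-20", 20, "sick", 17)

# Entries containing at least one character that is not a digit
bad_idx <- grep("[^0-9]", grades)
bad_idx
grades[bad_idx]   # the offending values themselves

# Alternative: entries that cannot be converted to a number
# (suppressWarnings() hides the "NAs introduced by coercion" warning)
bad_idx_num <- which(is.na(suppressWarnings(as.numeric(grades))))
grades[bad_idx_num]
```

Because `grades` is a plain vector, the indices go inside single brackets without a comma.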

Hope this helps!

Florian
  • Or, `grep("[^0-9]",grades)` – Brian Davis Apr 18 '18 at 17:12
  • Thanks, modified it, that was a bit sloppy indeed :) – Florian Apr 18 '18 at 17:16
  • I tried `a <- which(!grepl('^[0-9]',presentISBN$ISBN))`, then `a_index <- as.vector(a)`, then `non_numerics <- presentISBN$ISBN[a_index,]`, and I got: `Error in presentISBN$ISBN[a_index, ] : incorrect number of dimensions`. Should I say that my grades column is a `df_name$grades`? Does it make any difference? – Iniciador Apr 18 '18 at 17:27
  • It is a vector, which has only a single dimension. You should do either `presentISBN$ISBN[a_index]` or `presentISBN[a_index,]`. – Florian Apr 18 '18 at 18:21
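
To illustrate the difference between the two forms of indexing, here is a small sketch with a made-up data frame. The contents of `presentISBN` and the `title` column are invented for the example; only the names `presentISBN`, `ISBN`, and `a_index` come from the comments above.

```r
# Hypothetical stand-in for the asker's data frame
presentISBN <- data.frame(
  ISBN  = c("9780131103627", "missing", "0306406152", "n/a"),
  title = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Indices of ISBN entries containing at least one non-digit character
a_index <- grep("[^0-9]", presentISBN$ISBN)

# A column extracted with $ is a plain vector: one index, no comma
presentISBN$ISBN[a_index]

# The data frame itself has rows and columns: row index, then a comma
presentISBN[a_index, ]
```

The first form returns just the non-numeric values; the second returns the complete rows, which can help when the other columns are needed to trace the entry.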