I'm looking for a simple way to check whether values in an R data frame contain a comma (or any other character, for that matter).

Let's suppose I have the following data frame:

df <- data.frame(A = c("apple","orange", "banana","strawberries"), 
                 B = c(23,12,10,15), 
                 C = c("2,53", "1.35","0,25","1,44"))

If I know which column contains the commas, I can use this:

which(grepl(",",df$C))
length(which(grepl(",",df$C)))

However, I want the same kind of output without having to specify the column of my data frame.

Any suggestions?

M--

2 Answers

You simply need to go through all the columns of the data frame; sapply works here:

sapply(df, grep, pattern = ",")


##output:

# $A
# integer(0)
# 
# $B
# integer(0)
# 
# $C
# [1] 1 3 4

To get the length you can do this:

sapply(sapply(df, grep, pattern = ","), length)

# A B C 
# 0 0 3 
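
Since the question also mentions "any character", keep in mind that grep and grepl treat the pattern as a regular expression by default, so a pattern such as "." would match every non-empty value. The following is only a minimal sketch, assuming a hypothetical helper named count_char (it is not part of the original answer), that uses fixed = TRUE to match the character literally and returns one count per column:

```r
df <- data.frame(A = c("apple", "orange", "banana", "strawberries"),
                 B = c(23, 12, 10, 15),
                 C = c("2,53", "1.35", "0,25", "1,44"))

# count_char is a hypothetical helper name, used only for illustration.
# fixed = TRUE makes grepl match `char` literally instead of as a regex.
count_char <- function(data, char) {
  vapply(data,
         function(column) sum(grepl(char, column, fixed = TRUE)),
         integer(1))
}

comma_counts <- count_char(df, ",")  # named integer vector, one entry per column
dot_counts   <- count_char(df, ".")  # "." is matched literally here
```

Using vapply with integer(1) keeps the result a plain named vector even when the data frame has a single row.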

Here is a solution that is somewhat simpler to grasp. First, convert your data frame to a vector.

df2vector <- as.vector(t(df))

df2vector 
# [1] "apple"        "23"           "2,53"         "orange"       "12"          
# [6] "1.35"         "banana"       "10"           "0,25"         "strawberries"
# [11] "15"           "1,44"        

Then use your approach.

length(which(grepl(",",df2vector)))
# [1] 3
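
Flattening the data frame gives the total count, but the row and column of each match are no longer obvious. If the location matters, one possible variation (a sketch, not part of the original answer) is to build a logical matrix with grepl and ask which for array indices. The names has_comma, hits, and n_hits are illustrative, and the sketch assumes the data frame has more than one row so that sapply returns a matrix:

```r
df <- data.frame(A = c("apple", "orange", "banana", "strawberries"),
                 B = c(23, 12, 10, 15),
                 C = c("2,53", "1.35", "0,25", "1,44"))

# Logical matrix: one row per data frame row, one column per data frame column.
has_comma <- sapply(df, grepl, pattern = ",", fixed = TRUE)

# arr.ind = TRUE returns the row and column index of every TRUE cell.
hits <- which(has_comma, arr.ind = TRUE)

# Total number of cells that contain a comma.
n_hits <- sum(has_comma)
```

Here hits is a two-column matrix with columns "row" and "col", so hits[, "row"] gives the matching rows and colnames(has_comma)[hits[, "col"]] gives the matching column names.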
  • 2
    Neat solution. The one-liner can be (obviously) `length(which(grepl(",",as.vector(t(df)))))`. Please avoid including `>` and `+` in your code so users can easily copy/paste them and hit run! Moreover, it's a good practice to add `#` in front of results, because they are not run-able and, again, we want to copy, paste, and run. Cheers. +1 – M-- Mar 25 '19 at 15:20