I am using R version 3.6.1 within RStudio version 1.2.1335.

I'm trying to write a function to count the number of capital letters within a string. I've been playing around with different regex statements within grepl, and I'm getting weird results.

I decided to use strsplit to split the string into separate characters, and then sapply over those characters to check if they are capitalized with grepl and [:upper:], as shown below:

s <- 'Testing'
strsplit(s, character(0))[[1]]

[1] "T" "e" "s" "t" "i" "n" "g"

unname(sapply(strsplit(s, character(0))[[1]], function(x) grepl(x, '[:upper:]')))

[1] FALSE TRUE FALSE FALSE FALSE FALSE FALSE

This output states that the 'T' character is not uppercase, whereas the 'e' character is.

When I use the regex '[A-Z]' instead:

sapply(strsplit(s, character(0))[[1]], function(x) grepl(x, '[A-Z]'))

I get the output of "FALSE" for all items (whereas it should be "TRUE" for the "T" character).

When I try the regex on its own for each letter, I get results consistent with the above output:

grepl('T', '[:upper:]')
grepl('e', '[:upper:]')

This gives back FALSE for "T" and TRUE for "e".

I'm really confused as to what I'm doing wrong. I'm still wrapping my head around regex statements and any help would be appreciated!

Tam R
  • We need `grepl('[[:upper:]]', strsplit(s, character(0))[[1]])`, which returns `[1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE` – akrun Jan 27 '20 at 19:28
  • See the [docs](https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/strsplit): *if there is a match at the beginning of a (non-empty) string, the first element of the output is `""`* - that is why when you split with `[A-Z]` you get the first item as `FALSE`, as `T` is the second item. – Wiktor Stribiżew Jan 27 '20 at 19:34

1 Answer

We need [[:upper:]]

grepl('[[:upper:]]', strsplit(s, character(0))[[1]])
#[1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE

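Also, once we extract the list element with [[ (here strsplit returns a list of length 1), sapply is not needed, because grep/grepl are vectorized over x. The arguments of grep/grepl are in the order pattern, followed by x, which is the vector to search. In the question they are swapped: grepl('e', '[:upper:]') searches the literal string '[:upper:]' for the pattern 'e', which is why 'e' returns TRUE and 'T' returns FALSE.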
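Since the goal in the question is a function that counts capital letters, here is a minimal sketch of one way to build it on the vectorized call. The name count_upper is hypothetical (it is not from the original post), and the sketch assumes a single string as input.

```r
s <- 'Testing'

# Hypothetical helper (name is an assumption, not from the original post):
# count the uppercase letters in a single string.
count_upper <- function(x) {
  # Split into individual characters; [[1]] extracts the vector from the list
  chars <- strsplit(x, character(0))[[1]]
  # grepl is vectorized over its second argument, so no sapply is needed;
  # summing the logical vector counts the TRUE values
  sum(grepl('[[:upper:]]', chars))
}

count_upper(s)

# Naming the arguments makes the order explicit: pattern first, then x
grepl(pattern = '[[:upper:]]', x = 'T')

# With the arguments swapped, as in the question, the literal string
# '[:upper:]' is searched for the pattern 'e'
grepl(pattern = 'e', x = '[:upper:]')
```

Writing pattern = and x = explicitly is optional, but it makes a swapped call easier to spot.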

akrun