How can I filter all rows that start with any Latin alphabetic letter in R?

Sample code that is not working:

library(dplyr)

df <- data.frame(marks = c(20.1, 30.2, 40.3, 50.4, 60.5),
                 age = 21:25,
                 roles = c('Software Eng.', 'Software Dev',
                           'Data Analyst', 'Data Eng.',
                           '5Sigma'))

df %>% filter(grep("[A-z]", roles))

Desired output

  marks age         roles
1  20.1  21 Software Eng.
2  30.2  22  Software Dev
3  40.3  23  Data Analyst
4  50.4  24     Data Eng.
Macosso
  • The `field:` is not a substring in your data. Also, `filter` expects a logical vector, whereas `grep` returns indices. `df %>% filter(grepl("^[A-z]+", roles))`? – akrun Nov 29 '21 at 17:38
  • Thanks, this also solved my problem. Regex is my biggest weakness. – Macosso Nov 29 '21 at 17:49

1 Answer

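First, [A-z] is not the same as [A-Za-z]. In ASCII order, the range A-z also covers the six characters that sit between Z and a ([, \, ], ^, _ and `), so you need to be more careful with character classes. (See "Difference between regex [A-z] and [a-zA-Z]" and ignore the portions.)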
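To make the difference concrete, here is a minimal base R sketch. The vector `x` is made up for illustration and is not part of the question's data. It assumes the regex engine compares range endpoints by code point, which is what I would expect from R's default engine, though the R documentation notes that range interpretation can depend on locale.

```r
# Hypothetical sample strings; only the first character matters here.
x <- c("Data Eng.", "_private", "[draft]", "5Sigma", "analyst")

# "[A-z]" spans every ASCII code point from "A" (65) to "z" (122),
# which also takes in [ \ ] ^ _ ` (91 to 96).
loose <- grepl("^[A-z]", x)

# "[A-Za-z]" names the two letter ranges explicitly.
strict <- grepl("^[A-Za-z]", x)

# Side-by-side view: "_private" and "[draft]" pass the loose
# pattern but not the strict one.
print(data.frame(x, loose, strict))
```

With the question's data both patterns happen to give the same rows, which is why the looser class can go unnoticed until a value starts with an underscore or a bracket.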

Second, where does field: come in? Do this:

df %>%
  filter(grepl("^[A-Za-z]", roles))
#   marks age         roles
# 1  20.1  21 Software Eng.
# 2  30.2  22  Software Dev
# 3  40.3  23  Data Analyst
# 4  50.4  24     Data Eng.

(Plus the previous comment about grepl versus grep.)
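For what that comment means in practice, this small base R sketch compares the two functions on the `roles` values from the question. The names `idx` and `keep` are just illustrative.

```r
roles <- c("Software Eng.", "Software Dev", "Data Analyst",
           "Data Eng.", "5Sigma")

# grep() returns the integer positions of the matching elements.
idx <- grep("^[A-Za-z]", roles)

# grepl() returns one TRUE/FALSE per element, which is the shape
# that dplyr::filter() expects for a row condition.
keep <- grepl("^[A-Za-z]", roles)

print(idx)
print(keep)

# Base R subsetting accepts either form and gives the same result.
print(roles[idx])
print(roles[keep])
```

So `grep` is fine for `roles[...]` style indexing, but inside `filter()` the logical vector from `grepl` is the one to use.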

r2evans
  • Thanks, this solved my problem, and I will dive into those resources. Regex is my biggest weakness. – Macosso Nov 29 '21 at 17:47