I have a large dataset that contains web domains. The domains end in a number of different top-level domains, such as .com, .app, .now, and so on. I used the code below because I only want to keep the domains that end in the four listed in it. However, when I run the code, it keeps any domain that contains the strings com, net, tv, or org. It seems to ignore the period. What can I use so that it picks up on the period and only keeps the domains that end in these four?

test <- new_df %>%
  filter(grepl('.com|.net|.tv|.org', domain))
Phil

Escape the period with backslashes: `grepl('\\.com|\\.net|\\.tv|\\.org', domain)`. Otherwise regex thinks the period means "match any character". – Allan Cameron Jul 28 '22 at 13:44

1 Answer

In regex, "." matches any single character. If you want a literal dot, you need to escape it:

grepl('\\.com|\\.net|\\.tv|\\.org', c("jfkdjkfd.com", "jdksjkdscom")) # TRUE FALSE
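
Escaping the dot fixes the "any character" problem, but the pattern can still match in the middle of a string. Since the question asks for domains that end in one of the four TLDs, it may help to also anchor the pattern with `$`. Below is a minimal sketch in base R; the sample `domains` vector and the variable names `pattern` and `keep` are made up for illustration and are not from the original data.

```r
# Hypothetical sample data standing in for new_df$domain
domains <- c("example.com", "compass.app", "www.network.io",
             "stream.tv", "charity.org", "example.com.au")

# \\.              matches a literal dot
# (com|net|tv|org) groups the four alternatives
# $                anchors the match to the end of the string
pattern <- "\\.(com|net|tv|org)$"

keep <- grepl(pattern, domains)
domains[keep]
# "example.com" "stream.tv" "charity.org"

# Without the anchor, ".net" inside "www.network.io" and ".com" inside
# "example.com.au" would also count as matches
unanchored <- grepl("\\.com|\\.net|\\.tv|\\.org", domains)
```

In the pipeline from the question, the same pattern should drop in as `filter(grepl("\\.(com|net|tv|org)$", domain))`. If the data may contain uppercase domains, passing `ignore.case = TRUE` to `grepl` is one option.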

Samuel Allain