
I have a dataset where I have converted hyperlinks into [url] - see the example posts at the bottom. I just want to count the frequency of "[url]" using R.

I have tried the following without success:

data <- read.csv("X:/.../tweets.csv")
word_split <- strsplit(data$USER_POSTS, " ")
sum(stringr::str_count(data$USER_POSTS, "[url]"))

I have also tried this:

sum(stringr::str_count(USER_POST, "\\b[url]\\b"))

The result is 0. However, when I check in Excel, the string appears around 7 times. Could anyone guide me on what I am doing wrong? Thank you in advance.

EDIT BELOW with further details:

USER_ID    USER_POSTS 
123        I like butterflies. 
234        I have found some information in this webpage [url] 
456        Find more information here [url] 
ekoam
Louise
  • Could you provide a snippet of your data, best in a reprex format? Otherwise it's a bit difficult to account for the various details. https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – zoowalk Dec 03 '20 at 23:41
  • @zoowalk thank you, I have just added further information – Louise Dec 04 '20 at 16:03
  • @RonakShah thank you, I have just added further information. – Louise Dec 04 '20 at 16:04

1 Answer


If I understand your question correctly, this should be a workable solution:

library(stringr)
str_count(x, "\\[url\\]")
[1] 2

The key here is to take into account that the `[` and `]` characters are metacharacters in regex: unescaped, `[url]` is a character class that matches any single `u`, `r`, or `l`. If you want to match the brackets as literal characters, you need to escape them, which in R is done with a double backslash `\\`.
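A quick standalone sketch (not from the original post) of the difference between the two patterns:

```r
library(stringr)

s <- "see [url] here"

# Unescaped: [url] is a character class matching any single 'u', 'r', or 'l',
# so every occurrence of those letters is counted
str_count(s, "[url]")      # -> 4

# Escaped: matches the literal string "[url]"
str_count(s, "\\[url\\]")  # -> 1
```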

Alternatively, str_count allows you to treat the pattern as a fixed literal string via fixed(), so metacharacters lose their special meaning:

str_count(x, fixed("[url]"))
[1] 2

Data:

x <- "USER_ID USER_POSTS 123 I like butterflies. 234 I have found some information in this webpage [url] 456 Find more information here [url]"
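Applied to a data frame shaped like the one in the question (column names assumed from the edit), you could also get per-post counts and a total; this is a sketch, not code from the original post:

```r
library(stringr)

# Hypothetical data frame mirroring the example in the question
tweets <- data.frame(
  USER_ID = c(123, 234, 456),
  USER_POSTS = c(
    "I like butterflies.",
    "I have found some information in this webpage [url]",
    "Find more information here [url]"
  )
)

str_count(tweets$USER_POSTS, fixed("[url]"))       # per post: 0 1 1
sum(str_count(tweets$USER_POSTS, fixed("[url]")))  # total: 2
```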
Chris Ruehlemann