
I have a dataset where I have converted hyperlinks into [url] - see the example posts at the bottom. I just want to count the frequency of "[url]" using R.

I have tried the following without success:

data <- read.csv("X:/.../tweets.csv")
word_split <- strsplit(data$USER_POSTS, " ")
sum(stringr::str_count(data$USER_POSTS, "[url]"))

I have also tried this:

sum(stringr::str_count(USER_POST, "\\b[url]\\b"))

The result is 0. However, when I check in Excel, the string appears around 7 times. Could anyone guide me on what I am doing wrong? Thank you in advance.

EDIT BELOW with further details:

USER_ID    USER_POSTS 
123        I like butterflies. 
234        I have found some information in this webpage [url] 
456        Find more information here [url] 
ekoam
Louise
  • Could you provide a snippet of your data, best in a reprex format? Otherwise it's a bit difficult to account for the various details. https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – zoowalk Dec 03 '20 at 23:41
  • @zoowalk thank you, I have just added further information – Louise Dec 04 '20 at 16:03
  • @RonakShah thank you, I have just added further information. – Louise Dec 04 '20 at 16:04

1 Answer


If I understand your question correctly, this should be a workable solution:

library(stringr)
str_count(x, "\\[url\\]")
[1] 2

The key here is to take into account that the `[` and `]` characters are metacharacters in regex: unescaped, `[url]` is a character class that matches any single `u`, `r`, or `l`. If you want to match the brackets as literal characters, you need to escape them, which in R is done with a double backslash `\\`.
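A quick standalone sketch (not from the original post) of the difference between the two patterns:

```r
library(stringr)

s <- "see [url] here"

# Unescaped: [url] is a character class matching any single 'u', 'r', or 'l',
# so every occurrence of those letters is counted
str_count(s, "[url]")      # -> 4

# Escaped: matches the literal string "[url]"
str_count(s, "\\[url\\]")  # -> 1
```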

Alternatively, str_count allows you to treat the pattern as a fixed literal string via fixed(), so metacharacters lose their special meaning:

str_count(x, fixed("[url]"))
[1] 2

Data:

x <- "USER_ID USER_POSTS 123 I like butterflies. 234 I have found some information in this webpage [url] 456 Find more information here [url]"
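Applied to a data frame shaped like the one in the question (column names assumed from the edit), you could also get per-post counts and a total; this is a sketch, not code from the original post:

```r
library(stringr)

# Hypothetical data frame mirroring the example in the question
tweets <- data.frame(
  USER_ID = c(123, 234, 456),
  USER_POSTS = c(
    "I like butterflies.",
    "I have found some information in this webpage [url]",
    "Find more information here [url]"
  )
)

str_count(tweets$USER_POSTS, fixed("[url]"))       # per post: 0 1 1
sum(str_count(tweets$USER_POSTS, fixed("[url]")))  # total: 2
```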
Chris Ruehlemann