
Just as a warning this post contains profanity (apologies in advance).

In R, I am working on a web-scraping project with rap song lyrics. I have scraped lyrics of Wu-Tang songs from a site called AZLyrics.

Let's say I have two strings of lyrics like this:

string1 = "Bring da fuckin ruckus...[Chorus]"
string2 = "I scream on ya ass like your dad, bring it on...[Chorus][Verse Four: The Genius/GZA]"

I would like to remove the bracketed tags ([...]) from my strings, so that the two strings become:

"Bring da fuckin ruckus..."
"I scream on ya ass like your dad, bring it on..."

I have been trying to do this with

stringr::str_replace(string1, "\[.*?\]", "")

but I get this error:

Error: '\[' is an unrecognized escape in character string starting ""\["

I am quite unfamiliar with how regex works so I am not sure how to fix this.

Ben Bolker
  • amazing song but this question needs formatting to make it read a bit better :) – treyBake Feb 05 '20 at 14:58
  • not sure why this is getting downvoted (failure to find previous answers?) The question is clearly stated and the OP made an attempt. – Ben Bolker Feb 05 '20 at 17:52

1 Answer


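You need double backslashes to escape properly. In an R string literal a backslash starts an escape sequence, and \[ is not a recognized one; writing \\[ puts a single backslash into the string, which the regex engine then reads as an escaped (literal) [.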
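A quick way to see what the doubled backslash does is to print the pattern and count its characters. This is a minimal base-R sketch; the variable names pat and bad are only illustrative.

```r
# Base R only. In an R string literal "\\" is one backslash character,
# so the pattern below reaches the regex engine as \[.*?\]
pat <- "\\[.*?\\]"
writeLines(pat)   # prints: \[.*?\]
nchar("\\[")      # 2: a backslash followed by an open bracket

# The question's single-backslash literal "\[" cannot even be parsed,
# which is where the "unrecognized escape" error comes from
bad <- inherits(try(parse(text = '"\\["'), silent = TRUE), "try-error")
bad               # TRUE
```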

re <- "(\\[[^]]*\\])+"
stringr::str_replace_all(string2, re, "")
  • ( begin group
  • \\[ literal open-bracket
  • [^]]* zero or more instances of characters that are not ]
  • \\] literal close-bracket
  • + one or more instances of group
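As a sanity check, the pattern above can be applied to both strings from the question at once, since str_replace_all() is vectorized over its input. This sketch assumes the stringr package is installed; the name cleaned is illustrative.

```r
# Assumes stringr is installed; strings copied from the question
string1 <- "Bring da fuckin ruckus...[Chorus]"
string2 <- "I scream on ya ass like your dad, bring it on...[Chorus][Verse Four: The Genius/GZA]"

re <- "(\\[[^]]*\\])+"   # one or more adjacent [...] tags
cleaned <- stringr::str_replace_all(c(string1, string2), re, "")
cleaned
# [1] "Bring da fuckin ruckus..."
# [2] "I scream on ya ass like your dad, bring it on..."
```

Because the group is followed by +, the adjacent tags "[Chorus][Verse Four: ...]" are consumed as a single match.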

A more challenging example:

stringr::str_replace_all('a [][adsfads] b [some]', re, "") ## "a  b "

str_remove_all(string2, re) would be a slight improvement over replacing with a blank string ("").
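Here is a short sketch comparing str_remove_all() with base R's gsub(), assuming stringr is installed. gsub() is not part of the original answer; it is shown with perl = TRUE so the pattern is handled by PCRE. The names x, via_stringr and via_base are illustrative.

```r
# Assumes stringr is installed; gsub() is the base-R alternative
re <- "(\\[[^]]*\\])+"
x <- c("a [][adsfads] b [some]",
       "I scream on ya ass like your dad, bring it on...[Chorus][Verse Four: The Genius/GZA]")

via_stringr <- stringr::str_remove_all(x, re)
via_base    <- gsub(re, "", x, perl = TRUE)   # PCRE engine in base R

identical(via_stringr, via_base)  # TRUE
via_stringr[1]                     # "a  b "
```

Note the double space left behind in "a  b ": removing a tag does not touch the spaces around it.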

Ben Bolker
  • Just an FYI: when you edited the post, you changed the question to include the \\, so right now it looks as if there is nothing wrong. – deef0000dragon1 Feb 05 '20 at 15:08
  • thanks. The problem was moving the code from regular text to a code chunk. Fixed now. – Ben Bolker Feb 05 '20 at 15:17
  • When I do `stringr::str_replace(string2, "\\[.*?\\]", "") [1] "I scream on ya ass like your dad, bring it on...[Verse Four: The Genius/GZA]"` the second `[...]` is being missed. To match that too you need to include a quantifier expression: `stringr::str_replace(string2, "(\\[.*?\\]){1,}", "")` – Chris Ruehlemann Feb 05 '20 at 15:21
  • Thank you so much! Apologies for the post. It was my first, so I was surprised that when I wrote one \ in the textbox, only 1 appeared in the display. – BeginnerByron Feb 05 '20 at 15:21
  • @ChrisRuehlemann see improvements – Ben Bolker Feb 05 '20 at 17:52
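The difference discussed in these comments can be checked directly. This sketch assumes stringr is installed; the test string y and the names first_only, grouped, one_run and all_runs are illustrative. It shows that str_replace() only touches the first match, that the {1,} quantifier lets one match span adjacent tags, and that non-adjacent tags still need str_replace_all().

```r
# Assumes stringr is installed
string2 <- "I scream on ya ass like your dad, bring it on...[Chorus][Verse Four: The Genius/GZA]"
lazy <- "\\[.*?\\]"

# str_replace() removes only the first match, so "[Verse Four: ...]" survives
first_only <- stringr::str_replace(string2, lazy, "")

# wrapping the group in {1,} lets a single match cover both adjacent tags
grouped <- stringr::str_replace(string2, "(\\[.*?\\]){1,}", "")

# with non-adjacent tags, str_replace() still stops after the first run,
# while str_replace_all() removes every run
y <- "a [x] b [y]"
one_run  <- stringr::str_replace(y, "(\\[.*?\\]){1,}", "")      # "a  b [y]"
all_runs <- stringr::str_replace_all(y, "(\\[.*?\\]){1,}", "")  # "a  b "
```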