Extracting a substring from text using R

Question

I have a string as follows:

a<-  "\n    Update Your Profile to Dissolve This Message\nSocial Media Learning and behaviour\n        Uploaded on May 3, 2020 at 10:56 in Research\n            View Forum\n        \n"

I need to extract the string "Social Media Learning and behaviour". For this I used the code below:

gsub("        Uploaded on .* ", "", gsub("\n    Update Your Profile to Dissolve This Message\n", "",a))

This gives me the following output:

"Social Media Learning and behaviour\n\n"

I am not able to match the exact pattern. What pattern would extract "Social Media Learning and behaviour" without the trailing "\n\n"?

You could also match the line before in a capturing group, and match the line after it that contains Uploaded `^(.*)\r?\n Uploaded on` https://regex101.com/r/bF5GKT/1 — The fourth bird, May 31 '20 at 08:51

score 1 · Answer 1 · answered May 31 '20 at 09:05

You could capture the previous line in a group and match the next line that contains Uploaded:

(.*)\r?\n[^\S\r\n]+Uploaded on


a <- "\n    Update Your Profile to Dissolve This Message\nSocial Media Learning and behaviour\n        Uploaded on May 3, 2020 at 10:56 in Research\n            View Forum\n        \n"
# str_match() returns a matrix: column 1 is the full match, column 2 the captured line
stringr::str_match(a, "(.*)\\r?\\n[^\\S\\r\\n]+Uploaded on")[, 2]
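
If stringr is not available, the same pattern can probably be used from base R. The following is a minimal sketch, assuming `regexec()` with `perl = TRUE` (needed here for the `[^\S\r\n]` class); the variable names `m` and `title` are only illustrative.

```r
a <- "\n    Update Your Profile to Dissolve This Message\nSocial Media Learning and behaviour\n        Uploaded on May 3, 2020 at 10:56 in Research\n            View Forum\n        \n"

# regexec() locates the full match and each capture group;
# regmatches() converts those positions into strings.
m <- regmatches(a, regexec("(.*)\\r?\\n[^\\S\\r\\n]+Uploaded on", a, perl = TRUE))

# Element 1 is the full match, element 2 is the first capture group
title <- m[[1]][2]
title
```

With `perl = TRUE`, `.` does not cross a newline, so the group can only hold the single line directly above the "Uploaded on" line.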

score 0 · Accepted Answer · answered May 31 '20 at 08:51

You can extract the part between "Update Your Profile to Dissolve This Message" and "Uploaded on":

sub(".*Update Your Profile to Dissolve This Message\n(.*)\n\\s+Uploaded on.*", "\\1", a)
#[1] "Social Media Learning and behaviour"
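
One detail worth checking: the call above depends on R's default regex engine, where `.` also matches a newline, so the leading and trailing `.*` swallow the rest of the string. The sketch below compares that with `perl = TRUE`, where `.` stops at a newline unless the `(?s)` flag is added. The variable names are illustrative and not part of the original answer.

```r
a <- "\n    Update Your Profile to Dissolve This Message\nSocial Media Learning and behaviour\n        Uploaded on May 3, 2020 at 10:56 in Research\n            View Forum\n        \n"
pattern <- ".*Update Your Profile to Dissolve This Message\n(.*)\n\\s+Uploaded on.*"

# Default engine: "." also matches "\n", so the whole string is replaced
default_result <- sub(pattern, "\\1", a)

# PCRE engine: "." stops at "\n", so text outside the matched lines is left behind
pcre_plain <- sub(pattern, "\\1", a, perl = TRUE)

# PCRE engine with the dot-all flag behaves like the default engine here
pcre_dotall <- sub(paste0("(?s)", pattern), "\\1", a, perl = TRUE)

default_result
pcre_dotall
identical(pcre_plain, default_result)
```

So if `perl = TRUE` is ever added to that call, prefixing the pattern with `(?s)` should keep the result the same.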

You can also use str_match from stringr:

stringr::str_match(a, "Update Your Profile to Dissolve This Message\n(.*)\n\\s+Uploaded on")[, 2]
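
As an alternative to a single regex, the text could also be handled line by line. This is a rough sketch of a different technique from the ones above, assuming the title always sits on the line directly before the one that starts with "Uploaded on"; `lines`, `idx`, and `title` are hypothetical names.

```r
a <- "\n    Update Your Profile to Dissolve This Message\nSocial Media Learning and behaviour\n        Uploaded on May 3, 2020 at 10:56 in Research\n            View Forum\n        \n"

# Split on newlines and strip leading/trailing whitespace from every line
lines <- trimws(strsplit(a, "\n", fixed = TRUE)[[1]])

# Find the first line starting with "Uploaded on" and take the line before it
idx <- grep("^Uploaded on", lines)[1]
title <- lines[idx - 1]
title
```

This avoids escaping issues in the pattern, at the cost of depending on the line order of the input.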
