I want to extract the text after the first < br/ > tag and then remove any < br/ > tags from the remaining text.

x <- data.frame(text = c("Hi John, hope you are doing well.< br/ >Let me know, when we can meet? < br/ > I have lot to talk about"))

Expected output:

"Let me know, when we can meet? I have lot to talk about"
Ronak Shah
Sand

4 Answers

Note that, in general, regex is not an ideal tool for parsing HTML content. Since your content is not nested, it should be reliable here, and we can do this with two calls to sub:

text <- "Hi John, hope you are doing well.< br/ >Let me know, when we can meet? < br/ > I have lot to talk about"
sub("< br/ >\\s*", "", sub(".*?< br/ >(.*)", "\\1", text))

[1] "Let me know, when we can meet? I have lot to talk about"

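The inner call to sub removes the leading portion of the text up to, and including, the first < br/ > tag. The outer call to sub then strips the next < br/ > tag along with any whitespace that follows it. Because sub replaces only the first match, use gsub for the outer call if more than one tag can remain.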
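
The two steps described above can be generalized to input with several remaining tags. This is a minimal sketch, assuming the tag always appears literally as "< br/ >" and contains no regex metacharacters; the helper name after_first_br is hypothetical and not part of the original answer:

```r
# Hypothetical helper: keep the text after the first "< br/ >" tag,
# then drop every remaining tag together with the whitespace around it.
after_first_br <- function(text, tag = "< br/ >") {
  # Lazy match up to and including the first tag; keep the rest.
  rest <- sub(paste0("^.*?", tag, "(.*)$"), "\\1", text, perl = TRUE)
  # Replace each remaining tag (and surrounding whitespace) with one space.
  trimws(gsub(paste0("\\s*", tag, "\\s*"), " ", rest, perl = TRUE))
}

text <- "Hi John, hope you are doing well.< br/ >Let me know, when we can meet? < br/ > I have lot to talk about"
after_first_br(text)
# [1] "Let me know, when we can meet? I have lot to talk about"
```

If the input contains no tag at all, this sketch returns the text unchanged, which may or may not be what you want.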

Tim Biegeleisen

A non-regex answer would be to split on "< br/ >" and collect all the terms except the first one and paste them together.

sapply(strsplit(as.character(x$text), "< br/ >"),
          function(x) paste0(x[-1], collapse = ""))
#[1] "Let me know, when we can meet?  I have lot to talk about"
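
The split-and-paste result above keeps the stray whitespace around the removed tag, hence the double space. As a rough sketch of one way to tidy that up, assuming the column is character rather than factor, each piece can be trimmed before joining; fixed = TRUE makes strsplit treat the tag as a literal string instead of a regex:

```r
x <- data.frame(
  text = "Hi John, hope you are doing well.< br/ >Let me know, when we can meet? < br/ > I have lot to talk about",
  stringsAsFactors = FALSE
)

# Split on the literal tag (fixed = TRUE disables regex interpretation).
parts <- strsplit(x$text, "< br/ >", fixed = TRUE)

# Drop the first piece, trim each remaining piece, and join with single spaces.
result <- vapply(
  parts,
  function(p) paste(trimws(p[-1]), collapse = " "),
  character(1)
)
result
# [1] "Let me know, when we can meet? I have lot to talk about"
```

Rows without any tag yield an empty string here, since nothing follows a first tag.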
Ronak Shah

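Another, less efficient approach uses gsub. Note that it relies on the wanted text starting with "Let", so it is specific to this input:

# Inner gsub: drop everything before "Let"; outer gsub: drop the tags
res1 <- gsub("< br/ >|\\s{1,}(?<=\\n)", "", gsub(".*(?=Let)", "", x$text, perl = TRUE), perl = TRUE)
# Remove the double space left behind where the tag was
gsub("  ", "", res1, perl = TRUE)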

This also removes the space before "I":

[1] "Let me know, when we can meet?I have lot to talk about"
NelsonGon

We can use str_extract_all to extract every run of text that follows the pattern (< br/ >) and contains no < character:

library(stringr)
paste(str_extract_all(x$text, "(?<=< br/ >)[^<]+")[[1]], collapse="")
#[1] "Let me know, when we can meet?  I have lot to talk about"

Another option is to replace each < br/ > with a delimiter, read the result with read.csv/read.table, and paste the columns after the first:

do.call(paste0, read.csv(text = gsub("< br/ >", ";", x$text, 
  fixed = TRUE), header = FALSE, sep=";", stringsAsFactors = FALSE)[-1])
#[1] "Let me know, when we can meet?  I have lot to talk about"
akrun