I have some XML and I want to replace something in it. Wherever "\\d{4}s</w:t>" matches, I want to replace the LAST occurrence of "<w:rPr>" before it with "<w:rPr><w:keepNext/>", and I don't know how. What I tried is str_replace_all(Text, "(<w:rPr>.*?)(\\d{4}s</w:t>)", "\\1<w:keepNext/>\\2"), but this does not pick up the last occurrence of "<w:rPr>".

I asked a similar question before, but it was closed because it was too similar to "Find shortest matches between two strings". That question did not help me.

EDIT: Here is a similar example: str_replace_all("hallohallohallo text bye hallohallo text2 bye", "(hallo)(.*?bye)", "\\1,\\2") returns "hallo,hallohallo text bye hallo,hallo text2 bye", but I want "hallohallohallo, text bye hallohallo, text2 bye".

TobiSonne

2 Answers

I'm not sure I completely follow the XML matching (I'm not that well versed in parsing/selecting XML), but for the similar example, use the + quantifier to capture a run of one or more repetitions as a single group.

```
library(stringr)

myText <- "hallohallohallo text bye hallohallo text2 bye"

# ((hallo)+) captures the whole run of adjacent "hallo"s as group 1
str_replace_all(myText, "((hallo)+)(.*?bye)", "\\1,\\3")
#> [1] "hallohallohallo, text bye hallohallo, text2 bye"
```
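The + trick relies on the repeats being directly adjacent. If other text can sit between them, one possible alternative is a negative lookahead that forbids a new "hallo" from starting inside the match. A regex match always begins at the leftmost position that works, so the lazy .*? only shortens the right end; the lookahead is what moves the start to the last "hallo". This is a minimal sketch on the toy string only, and the names text, pattern and result are just for illustration.

```r
library(stringr)

text <- "hallohallohallo text bye hallohallo text2 bye"

# (?:(?!hallo).)*? consumes one character at a time, but only while
# no new "hallo" starts at the current position
pattern <- "(hallo)((?:(?!hallo).)*?bye)"

result <- str_replace_all(text, pattern, "\\1,\\2")
result
#> [1] "hallohallohallo, text bye hallohallo, text2 bye"
```

The same pattern should also cope with non-adjacent repeats such as "hallo x hallo text bye", where the + version would stop at the first "hallo".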


Marcus
  • Thank you for your help! As I see now, this similar example is not what I was really looking for. I have another solution that I'm not 100% satisfied with but it works fine. – TobiSonne May 16 '20 at 07:39

A workaround that I've found:

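```
library(stringr)
library(purrr)  # for modify_if()

# Split after every "<w:pPr>", prepend <w:keepNext/> to each piece that contains a match, then rejoin
str_split(Text, "(?<=<w:pPr>)") %>%
  unlist() %>%
  modify_if(~ str_detect(.x, "\\d{4}s</w:t>"), ~ str_c("<w:keepNext/>", .x)) %>%
  str_c(collapse = "")
```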
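For the "<w:rPr>" requirement as stated in the question, a regex-only variant might look like the sketch below. It is not the workaround above: it uses a negative lookahead so that the match cannot step over another "<w:rPr>" on its way to the decade text. The sample string xml is made up for illustration, and the pattern assumes the XML sits on one line, since . does not match line breaks by default (prefixing the pattern with (?s) should lift that restriction).

```r
library(stringr)

# Made-up sample: two runs, only the second one contains a decade like "1990s"
xml <- paste0(
  "<w:r><w:rPr><w:b/></w:rPr><w:t>Intro</w:t></w:r>",
  "<w:r><w:rPr><w:i/></w:rPr><w:t>1990s</w:t></w:r>"
)

# Group 1: a "<w:rPr>"; group 2: everything up to the decade text,
# with a lookahead that refuses to pass another "<w:rPr>"
pattern <- "(<w:rPr>)((?:(?!<w:rPr>).)*?\\d{4}s</w:t>)"

fixed <- str_replace_all(xml, pattern, "\\1<w:keepNext/>\\2")
fixed
```

On this sample only the second "<w:rPr>" gets the extra element; the first run is left untouched because a new "<w:rPr>" appears before any decade text.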
TobiSonne