
I have a string like a, shown below.

I would like to delete everything before the second-to-last occurrence of the pattern === test, including that occurrence's leading ===.

a <- "=== test : {abc}
      === test : {abc}
      === test : {abc}
      === test : {aUs*} 
      === dce
      === test : {12abc}
      === abc
      === test : {abc}
      === test : {dfg}"

result <- "test : {abc}
           === test : {dfg}"

I tried:

gsub(".*=== test", "", a)

How can I target the second-to-last occurrence?

Thanks

thequietus
  • Do you have newlines here? Please [use `dput` to provide a valid MCVE](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Wiktor Stribiżew Aug 22 '18 at 18:16

2 Answers


The code below should work. I split the data into a vector of lines on the newline character \\n (the extra backslash escapes the backslash inside the R string) and then use grep to find all occurrences of the pattern ^=== test; the leading ^ means the line must begin with the pattern.

DATA

a <- "=== test : {abc}
      === test : {abc}
      === test : {abc}
      === test : {aUs*} 
      === dce
      === test : {12abc}
      === abc
      === test : {abc}
      === test : {dfg}"

CODE

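# convert to a vector of lines for ease; trimws() strips the indentation
# and trailing spaces so that '^=== test' can match every line
b <- trimws(unlist(strsplit(a, '\\n')))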

# get indices for each occurrence of the pattern  
indices <- grep('^=== test', b)

# we only need the last two occurrences 
n <- length(indices)

res <- b[indices[(n-1):n]]

# res is a vector with two entries; to get it back to a single string
# like the original data, we use paste(.., collapse = '\n')
# (a single backslash here, since collapse is a literal string, not a regex)
result <- paste(res, collapse = '\n')

OUTPUT

> result
[1] "=== test : {abc}\n=== test : {dfg}"
Gautam
  • I have a dataframe for which a columns contains a string like that in each row, so I would need to iterate it for each row ... I will try to see how it performs @Gautam – thequietus Aug 22 '18 at 17:13
  • You can do all of this in a single call as well. I expanded it here to show what the code does. You can wrap the single call in function and use `lapply` to run it on all rows of the `data.frame` object. – Gautam Aug 22 '18 at 17:15
  • @thequietus Maybe you should then provide the dataframe in your question, and your expected output. – acylam Aug 22 '18 at 17:20
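The comment above about wrapping the steps in a function and running it on every row could look roughly like the sketch below. The helper name last_two_tests, the data frame df and its column txt are hypothetical, made up for illustration. It uses vapply instead of lapply so the result is a plain character vector.

```r
# Hypothetical helper: keep only the last two lines that start with "=== test".
last_two_tests <- function(x) {
  # split into lines and strip indentation / trailing spaces
  lines <- trimws(unlist(strsplit(x, "\n", fixed = TRUE)))
  # keep the lines that begin with the pattern
  hits <- grep("^=== test", lines, value = TRUE)
  # tail() copes with fewer than two matches
  paste(tail(hits, 2), collapse = "\n")
}

# Hypothetical data frame with one such string per row
df <- data.frame(
  txt = c("=== test : {a}\n=== dce\n=== test : {b}\n=== test : {c}",
          "=== test : {x}\n=== test : {y}"),
  stringsAsFactors = FALSE
)

# apply the helper to every row of the column
df$last_two <- vapply(df$txt, last_two_tests, character(1), USE.NAMES = FALSE)
```

A row with no matching line yields an empty string here; whether that or NA is preferable depends on the data.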

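We can use strsplit to split on line breaks, take the last two lines with tail, paste them back together, and use sub to remove the leading ===. trimws strips the indentation from each line so that ^=== can match. Note that this takes the last two lines, so it relies on both of them being === test lines, as in the example:

sub("^=== ", "", paste(trimws(tail(strsplit(a, split = "\\n")[[1]], 2)), collapse = "\n"))
# [1] "test : {abc}\n=== test : {dfg}"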
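If the last two lines are not guaranteed to be === test lines, one possible alternative is to locate every occurrence of the pattern with gregexpr and cut the string at the second-to-last one. This is only a sketch: the helper name from_nth_last and its arguments are hypothetical, and the sample string is a shortened, unindented version of the data in the question.

```r
# Hypothetical helper: keep the text from the n-th-to-last "=== test" onward,
# dropping the "=== " that starts that occurrence.
from_nth_last <- function(x, n = 2) {
  # start positions of every literal "=== test" in the string
  starts <- gregexpr("=== test", x, fixed = TRUE)[[1]]
  # gregexpr returns -1 when there is no match at all
  if (starts[1] == -1 || length(starts) < n) return(NA_character_)
  # position just after the "=== " of the n-th-to-last occurrence
  start <- starts[length(starts) - n + 1] + nchar("=== ")
  substring(x, start)
}

a <- "=== test : {abc}\n=== dce\n=== test : {12abc}\n=== abc\n=== test : {abc}\n=== test : {dfg}"
from_nth_last(a)
# [1] "test : {abc}\n=== test : {dfg}"
```

With the indented string from the question, the whitespace before the final === would be kept, as it is in the expected result shown there.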
acylam