Remove everything before a certain occurrence identified by position in string

Question

I have a string like a (defined below).

I would like to delete everything before the 2nd-to-last occurrence of the pattern === test, including the === itself.

a <- "=== test : {abc}
      === test : {abc}
      === test : {abc}
      === test : {aUs*} 
      === dce
      === test : {12abc}
      === abc
      === test : {abc}
      === test : {dfg}"

result <- "test : {abc}
           === test : {dfg}"

I tried:

gsub(".*=== test", "", a)

How can I target the 2nd-to-last occurrence?

Thanks

Do you have newlines here? Please [use `dput` to provide a valid MCVE](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). — Wiktor Stribiżew, Aug 22 '18 at 18:16

score 0 · Answer 1 · answered Aug 22 '18 at 17:10

The code below should work. I split the string into a vector of lines on the newline \\n (the additional backslash "escapes" the special character), then used grep to find all occurrences of the pattern ^=== test; the leading ^ means the line must begin with this pattern.

DATA

a <- "=== test : {abc}
      === test : {abc}
      === test : {abc}
      === test : {aUs*} 
      === dce
      === test : {12abc}
      === abc
      === test : {abc}
      === test : {dfg}"

CODE

# convert to a vector of lines; trimws() drops the indentation and
# trailing spaces that the lines carry in the data above
b <- trimws(unlist(strsplit(a, '\\n')))

# get indices for each occurrence of the pattern
indices <- grep('^=== test', b)

# we only need the last two occurrences
n <- length(indices)

res <- b[indices[(n-1):n]]

# res is a vector with two entries; to get it back to a single string
# like the original data, we use paste(..., collapse = '\n')
# (a single backslash here, so that a real newline is inserted)
result <- paste(res, collapse = '\n')

OUTPUT

> result
[1] "=== test : {abc}\n=== test : {dfg}"

I have a dataframe in which one column contains a string like that in each row, so I would need to iterate over the rows ... I will try it and see how it performs @Gautam — thequietus, Aug 22 '18 at 17:13
You can do all of this in a single call as well. I expanded it here to show what the code does. You can wrap the single call in a function and use `lapply` to run it on all rows of the `data.frame` object. — Gautam, Aug 22 '18 at 17:15
@thequietus Maybe you should then provide the dataframe in your question, and your expected output. — acylam, Aug 22 '18 at 17:20
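
The function-plus-apply idea from the comments above could be sketched roughly as follows. The helper name last_matches, the data frame df and its column txt are all made up for illustration, since the asker's real data was never posted. vapply is used here instead of lapply only so that the result comes back as a character vector rather than a list.

```r
# Hypothetical helper: keep the last `n` lines of `x` that match `pattern`.
last_matches <- function(x, pattern = "^=== test", n = 2) {
  # split on real newlines and strip indentation / trailing spaces
  lines <- trimws(unlist(strsplit(x, "\n", fixed = TRUE)))
  # keep only the lines that match the pattern
  hits <- grep(pattern, lines, value = TRUE)
  # join the last n matching lines back into one string
  paste(tail(hits, n), collapse = "\n")
}

# Hypothetical data frame with one multi-line string per row
df <- data.frame(
  txt = c("=== test : {abc}\n=== dce\n=== test : {12abc}\n=== test : {dfg}",
          "=== test : {x}\n=== abc\n=== test : {y}"),
  stringsAsFactors = FALSE
)

# apply the helper to every row of the column
df$last_two <- vapply(df$txt, last_matches, character(1), USE.NAMES = FALSE)
df$last_two
# [1] "=== test : {12abc}\n=== test : {dfg}" "=== test : {x}\n=== test : {y}"
```

A row with no matching line yields an empty string under this sketch; whether that is the desired behaviour depends on the real data.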

acylam · Accepted Answer · 2018-08-22T17:18:38.237

0

We can use strsplit to split on line breaks and pick the last two elements, paste them back together, and use sub to remove the leading ===:

# trimws() removes the indentation so that "^=== " can match at the start
sub("^=== ", "", paste(tail(trimws(strsplit(a, split = "\\n")[[1]]), 2), collapse = "\n"))
# [1] "test : {abc}\n=== test : {dfg}"
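
Note that taking the last two lines gives the requested result only because, in this example, the last two lines both contain === test. If other lines can follow or sit between the matches, one possible sketch is to locate the occurrences with gregexpr and cut the string at the 2nd-to-last one. The helper name drop_before_nth_last and its arguments are hypothetical, and the sketch follows the literal wording of the question (keep everything from the 2nd-to-last occurrence onward, minus the leading "=== ").

```r
# Hypothetical helper (not from the answers above): drop everything before
# the n-th-to-last occurrence of `pattern`, and also drop the `strip` prefix.
drop_before_nth_last <- function(x, pattern = "=== test", n = 2, strip = "=== ") {
  # start positions of every literal occurrence of `pattern` in x
  starts <- gregexpr(pattern, x, fixed = TRUE)[[1]]
  # no match, or fewer than n matches: return the input unchanged
  if (starts[1] == -1 || length(starts) < n) return(x)
  # position of the n-th-to-last occurrence
  start <- starts[length(starts) - n + 1]
  # keep the rest of the string, skipping the `strip` prefix
  substring(x, start + nchar(strip))
}

# Made-up input in which a non-matching line sits between the last two matches
s <- "=== test : {abc}\n=== dce\n=== test : {12abc}\n=== abc\n=== test : {dfg}"
drop_before_nth_last(s)
# [1] "test : {12abc}\n=== abc\n=== test : {dfg}"
```

Unlike the line-based approach, this keeps the intermediate === abc line, so which of the two is appropriate depends on what the output is supposed to contain.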

edited Aug 22 '18 at 17:18

answered Aug 22 '18 at 17:12

acylam
