I have a string:

a="<gml:posList srsDimension=\"2\" count=\"5\">7 -5.067 -3 56.7 -3.3 58.3 -5.65 57 -8.33</gml:posList>"

and want to use gsub to remove everything between the < and >, to no avail so far. I want only the numbers to remain (i.e. 7 -5 -3 56 -3 58...), so that I can take every even/odd element to process.

I tried the approach from "Remove all text between two brackets", to no avail:

    > gsub('<^|*>','',a[[1]],perl=TRUE)
    Error in gsub("<^|*>", "", a[[1]], perl = TRUE) :
      invalid regular expression '<^|*>'
    In addition: Warning message:
    In gsub("<^|*>", "", a[[1]], perl = TRUE) : PCRE pattern compilation error
        'nothing to repeat'
        at '*>'

and

    gsub('<gml.+>\\d','',a[[1]])

which also removes the first digit.

I am sure I am missing something obvious, as '<' is not a special character.

Here are some other tries (and fails):

    > gsub('<.+>','',a[[1]])
    [1] ""
    > gsub('<.+>.+<.+>','',a[[1]])
    [1] ""
    > gsub('<gml.+>','',a[[1]])
    [1] ""
frank

2 Answers

You can use

    > gsub("<[^>]+>", "", a)
    [1] "7 -5.067 -3 56.7 -3.3 58.3 -5.65 57 -8.33"

Here "<" and ">" are literals, "[^>]" matches any single character other than ">", and "+" allows one or more such characters, so a match can never run past the first closing ">". gsub applies the replacement to every match of the pattern, substituting the empty string "" each time.
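
The attempts in the question returned "" because ".+" is greedy: it runs from the first "<" to the last ">" in the string, so the whole input is one match. A small sketch comparing the three behaviours on the same string (the variable names greedy, negated and lazy are only illustrative):

```r
a <- "<gml:posList srsDimension=\"2\" count=\"5\">7 -5.067 -3 56.7 -3.3 58.3 -5.65 57 -8.33</gml:posList>"

# Greedy: ".+" extends to the LAST ">", so the entire string is consumed
greedy <- gsub("<.+>", "", a)

# Negated class: "[^>]+" cannot cross a ">", so each tag is matched separately
negated <- gsub("<[^>]+>", "", a)

# Lazy quantifier: ".+?" stops at the first ">" that completes a match
lazy <- gsub("<.+?>", "", a, perl = TRUE)

print(greedy)   # ""
print(negated)  # "7 -5.067 -3 56.7 -3.3 58.3 -5.65 57 -8.33"
print(lazy)     # same as negated
```

The negated class is usually the safer choice of the two working variants, since it does not depend on backtracking to find the end of the tag.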

lmo
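library(qdapRegex)
a <- "<gml:posList srsDimension=\"2\" count=\"5\">7 -5.067 -3 56.7 -3.3 58.3 -5.65 57 -8.33</gml:posList>"
# extract = TRUE would return the text inside the tags; the default
# (extract = FALSE) removes each <...> span, leaving only the numbers
rm_between(a, "<", ">")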
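
Once the tags are gone, the remaining string still has to be split into numbers before every odd or even element can be taken, which is the goal stated in the question. A minimal base R sketch, assuming the cleaned string is whitespace-separated (cleaned, nums, odd and even are illustrative names, not from the original post):

```r
# The tag-free string, as produced by either approach above
cleaned <- "7 -5.067 -3 56.7 -3.3 58.3 -5.65 57 -8.33"

# Split on runs of whitespace and convert to numeric
nums <- as.numeric(strsplit(trimws(cleaned), "\\s+")[[1]])

# A logical index is recycled along the vector:
# c(TRUE, FALSE) keeps positions 1, 3, 5, ...; c(FALSE, TRUE) keeps 2, 4, 6, ...
odd  <- nums[c(TRUE, FALSE)]
even <- nums[c(FALSE, TRUE)]

print(odd)
print(even)
```

Note that the sample string holds nine values, so odd ends up with five elements and even with four.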
MLEN