-3

What I want is for all the spaces inside the <abc> tag to be removed, while keeping the spaces inside the <efg> tag.

<abc>this is between abc</abc><efg>this is between efg</efg>
<efg>this is between efg</efg><abc>this is between abc</abc>

The output I want:

<abc>thisisbetweenabc</abc><efg>this is between efg</efg>
<efg>this is between efg</efg><abc>thisisbetweenabc</abc>

I tried string = string.replaceAll("<abc> </abc>", ""); but it is not working for me.

Phantômaxx
  • 2
    [H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) Just use an XML parser – ctwheels Dec 01 '17 at 16:56
  • Anyway, in regex you can use [`(?:^(<abc>)|\G(?!^))(\S+)[ \t]*` replace with `$1$2`](https://regex101.com/r/hyxQZG/1) – ctwheels Dec 01 '17 at 16:59
  • Post a simple but real use case. We want to avoid a situation where we provide a solution to the current problem, but it then turns out that the tags can nest inside each other, like `<abc>foo bar<efg>a b c</efg> bam</abc>`, and instead of `<abc>foobarabcbam</abc>` you want `<abc>foobar<efg>a b c</efg>bam</abc>`, despite the fact that `a b c` is also inside `<abc>`. – Pshemo Dec 01 '17 at 17:11

2 Answers

0

Brief

I urge you to use an XML parser!!! Anyway, if it's a limited, known set of HTML, you can use the following regex (as per my original comment).

Note: This solution only works on a limited, known set of HTML. If your input differs from what you posted in your question, this solution will likely not work. See Pshemo's comment below your question.

Note 2: The OP changed the format of the input, thus my original answer will no longer work. See original input below. (Exactly why I put a limited, known set of HTML). In the Code section I've added a second regex that works on the OP's newly added input.


Code

See regex in use here

(?:^(<abc>)|\G(?!^))(\S+)[ \t]*

Replace with $1$2

With the new input format, the following regex can be used (as seen in use here):

(?:^(<abc>)|\G(?!^))([^\s<]+)[ \t]*
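
As a rough sketch, this is how the second pattern could be applied from Java. The class and method names (`AbcSpaceRegex`, `stripAbcSpaces`) are made up for illustration. `Pattern.MULTILINE` is needed so that `^` matches at the start of every line, and in Java a group that did not take part in a match (here `$1` on the `\G` continuation matches) contributes an empty string to the replacement. Because of `^(<abc>)`, only an `<abc>` at the very start of a line is handled; an `<abc>` that follows another tag on the same line is left untouched.

```java
import java.util.regex.Pattern;

class AbcSpaceRegex {
    // Second pattern from the answer; MULTILINE lets ^ match at each line start.
    private static final Pattern ABC_WORDS =
            Pattern.compile("(?:^(<abc>)|\\G(?!^))([^\\s<]+)[ \\t]*", Pattern.MULTILINE);

    // Hypothetical helper: removes spaces and tabs between the words that
    // follow an <abc> tag located at the start of a line.
    public static String stripAbcSpaces(String input) {
        // $1 is "<abc>" on the first match of a line and empty on the
        // \G continuation matches; $2 is the current word.
        return ABC_WORDS.matcher(input).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        String input = "<abc>this is between abc</abc>\n"
                + "<efg>this is between efg</efg>";
        System.out.println(stripAbcSpaces(input));
    }
}
```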

Results

Input

<abc>this is between abc</abc>
<efg>this is between efg</efg>
<abc>this is between abc</abc>
<efg>this is between efg</efg>

Output

<abc>thisisbetweenabc</abc>
<efg>this is between efg</efg>
<abc>thisisbetweenabc</abc>
<efg>this is between efg</efg>

Explanation

  • (?:^(<abc>)|\G(?!^)) Match either of the following
    • ^(<abc>) Match the following
      • ^ Assert position at the start of the line
      • (<abc>) Capture <abc> literally into capture group 1
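    • \G(?!^) Assert position at the end of the previous match. \G also matches at the very start of the input, so the negative lookahead (?!^) rules that position out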
  • (\S+) Capture any non-whitespace character one or more times into capture group 2
  • [ \t]* Match space or tab characters any number of times
ctwheels
0

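Simple, just do:

String xml = "<abc>this is between abc</abc><efg>this is between efg</efg>";
int start = xml.indexOf("<abc>") + "<abc>".length();  // first character after the opening tag
int end = xml.indexOf("</abc>", start);               // index where the closing tag begins

// Java's substring takes (beginIndex, endIndex), not a length
String abcOnly = xml.substring(start, end).replace(" ", "");
xml = xml.substring(0, start) + abcOnly + xml.substring(end);  // splice the result back in

This is only a sketch, but you can easily adapt it. It handles the first <abc>...</abc> pair only and assumes both tags are present (indexOf returns -1 otherwise). I am not in front of your code to test it, but you should be able to get what you need from this.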
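
For input with several `<abc>` blocks, the same index-based idea can be put in a loop. The following is a minimal sketch, not a tested drop-in: the class and method names are hypothetical, and it assumes the tags are written exactly as `<abc>` and `</abc>` (no attributes) and are not nested inside each other. Anything between the two tags loses its spaces, including text inside a nested `<efg>`.

```java
class AbcSpaceStripper {
    private static final String OPEN = "<abc>";
    private static final String CLOSE = "</abc>";

    // Hypothetical helper: removes spaces inside every <abc>...</abc> pair.
    public static String stripSpacesInsideAbc(String xml) {
        StringBuilder out = new StringBuilder(xml.length());
        int pos = 0; // everything before pos has already been copied to out
        while (true) {
            int open = xml.indexOf(OPEN, pos);
            int close = open < 0 ? -1 : xml.indexOf(CLOSE, open + OPEN.length());
            if (open < 0 || close < 0) {
                // No further complete pair: copy the remainder unchanged.
                out.append(xml, pos, xml.length());
                return out.toString();
            }
            int bodyStart = open + OPEN.length();
            out.append(xml, pos, bodyStart);                              // up to and including <abc>
            out.append(xml.substring(bodyStart, close).replace(" ", "")); // body without spaces
            pos = close; // the closing tag is copied on the next pass
        }
    }

    public static void main(String[] args) {
        System.out.println(stripSpacesInsideAbc(
                "<efg>this is between efg</efg><abc>this is between abc</abc>"));
    }
}
```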

Disclaimer: Using an XML parser is a far better way to handle this than manipulating strings, but I'll assume you have your reasons, so I'll answer the question you asked instead of telling you to go get an XML parser, lol. Good luck.
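
For comparison, here is a rough sketch of the parser route using the DOM API that ships with the JDK. It rests on several assumptions: the input is well-formed apart from lacking a single root element, so it is wrapped in a made-up `<root>` element before parsing; the class and method names are hypothetical; and only text nodes that are direct children of `<abc>` are changed, so text inside a nested `<efg>` keeps its spaces.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AbcDomStripper {
    // Hypothetical helper: removes spaces from text directly inside <abc> elements.
    public static String stripSpacesInsideAbc(String fragment) {
        try {
            // Wrap the fragment so the parser sees exactly one root element.
            String wrapped = "<root>" + fragment + "</root>";
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wrapped)));

            NodeList abcElements = doc.getElementsByTagName("abc");
            for (int i = 0; i < abcElements.getLength(); i++) {
                NodeList children = abcElements.item(i).getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    if (child.getNodeType() == Node.TEXT_NODE) {
                        child.setNodeValue(child.getNodeValue().replace(" ", ""));
                    }
                }
            }

            // Serialize back to a string without the XML declaration.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(doc.getDocumentElement()), new StreamResult(writer));

            // Cut the synthetic wrapper off again.
            String xml = writer.toString();
            return xml.substring(xml.indexOf("<root>") + "<root>".length(),
                    xml.lastIndexOf("</root>"));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process XML fragment", e);
        }
    }
}
```

Unlike the string-based versions, this copes with the nesting case raised in the comments, at the cost of requiring well-formed input.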

Sam