Just as a disclaimer: I only want to do this to make my life easier when reading logs. Sometimes they contain more than 100 MB of text.

I want to match an XML group that contains some specific data.

Suppose I have XML like the below (and all the elements are on the same line):

<car><id>1</id><acquiredDate>23-09-2016</acquiredDate><model>BMW</model></car>
<car><id>2</id><acquiredDate>23-09-2016</acquiredDate><model>BMW</model></car>
<car><id>3</id><acquiredDate>24-09-2016</acquiredDate><model>BMW</model></car>
<car><id>4</id><acquiredDate>23-09-2016</acquiredDate><model>BMW</model></car>

I want to match all cars that were acquired on 23-09-2016 (3 matches in this case).

What I have so far is `<car>.*?<acquiredDate>23-09-2016<\/acquiredDate>.*?<\/car>`, but it matches the third and fourth cars together as a single match. Something like:

<car><id>1</id><acquiredDate>23-09-2016</acquiredDate><model>BMW</model></car>

<car><id>2</id><acquiredDate>23-09-2016</acquiredDate><model>BMW</model></car>

<car><id>3</id><acquiredDate>24-09-2016</acquiredDate><model>BMW</model></car><car><id>4</id><acquiredDate>23-09-2016</acquiredDate><model>BMW</model></car>

I tried something like `<car>(?!.*<car>.*).*?<acquiredDate>23-09-2016<\/acquiredDate>.*?<\/car>`, but it matches only the last car.

How can I achieve that?

Felipe S.
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – SierraOscar Sep 23 '16 at 22:14
  • please read my disclaimer. as the second answer says `it's sometimes appropriate to parse a limited, known set of HTML`. that's what I want – Felipe S. Sep 23 '16 at 22:16
  • `<\/acquiredDate>.*?<\/car>` Why do you use `.*?` between the tags when your example shows nothing in-between them? – Jesse Good Sep 23 '16 at 22:18
  • sorry about that @JesseGood.. the real case has data between.. let me update the question – Felipe S. Sep 23 '16 at 22:19
  • @FelipeS. I read your disclaimer - it's still easier to use a DOM method like `getElementsByTagName('acquiredDate')` and read the `innerHTML` property. – SierraOscar Sep 23 '16 at 22:20
  • @MacroMan I'm using just to search data.. how do I use `getElementsByTagName` in `less`, `grep` or something similar ? :/ – Felipe S. Sep 23 '16 at 22:22
  • But can you use lookarounds at least? I see you are using it, just to confirm. – Wiktor Stribiżew Sep 23 '16 at 22:22
  • @WiktorStribiżew lookarounds would be fine! I just couldn't make it work properly.. – Felipe S. Sep 23 '16 at 22:24
  • It is a bit strange, since [your regex works](https://regex101.com/r/jK0xM0/1) if the `.` does not match a newline and the strings are on separate lines. If I use a dotall modifier, I get [the same behavior](https://regex101.com/r/jK0xM0/2) you describe. – Wiktor Stribiżew Sep 23 '16 at 22:26
  • Ok, I'd suggest [`(?:(?!<(?:car|acquiredDate)>).)*23-09-2016<\/acquiredDate>.*?<\/car>`](https://regex101.com/r/jK0xM0/3) – Wiktor Stribiżew Sep 23 '16 at 22:28
  • @WiktorStribiżew would you know how to fix my regex using dotall modifier? – Felipe S. Sep 23 '16 at 22:28
  • Yeah, I posted my suggestion above. But MacroMan's pattern might work as well with your data. – Wiktor Stribiżew Sep 23 '16 at 22:29
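
For reference, the idea discussed in the comments (a "tempered" dot that refuses to cross a tag boundary) can be sketched in Python. This is only a minimal sketch, not the commenter's exact regex: the `TEMPERED` pattern is an adaptation that keeps the leading `<car>` and only forbids crossing a `</car>`, and the names `cars`, `log_line`, `NAIVE` and `TEMPERED` are made up for illustration. It assumes all four sample cars sit on one line, as described in the question.

```python
import re

# Sample data from the question, joined into a single line (assumption).
cars = [
    "<car><id>1</id><acquiredDate>23-09-2016</acquiredDate><model>BMW</model></car>",
    "<car><id>2</id><acquiredDate>23-09-2016</acquiredDate><model>BMW</model></car>",
    "<car><id>3</id><acquiredDate>24-09-2016</acquiredDate><model>BMW</model></car>",
    "<car><id>4</id><acquiredDate>23-09-2016</acquiredDate><model>BMW</model></car>",
]
log_line = "".join(cars)

# The pattern from the question: the lazy .*? may still run past </car>
# into the next element while looking for the wanted date.
NAIVE = re.compile(r"<car>.*?<acquiredDate>23-09-2016</acquiredDate>.*?</car>")

# Tempered variant: each character consumed before the date must not be
# the start of a closing </car>, so a match cannot span two cars.
TEMPERED = re.compile(
    r"<car>(?:(?!</car>).)*?<acquiredDate>23-09-2016</acquiredDate>.*?</car>",
    re.DOTALL,  # lets . match newlines in case elements are split across lines
)

naive_matches = NAIVE.findall(log_line)
tempered_matches = TEMPERED.findall(log_line)

for match in tempered_matches:
    print(match)
```

With the sample data, the naive pattern should return three matches with cars 3 and 4 glued together in the last one, while the tempered pattern should return cars 1, 2 and 4 separately.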

1 Answer

If you really want to go down the regex-matching-html route, then assuming you want to match the whole line, something like this would work:

/(?:^\<car\>[<\w>\/]+acquiredDate\>)(23\-09\-2016)(?:.+$)/gm
                                     ^^  ^^  ^^^^ 
                                 (change as required)
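
A rough Python equivalent of the pattern above might look like the following. It is a sketch that assumes each `<car>` element sits on its own line, since the `^` and `$` anchors with the multiline flag depend on that. The unnecessary escapes are dropped, and `log_text`, `LINE_PATTERN` and `matching_lines` are hypothetical names used only for illustration.

```python
import re

# Sample data from the question, one element per line (assumption).
log_text = "\n".join([
    "<car><id>1</id><acquiredDate>23-09-2016</acquiredDate><model>BMW</model></car>",
    "<car><id>2</id><acquiredDate>23-09-2016</acquiredDate><model>BMW</model></car>",
    "<car><id>3</id><acquiredDate>24-09-2016</acquiredDate><model>BMW</model></car>",
    "<car><id>4</id><acquiredDate>23-09-2016</acquiredDate><model>BMW</model></car>",
])

# ^ and $ match at line boundaries because of re.MULTILINE.
# Group 1 captures the date; change it as required.
LINE_PATTERN = re.compile(
    r"^<car>[<\w>/]+acquiredDate>(23-09-2016).+$",
    re.MULTILINE,
)

# group(0) is the whole matching line, group(1) is just the date.
matching_lines = [m.group(0) for m in LINE_PATTERN.finditer(log_text)]

for line in matching_lines:
    print(line)
```

If the whole log really is one long line, this line-anchored pattern would, as far as I can tell, match from the first `<car>` to the end of that line, so it fits best when the elements are already separated by newlines.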
SierraOscar
  • This is exactly the match I needed! Sorry for being insistent about the regex thing. It's because I'll use it to find some text in a giant log. Now I can find my bugs! Thanks! – Felipe S. Sep 23 '16 at 22:36
  • @FelipeS. just to be clear, I'm not preaching about _not using regex_ (notice I didn't downvote your question; it's a perfectly good and valid question to ask). It's just that using a parser can make life easier. Ultimately, whatever works for you is the right way, so it's your preference. – SierraOscar Sep 23 '16 at 22:41