
I am learning regular expressions and am trying to extract dates from the string below. The element <ext:serviceitem> can be repeated up to 20 times in the actual XML. I need only the date strings: for any element whose name ends in Date, I need that element's value, which is a date. For example, StartDate and EndDate. I want all of those dates (and only those) to be printed out.

<ext:serviceitem><ext:name>EnhancedSupport</ext:name><ext:serviceItemData><ext:serviceItemAttribute name="Name">E69D7F93-81F4-09E2-E043-9D3226AD8E1D-1</ext:serviceItemAttribute><ext:serviceItemAttribute name="ProductionDatabase">P1APRD</ext:serviceItemAttribute><ext:serviceItemAttribute name="SupportType">Monthly</ext:serviceItemAttribute><ext:serviceItemAttribute name="Environment">DV1</ext:serviceItemAttribute><ext:serviceItemAttribute name="StartDate">2013-11-04 10:02</ext:serviceItemAttribute><ext:serviceItemAttribute name="EndDate">2013-11-12 10:02</ext:serviceItemAttribute><ext:serviceItemAttribute name="No_of_WeeksSupported"></ext:serviceItemAttribute><ext:serviceItemAttribute name="Cost"></ext:serviceItemAttribute><ext:serviceItemAttribute name="SupportNotes"></ext:serviceItemAttribute><ext:serviceItemAttribute name="FiscalQuarterNumber"></ext:serviceItemAttribute><ext:subscription><ext:loginID>kbasavar</ext:loginID><ext:ouname>020072748</ext:ouname></ext:subscription></ext:serviceItemData></ext:serviceitem><ext:serviceitem><ext:name>EnhancedSupport</ext:name><ext:serviceItemData><ext:serviceItemAttribute name="Name">E69D7F93-81F4-09E2-E043-9D3226AD8E1D-2</ext:serviceItemAttribute><ext:serviceItemAttribute name="ProductionDatabase">P1BPRD</ext:serviceItemAttribute><ext:serviceItemAttribute name="SupportType">Quarterly</ext:serviceItemAttribute><ext:serviceItemAttribute name="Environment">TS2</ext:serviceItemAttribute><ext:serviceItemAttribute name="StartDate">2013-11-11 10:03</ext:serviceItemAttribute><ext:serviceItemAttribute name="EndDate">2013-11-28 10:03</ext:serviceItemAttribute><ext:serviceItemAttribute name="No_of_WeeksSupported"></ext:serviceItemAttribute><ext:serviceItemAttribute name="Cost"></ext:serviceItemAttribute><ext:serviceItemAttribute name="SupportNotes"></ext:serviceItemAttribute><ext:serviceItemAttribute 
name="FiscalQuarterNumber"></ext:serviceItemAttribute><ext:subscription><ext:loginID>kbasavar</ext:loginID><ext:ouname>020072748</ext:ouname></ext:subscription></ext:serviceItemData></ext:serviceitem>

I tried the regex below, but it returns the rest of the string after the first occurrence.

(?<=Date\"\>).*(?=\<\/ext\:serviceItemAttribute\>)

Any help would be highly appreciated.

Kiran
    Have a look at [this](http://stackoverflow.com/questions/8577060/why-is-it-such-a-bad-idea-to-parse-xml-with-regex), please. – O. R. Mapper Sep 27 '13 at 09:42

1 Answer


Your problem is that .* is greedy: it grabs everything from the first instance of Date to the last instance of </ext:ser..... Replace the .* with the lazy .*? and each match will stop at the nearest closing tag, which is the behaviour you're after.

#(?<=Date">).*?(?=</ext:serviceItemAttribute>)#i
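To see the difference side by side, here is a minimal sketch. The question doesn't say which language is in use, so this assumes Python's re module, and the xml string is a shortened sample with the same shape as the one in the question.

```python
import re

# Shortened sample shaped like the XML in the question (an assumption,
# not the full document).
xml = (
    '<ext:serviceItemAttribute name="StartDate">2013-11-04 10:02</ext:serviceItemAttribute>'
    '<ext:serviceItemAttribute name="EndDate">2013-11-12 10:02</ext:serviceItemAttribute>'
)

# Greedy: .* runs to the end of the line, then backs off to the LAST closing tag.
greedy = re.findall(r'(?<=Date">).*(?=</ext:serviceItemAttribute>)', xml)

# Lazy: .*? stops at the FIRST closing tag after each Date">.
lazy = re.findall(r'(?<=Date">).*?(?=</ext:serviceItemAttribute>)', xml, re.IGNORECASE)

print(len(greedy))  # 1 (one long match spanning both attributes)
print(lazy)         # ['2013-11-04 10:02', '2013-11-12 10:02']
```

The #...#i delimiters above become the re.IGNORECASE flag here; other regex tools spell that flag differently.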

If you want the date returned as a captured group, wrap .*? in parentheses: (.*?).

#(?<=Date">)(.*?)(?=</ext:serviceItemAttribute>)#i

You could also do it more simply, without lookarounds, like:

#Date">(.*?)</ext#i
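As a rough sketch of how the simpler pattern could be used, again assuming Python's re module: findall returns only the captured group when the pattern has exactly one. The helper name extract_dates and the sample string are made up for illustration.

```python
import re

def extract_dates(xml):
    """Return the text of every element whose name attribute ends in 'Date'."""
    # One capture group, so findall returns just the captured text.
    return re.findall(r'Date">(.*?)</ext', xml, flags=re.IGNORECASE)

# Hypothetical sample shaped like one service item from the question.
sample = (
    '<ext:serviceItemAttribute name="Environment">TS2</ext:serviceItemAttribute>'
    '<ext:serviceItemAttribute name="StartDate">2013-11-11 10:03</ext:serviceItemAttribute>'
    '<ext:serviceItemAttribute name="EndDate">2013-11-28 10:03</ext:serviceItemAttribute>'
    '<ext:serviceItemAttribute name="Cost"></ext:serviceItemAttribute>'
)

print(extract_dates(sample))  # ['2013-11-11 10:03', '2013-11-28 10:03']
```

One thing to watch: with the case-insensitive flag, an attribute named something like "Update" would also match, since it ends in "date". Drop the flag if that matters for your data.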

Update

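As has been pointed out in the comments below, the solution above relies on non-greedy matching, which not every regex dialect supports.

To get around this you could use ([^<]*) instead of (.*?). The negated character class matches everything up to the next <, so it stops at the closing tag without needing a lazy quantifier.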

NOTE: This does not impact the alternatives below.
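A quick sketch comparing the two forms, assuming Python's re module and a shortened sample string. The names NEGATED, LAZY and wrapped are only for illustration.

```python
import re

NEGATED = re.compile(r'Date">([^<]*)</ext', re.IGNORECASE)  # no lazy quantifier needed
LAZY = re.compile(r'Date">(.*?)</ext', re.IGNORECASE)

# Shortened sample shaped like the XML in the question.
xml = (
    '<ext:serviceItemAttribute name="StartDate">2013-11-04 10:02</ext:serviceItemAttribute>'
    '<ext:serviceItemAttribute name="EndDate">2013-11-12 10:02</ext:serviceItemAttribute>'
)

# On single-line values both patterns capture the same text.
print(NEGATED.findall(xml) == LAZY.findall(xml))  # True

# Hypothetical value wrapped across a line break: [^<] matches the newline,
# while . does not unless the DOTALL flag is set.
wrapped = '<ext:serviceItemAttribute name="EndDate">2013-11-12\n10:02</ext:serviceItemAttribute>'
print(NEGATED.findall(wrapped))  # ['2013-11-12\n10:02']
print(LAZY.findall(wrapped))     # []
```

The negated class only works because the values contain no < character, which holds for the date values shown in the question.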


Alternatives

/(\d{4}-\d{2}-\d{2})/
/(\d{4}-\d{2}-\d{2} \d{2}:\d{2})/

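The above patterns will match dates in the formats YYYY-MM-DD and YYYY-MM-DD HH:MM respectively. They only check the digit layout, so they match that shape anywhere in the input, not just inside elements whose name ends in Date.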
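Here is a small sketch of the two alternative patterns, assuming Python's re module and a shortened sample. Because the patterns accept any digits, the last step passes the matches through datetime.strptime as one possible sanity check; that step is my addition and not something the patterns do themselves.

```python
import re
from datetime import datetime

DATE_ONLY = re.compile(r'(\d{4}-\d{2}-\d{2})')
DATE_TIME = re.compile(r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2})')

# Shortened sample shaped like the XML in the question.
xml = (
    '<ext:serviceItemAttribute name="Name">E69D7F93-81F4-09E2-E043-9D3226AD8E1D-1</ext:serviceItemAttribute>'
    '<ext:serviceItemAttribute name="StartDate">2013-11-04 10:02</ext:serviceItemAttribute>'
    '<ext:serviceItemAttribute name="EndDate">2013-11-12 10:02</ext:serviceItemAttribute>'
)

print(DATE_ONLY.findall(xml))  # ['2013-11-04', '2013-11-12']
print(DATE_TIME.findall(xml))  # ['2013-11-04 10:02', '2013-11-12 10:02']

# The patterns do not validate the values: this is not a real date but still matches.
print(DATE_ONLY.findall('9999-99-99'))  # ['9999-99-99']

# Optional check: strptime raises ValueError for impossible dates.
parsed = [datetime.strptime(s, '%Y-%m-%d %H:%M') for s in DATE_TIME.findall(xml)]
print(parsed[0].year, parsed[0].month, parsed[0].day)  # 2013 11 4
```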

Steven
  • This, of course, assumes that your regular expression dialect supports non-greedy matching. The OP had better include information about the platform so we don't have to guess about which regex features are supported by the available tool(s). – tripleee Sep 27 '13 at 10:23
  • Thank you very much. This one `(?<=Date">)(.*?)(?=</ext:serviceItemAttribute>)` worked for me. – Kiran Sep 27 '13 at 10:45
  • Good to know it worked! @tripleee: A valid point, it turns out that it did work in this case. However, I have updated the answer with a workaround. – Steven Sep 27 '13 at 10:49