
I have the following tag from an XML file:

<msg><![CDATA[Method=GET URL=http://test.de:80/cn?OP=gtm&Reset=1(Clat=[400441379], Clon=[-1335259914], Decoding_Feat=[], Dlat=[0], Dlon=[0], Accept-Encoding=gzip, Accept=*/*) Result(Content-Encoding=[gzip], Content-Length=[7363], ntCoent-Length=[15783], Content-Type=[text/xml; charset=utf-8]) Status=200 Times=TISP:270/CSI:-/Me:1/Total:271]]>

I am trying to extract the values of Clon, Dlat, Dlon and Clat from this message.

So far I have created the following regex:

(?<=Clat=)[\[\(\d+\)\n\n][^)n]+]

The problem is that I would like to get only the numbers, without the brackets. I have tried some other expressions as well. Do you know how I can extend this expression so that it returns only the values without the brackets?

Thank you very much in advance.

Best regards

cimbom
  • [Don't parse xml with regex](http://stackoverflow.com/a/1732454/3144928). There are better ways to do this. – Ben Aubin Dec 14 '15 at 17:17
  • It's inside a CDATA element. If he uses an XML Parser to get the CDATA element, then he has to extract the information inside there with a regex. So, it's really okay to do this in this case. Just don't use the regex on the whole XML document. – Maximilian Gerhardt Dec 14 '15 at 17:20
  • The problem is that I do this for logstash. For the other parts, I used xpath. But for the CDATA part, xpath is not working. – cimbom Dec 14 '15 at 17:21
  • `(?:Clon|Dlat|Dlon|Clat)=\[(?\d+)\]`, see https://regex101.com/r/tY8tQ7/2 for a working fiddle. – Jan Dec 14 '15 at 17:29
  • Hello Jan, your solution does not work because it ignores negative numbers. – cimbom Dec 14 '15 at 17:35
  • @cimbom Right you are, missed that :) – Jan Dec 14 '15 at 18:53
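
The negative-number point raised in the comments can be checked with a small sketch. It is written in Python purely for illustration (the asker's tool is Logstash), the variable names are made up for this example, and the input is only the parameter list copied from the message in the question:

```python
import re

# Parameter list copied from the message in the question.
msg = "Clat=[400441379], Clon=[-1335259914], Decoding_Feat=[], Dlat=[0], Dlon=[0]"

# Without an optional minus sign, the negative Clon value is skipped.
unsigned = re.compile(r"(Clon|Dlat|Dlon|Clat)=\[(\d+)\]")
# With -? in front of \d+, negative values are matched as well.
signed = re.compile(r"(Clon|Dlat|Dlon|Clat)=\[(-?\d+)\]")

# findall returns (name, value) tuples, which convert directly to a dict.
unsigned_hits = dict(unsigned.findall(msg))
signed_hits = dict(signed.findall(msg))

print(unsigned_hits)  # Clon is missing here
print(signed_hits)    # Clon is present here
```

In both cases the second capture group holds the number without the surrounding square brackets; the empty `Decoding_Feat=[]` entry is never matched because it is not one of the four names.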

1 Answer


The regex

(clon|dlat|dlon|clat)=\[(-?\d+)\]

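captures the parameter name in group 1 and the value, without the square brackets, in group 2. The pattern is written in lowercase, so it has to be applied case-insensitively.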

As I stated before, if you use this regex to extract the information out of this CDATA element, that's okay. But you really want to get to the contents of that element using an XML parser.

Example usage

using System.Text.RegularExpressions;

// IgnoreCase is required: the pattern is lowercase, the message uses "Clat", "Clon", ...
Regex r = new Regex(@"(clon|dlat|dlon|clat)=\[(-?\d+)\]", RegexOptions.IgnoreCase);
string s = ".. here's your cdata content .. ";
foreach (Match match in r.Matches(s))
{
    var name = match.Groups[1].Value; // the parameter name as it appears in the input, e.g. "Clat"
    var inner_value = match.Groups[2].Value; // the value inside the square brackets, e.g. "400441379"
    // Do something with the matches
}
Maximilian Gerhardt
  • Hello Maximilian. Thank you very much for your response. Is it possible to get the values without: "[" and "]"? For the other parts, I used xpath. – cimbom Dec 14 '15 at 17:30
  • I'm not quite sure what you mean by "getting the values without the square brackets". Once you apply your regex to the string, you get the matches and their groups. The group with index 1 contains the first capture group (i.e. "clon" or "dlat" etc.); the second capture group is the inner value without the square brackets. I've added a usage example with .NET, but it's the same concept for JavaScript or whatever language you are using. – Maximilian Gerhardt Dec 14 '15 at 17:37
  • I mean that using your regex I would get: Clat=[400441379], Clon=[-1335259914], Dlat=[0], Dlon=[0]. However, I would like to have it like this: Clat=400441379, Clon=-1335259914, Dlat=0, Dlon=0. – cimbom Dec 14 '15 at 17:39
  • What language are you using? You mustn't just cast the matches you get from `Regex.Matches()` to a string; that defeats the purpose. Inside the `foreach` loop above, you can e.g. do `var joined = name + "=" + inner_value;` and get the string you want. Use the values of the capture groups to build the string you want. – Maximilian Gerhardt Dec 14 '15 at 17:43
  • Are you familiar with Logstash? As far as I know, it uses Perl syntax. – cimbom Dec 14 '15 at 17:48
  • No, but what possibilities do you have? You may also be able to use a substitution regex, which transforms this output into the one you want. Just giving Logstash one regex, on which it calls `.ToString()` on each `Match`, is pretty bad, because you have no good control over that. It is fine for selecting the data pieces, but you can't get rid of the `[]` without accessing the capture groups' values and joining them together so that you get the data format you want. – Maximilian Gerhardt Dec 14 '15 at 17:53
  • Ok then I will do it like you suggested. Afterwards I can do a mutate and get rid of the []. Thanks :) – cimbom Dec 14 '15 at 17:55
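
The approaches discussed in this thread (joining the capture groups, a substitution regex, and extending the asker's lookbehind) can be sketched as follows. This is a minimal illustration in Python, not Logstash configuration; the variable names are invented for the example, the input is a shortened copy of the message from the question, and whether a particular Logstash filter exposes capture groups or substitution in the same way is not confirmed here.

```python
import re

# Shortened copy of the message from the question.
msg = ("Method=GET URL=http://test.de:80/cn?OP=gtm&Reset=1("
       "Clat=[400441379], Clon=[-1335259914], Decoding_Feat=[], "
       "Dlat=[0], Dlon=[0], Accept-Encoding=gzip, Accept=*/*)")

# The regex from the answer, applied case-insensitively.
pattern = re.compile(r"(clon|dlat|dlon|clat)=\[(-?\d+)\]", re.IGNORECASE)

# Option 1: build the output from the capture groups (name and bare value).
joined = ", ".join(f"{m.group(1)}={m.group(2)}" for m in pattern.finditer(msg))

# Option 2: substitution that keeps both groups and drops the brackets in place.
stripped = pattern.sub(r"\1=\2", msg)

# Option 3: a lookbehind that also consumes the "[", so the whole match
# is just the number. Useful when only the full match is available.
clat_only = re.search(r"(?<=Clat=\[)-?\d+", msg).group(0)

print(joined)
print(stripped)
print(clat_only)
```

Option 3 is the closest to the expression in the question: moving the `\[` inside the lookbehind and matching `-?\d+` leaves the brackets out of the match itself.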