
I know there are tons of questions about regex, but I've been trying to meet a requirement involving a URL. The URL looks like this:

POST /fr.synomia.search.ws.module.ModuleSearch/geResults/jsonp?xmlQuery=<?xml version='1.0' encoding='UTF-8'?><query ids="16914"><matchWord>avoir</matchWord><fullText><![CDATA[]]></fullText><quotedText><![CDATA[]]></quotedText><sensitivity></sensitivity><operator>AND</operator><offsetCooc>0</offsetCooc><cooc></cooc><collection>0</collection><searchOn>all</searchOn><nbResultDisplay>10</nbResultDisplay><nbResultatsParAspect>5</nbResultatsParAspect><nbCoocDisplay>8</nbCoocDisplay><offsetDisplay>0</offsetDisplay><sortBy>date</sortBy><dateAfter>0</dateAfter><dateBefore>0</dateBefore><ipClient>82.122.169.244</ipClient><typeQuery>0</typeQuery><equivToDelete></equivToDelete><allCooc>false</allCooc><versionDTD>3.0.5</versionDTD><r34>1tcbet30]</r34><mi>IND</mi></query>&callback=__gwt_jsonp__.P1.onSuccess&failureCallback=__gwt_jsonp__.P1.onFailure HTTP/1.1

It is a URL requested from a REST web service. Inside this URL there is the tag <query ids="16914">.

I want to extract the number 16914 from the whole URL. This is the regex I tried:

private static Pattern p = Pattern.compile(
"<\\?xml version='1.0' encoding='[^']+'\\?><query ids=\"([0-9]+)\"><matchWord>.*");

I tried tools like Debuggex, but I can't figure out what the problem is. I'd prefer to use a regex rather than a lot of methods from the String class.

I would really appreciate any help. Thanks a lot in advance.

nhahtdh
Marcelo Tataje

2 Answers


I'd use SAX for this purpose:

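import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XMLParser extends DefaultHandler {
    int id;

    // With a non-namespace-aware parser, the element name arrives in qName (the third parameter)
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) throws SAXException {
        if (qName.equals("query")) {
            // The attribute in the request is named "ids", not "id"
            id = Integer.parseInt(attrs.getValue("ids"));
        }
    }

    @Override
    public String toString() {
        return String.format("%d", this.id);
    }

    // args[0]: path to a file holding only the XML document (the xmlQuery value)
    public static void main(String[] args) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        XMLParser parserObj = new XMLParser();
        // SAXParser.parse has no Reader overload; pass a File (or an InputSource)
        parser.parse(new File(args[0]), parserObj);
        System.out.println(parserObj);
    }
}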
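The main method in this answer parses a file named on the command line, while the question starts from a full HTTP request line with the XML embedded in the xmlQuery parameter. A minimal sketch bridging the two, assuming the xmlQuery value arrives unencoded (as it does in the question) and ends at the next `&`. The class `QueryIdSax` and the helpers `extractXmlQuery` and `parseQueryId` are hypothetical names, and the request in `main` is a shortened copy of the one in the question. The XML string is handed to SAX through an `InputSource` wrapping a `StringReader`, so no temporary file is needed.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class QueryIdSax {

    // Hypothetical helper: returns the raw text between "xmlQuery=" and the next '&'.
    // Assumes the parameter is present and not percent-encoded, as in the question.
    static String extractXmlQuery(String requestLine) {
        int start = requestLine.indexOf("xmlQuery=") + "xmlQuery=".length();
        int end = requestLine.indexOf('&', start);
        return requestLine.substring(start, end < 0 ? requestLine.length() : end);
    }

    // Parses the XML with SAX and returns the "ids" attribute of <query>, or -1 if absent
    static int parseQueryId(String xml) throws Exception {
        int[] id = {-1};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals("query")) {
                    id[0] = Integer.parseInt(attrs.getValue("ids"));
                }
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return id[0];
    }

    public static void main(String[] args) throws Exception {
        // Shortened version of the request line from the question
        String request = "POST /fr.synomia.search.ws.module.ModuleSearch/geResults/jsonp?xmlQuery="
                + "<?xml version='1.0' encoding='UTF-8'?><query ids=\"16914\"><matchWord>avoir</matchWord></query>"
                + "&callback=__gwt_jsonp__.P1.onSuccess HTTP/1.1";
        System.out.println(parseQueryId(extractXmlQuery(request))); // prints 16914
    }
}
```

If the parameter were percent-encoded, as it normally would be on the wire, it would need decoding first, for example with `java.net.URLDecoder`.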
hd1
  • Thanks for your response! It was really helpful. I would have tried your SAX solution, but I'm not allowed to: the requirement is to use regex. Thanks anyway, that was very useful. Best regards. – Marcelo Tataje May 28 '13 at 15:30

There is nothing wrong with your regex, it works for me.

String s = "POST /fr.synomia.search.ws.module.ModuleSearch/geResults/jsonp?xmlQuery=<?xml version='1.0' encoding='UTF-8'?><query ids=\"16914\"><matchWord>avoir</matchWord><fullText><![CDATA[]]></fullText><quotedText><![CDATA[]]></quotedText><sensitivity></sensitivity><operator>AND</operator><offsetCooc>0</offsetCooc><cooc></cooc><collection>0</collection><searchOn>all</searchOn><nbResultDisplay>10</nbResultDisplay><nbResultatsParAspect>5</nbResultatsParAspect><nbCoocDisplay>8</nbCoocDisplay><offsetDisplay>0</offsetDisplay><sortBy>date</sortBy><dateAfter>0</dateAfter><dateBefore>0</dateBefore><ipClient>82.122.169.244</ipClient><typeQuery>0</typeQuery><equivToDelete></equivToDelete><allCooc>false</allCooc><versionDTD>3.0.5</versionDTD><r34>1tcbet30]</r34><mi>IND</mi></query>&callback=__gwt_jsonp__.P1.onSuccess&failureCallback=__gwt_jsonp__.P1.onFailure HTTP/1.1";
Pattern p = Pattern.compile(
            "<\\?xml version='1.0' encoding='[^']+'\\?><query ids=\"([0-9]+)\"><matchWord>.*");

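Matcher m = p.matcher(s);

// find() searches for the pattern anywhere in s; matches() would require the whole string to match
if (m.find()) {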
    System.out.println("Group: "+m.group(1));
}

Prints:

Group: 16914
melwil
  • Thanks, that's exactly what I needed. I was calling m.matches() instead of m.find(), which is why my code wasn't working. Thanks a lot. Best regards. – Marcelo Tataje May 28 '13 at 15:31
  • m.matches() requires the entire string to match, whereas m.find() accepts a match anywhere inside it. You can probably simplify the regex to only look for `` though. – melwil May 28 '13 at 15:34
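To make the difference between `matches()` and `find()` concrete, here is a small sketch run against a shortened copy of the request line. The `ORIGINAL` pattern is the one from the question; `SIMPLE` is only one hypothetical simplification that keeps just the text around the number, not necessarily the pattern the commenter had in mind. The class name `FindVsMatches` and the helper `firstGroup` are also illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindVsMatches {

    // Shortened version of the request line from the question
    static final String REQUEST = "POST /fr.synomia.search.ws.module.ModuleSearch/geResults/jsonp?xmlQuery="
            + "<?xml version='1.0' encoding='UTF-8'?><query ids=\"16914\"><matchWord>avoir</matchWord></query>"
            + "&callback=__gwt_jsonp__.P1.onSuccess HTTP/1.1";

    // The pattern from the question
    static final Pattern ORIGINAL = Pattern.compile(
            "<\\?xml version='1.0' encoding='[^']+'\\?><query ids=\"([0-9]+)\"><matchWord>.*");

    // Hypothetical simplified pattern: only the part surrounding the number
    static final Pattern SIMPLE = Pattern.compile("<query ids=\"(\\d+)\"");

    // Returns group 1 of the first match found anywhere in the input, or null if there is none
    static String firstGroup(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // matches() must consume the whole string, which starts with "POST", so this prints false
        System.out.println(ORIGINAL.matcher(REQUEST).matches());
        // find() scans for the pattern inside the string, so both of these print 16914
        System.out.println(firstGroup(ORIGINAL, REQUEST));
        System.out.println(firstGroup(SIMPLE, REQUEST));
    }
}
```

The shorter pattern also no longer depends on the XML declaration or on the `<matchWord>` element that follows, so it keeps working if those parts of the request change.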