I have a site I am trying to grab data from, and the content is laid out like this:

 <p uri="/someRandomURL.p1" class="">TestData TestData TestData</p> 
 <p uri="/someRandomURL.p2" class="">TestData1 TestData1 TestData1</p>

I am using Java to grab the webpage's content, and am trying to parse through it like this:

        Pattern p = Pattern.compile(".*?p1' class=''>(.*?)<.*");
        Matcher m = p.matcher(data);

        //Print out regex groups to console
        System.out.println(m.group(1)) ;

But then an exception is thrown saying that no match was found.

Is my regex right? What else could be going on? I am getting the HTML fine, but apparently nothing matches my regex.

Thanks

Stephen J.
    I'll just leave this here... http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Justin Garrick Apr 14 '11 at 19:55

1 Answer

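If the text elements span multiple lines, the pattern won't match, because by default the dot (.) doesn't match line terminators such as \n. Two other things to check: the pattern looks for single quotes, while the HTML uses double quotes, and m.group(1) throws the "no match found" exception unless m.find() or m.matches() has been called (and returned true) first.

Give this a try:

 Pattern p = Pattern.compile(".*?p1\" class=\"\">(.*?)<.*", Pattern.DOTALL);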
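
For reference, here is a minimal, self-contained sketch of how the whole thing might fit together. It assumes the HTML really uses double quotes as in the question's sample, and it calls find() before group(1). The class name ExtractParagraph and the helper extractP1 are made up for illustration, and the pattern is shortened (no leading or trailing .*) because find() searches anywhere in the input:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractParagraph {
    // Hypothetical helper: returns the text of the <p> whose uri ends in "p1",
    // or null when the input contains no such element.
    static String extractP1(String data) {
        // Double quotes are escaped to match the sample HTML;
        // DOTALL lets the captured text span several lines.
        Pattern p = Pattern.compile("p1\" class=\"\">(.*?)<", Pattern.DOTALL);
        Matcher m = p.matcher(data);
        // group(1) is only valid after find() (or matches()) has returned true.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Sample input copied from the question.
        String data =
              "<p uri=\"/someRandomURL.p1\" class=\"\">TestData TestData TestData</p>\n"
            + "<p uri=\"/someRandomURL.p2\" class=\"\">TestData1 TestData1 TestData1</p>";
        System.out.println(extractP1(data)); // prints "TestData TestData TestData"
    }
}
```

Returning null when nothing matches is just one option; throwing an exception or returning an Optional would work as well, depending on how the caller wants to handle a missing element.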
Andreas Dolk