I have the following string and want to extract the part number MBRB1045T4G from it with a regular expression in Java. How can I do that?

String:

<p class="ref">
<b>Mfr Part#:</b>
MBRB1045T4G<br>


<b>Technologie:</b>&nbsp;
    Tab Mount<br>



<b>Bauform:</b>&nbsp;
    D2PAK-3<br>



<b>Verpackungsart:</b>&nbsp;
    REEL<br>



<b>Standard Verpackungseinheit:</b>&nbsp;
    800<br>

Dominik
  • [by offering up your sanity to Cthulhu](http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html) – Wrikken May 08 '12 at 16:38
  • iow, use an HTML parser. – Guillaume Polet May 08 '12 at 16:42
  • Please refrain from parsing HTML with RegEx as it will [drive you insane](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). Use an [HTML parser](http://stackoverflow.com/questions/292926/robust-mature-html-parser-for-php) instead. – Madara's Ghost May 08 '12 at 17:00
  • What is your constraint? The second line after <p class="ref">? Something which starts at the beginning of a line with an uppercase letter? Something which ends with 4G? Something 3 lines before Technologie? – user unknown May 08 '12 at 17:37
  • basically before the String I want there is the and then the
    ... so its STRING
    but there is a line break between the in the html, is that relevant?
    – Dominik May 08 '12 at 17:41

1 Answer

As Wrikken correctly says, HTML can't be parsed correctly by regex in the general case. However, it seems you're looking at an actual website and want to scrape some content from it. In that case, assuming the whitespace, elements and formatting in the HTML code don't change, you can use a regex like this:

 Mfr Part#:</b>\s*([^<]+)<br>

And collect the first capture group like so (where string is your HTML):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// In a Java string literal the backslash must be doubled: \\s
Pattern pt = Pattern.compile("Mfr Part#:</b>\\s*([^<]+)<br>");
Matcher m = pt.matcher(string);
// find() searches within the input; matches() succeeds only if the whole input matches
if (m.find())
    System.out.println(m.group(1).trim());
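For reference, here is a minimal self-contained sketch of the same idea, run against a shortened copy of the HTML from the question. The class name `PartNumberExtractor` and the helper `extract` are made up for illustration, and the sketch assumes the markup around the part number stays exactly as shown in the question.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PartNumberExtractor {
    // Label, closing </b>, optional whitespace (including the line break),
    // then everything up to the next tag, which must be <br>.
    private static final Pattern PART =
            Pattern.compile("Mfr Part#:</b>\\s*([^<]+)<br>");

    /** Returns the text between "Mfr Part#:</b>" and the following <br>, if any. */
    static Optional<String> extract(String html) {
        Matcher m = PART.matcher(html);
        // find() looks for the pattern anywhere in the input.
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Shortened copy of the HTML from the question.
        String html = "<p class=\"ref\">\n"
                + "<b>Mfr Part#:</b>\n"
                + "MBRB1045T4G<br>\n"
                + "\n"
                + "<b>Technologie:</b>&nbsp;\n"
                + "    Tab Mount<br>\n";
        System.out.println(extract(html).orElse("not found")); // prints MBRB1045T4G
    }
}
```

Returning an `Optional` is just one way to handle the no-match case; returning `null` or throwing would work as well.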
alexg
  • Pattern pt = Pattern.compile("Mfr Part#:</b>([^<]+)<br>"; ? How would I get the string? – Dominik May 08 '12 at 16:51
  • Matcher m = pt.matcher(string); if (m.matches()) System.out.println(m.group(1)); – alexg May 08 '12 at 16:53
  • Element desc = doc.select("p[class=ref]").first(); logger.debug("found ref:"+ desc.text()); Pattern pt = Pattern.compile("Mfr Part#:</b>([^<]+)<br>"); Matcher m = pt.matcher(desc.text()); if (m.matches()){ logger.debug("found partnumber:"+ m.group(1)); article.setManufacturerArticleNumber(m.group(1)); article.setDistributorArticleNumber(m.group(1)); } – Dominik May 08 '12 at 17:04
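
A note on the last snippet: `desc.text()` returns the element's text with the tags removed, so a pattern that contains `</b>` and `<br>` has nothing to match there, and `matches()` additionally requires the whole input to match. A minimal sketch of a tag-free variant follows. The sample text is only an assumption about what `desc.text()` returns for this paragraph, the class name `PartNumberFromText` is made up for illustration, and the pattern assumes the part number contains no whitespace.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PartNumberFromText {
    // No tags here: label, optional whitespace, then a run of non-whitespace characters.
    private static final Pattern PART = Pattern.compile("Mfr Part#:\\s*(\\S+)");

    /** Returns the token following "Mfr Part#:", or null if the label is absent. */
    static String extract(String text) {
        Matcher m = PART.matcher(text);
        // find() searches within the text instead of requiring a full match.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Assumed shape of desc.text(); the real output may differ.
        String text = "Mfr Part#: MBRB1045T4G Technologie: Tab Mount Bauform: D2PAK-3";
        System.out.println(extract(text)); // prints MBRB1045T4G
    }
}
```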