0

I need a regex to parse the string below:

{ "<div class="highlighttitle2">UNSPSC 43211701</div>" }

The whole string is option. The output I need is

UNSPSC: 43211701

Please help.

I have tried:

.*?((?(?=ul).*?(?(?=div)|.*?\bUNSPSC\b.*?(?'UNSPSC'[^<]*)</div>)|.*?(?(?=div).*?\bUNSPSC\b.*?(?'UNSPSC'[^<]*)</div>|))|).*?((?(?=ul).*?(?(?=div)|.*?\bUNSPSC\b.*?(?'UNSPSC'[^<]*)</div>)|.*?(?(?=div).*?\bUNSPSC\b.*?(?'UNSPSC'[^<]*)</div>|))|)
Jeff
Anand Vyas
  • 1
    What do you mean by **The whole string is option.**? Have you already tried any `regex`? Which language? – Nemesis Nov 14 '14 at 17:19
  • The string is part of a big test string. I have tried a couple of them, like <.*?<.*?((?(?=ul).*?(?(?=div)|.*?\bUNSPSC\b.*?(?'UNSPSC'[^<]*))|.*?(?(?=div).*?\bUNSPSC\b.*?(?'UNSPSC'[^<]*)|))|).*?((?(?=ul).*?(?(?=div)|.*?\bUNSPSC\b.*?(?'UNSPSC'[^<]*))|.*?(?(?=div).*?\bUNSPSC\b.*?(?'UNSPSC'[^<]*)|)) – Anand Vyas Nov 14 '14 at 17:20
  • JavaScript is the language. – Anand Vyas Nov 14 '14 at 17:21
  • 1
    Okay, it is best to add both (your actual regex and the language) to the text of the question. Also, add `JavaScript` to the tags. – Nemesis Nov 14 '14 at 17:22
  • 3
    Obligatory: You should not be using regex to parse HTML. http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Jeff Nov 14 '14 at 17:27
  • 1
    I'm not sure your regex needs to be that complicated, but it's hard to tell for sure without knowing the possible values you are trying to extract. Is UNSPSC always going to be constant? Is 43211701 always going to be the same length? Always numbers? Any extra information would be helpful. – Wet Noodles Nov 14 '14 at 17:27
  • 1
    Which tags do you want to extract the text of? – mtanti Nov 14 '14 at 20:18
  • 1
    Please verify that you need the colon (:) in your output, as that is not part of your test string. Also, please provide your entire test string, or at least enough to see what pitfalls we need to avoid. – kayleeFrye_onDeck Nov 14 '14 at 22:18

2 Answers

1

If you can guarantee that the value always starts with UNSPSC and is followed by a single space and then digits, with no other whitespace, then your regex could be

(UNSPSC \d+)

And your result, UNSPSC 43211701, will be in the first capture group.
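As a rough sketch of how this could be used in JavaScript (the language named in the comments), the snippet below runs a variation of the pattern against the sample string from the question, where the label is spelled UNSPSC. It uses two capture groups instead of one so the colon from the requested output can be inserted. The helper name `extractUnspsc` is made up for illustration, and returning `null` when nothing matches is an assumption about how the "optional" case should be handled.

```javascript
// Sample input taken from the question.
const input = '{ "<div class="highlighttitle2">UNSPSC 43211701</div>" }';

// Group 1 captures the label, group 2 the digits that follow it.
const pattern = /(UNSPSC) (\d+)/;

// Hypothetical helper: returns "UNSPSC: <digits>", or null when the
// label is not present in the text.
function extractUnspsc(text) {
  const match = pattern.exec(text);
  return match ? match[1] + ': ' + match[2] : null;
}

const result = extractUnspsc(input);
console.log(result); // UNSPSC: 43211701
```

Because the pattern is not anchored, it finds the label anywhere inside a larger string, which seems to fit the case where this fragment is part of a bigger test string.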

Jeff
1

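This will match as few characters as possible (probably what you're looking for):

(UNSPSC\s\d+?(?=<))

The lazy `\d+?` is forced by the lookahead `(?=<)` to extend up to the next `<`, so it won't care how many digits there are, and it will give you the whole number as one match instead of a match per digit.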
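A minimal JavaScript sketch of the lookahead pattern, assuming the fragment sits inside a larger block of markup. The `html` string here is invented for illustration (only the first `div` comes from the question), and the `g` flag is added so that every occurrence is returned.

```javascript
// Lazy digits plus a lookahead: the match stops right before the next "<".
const unspscPattern = /UNSPSC\s\d+?(?=<)/g;

// Made-up surrounding markup; the second div is hypothetical.
const html =
  '<ul><li>item</li></ul>' +
  '<div class="highlighttitle2">UNSPSC 43211701</div>' +
  '<div class="highlighttitle2">UNSPSC 12345</div>';

// With the g flag, String.prototype.match returns all matches,
// or null when there are none, so fall back to an empty array.
const found = html.match(unspscPattern) || [];
console.log(found); // [ 'UNSPSC 43211701', 'UNSPSC 12345' ]

// A string without the label simply yields no matches.
const none = '<div>no code here</div>'.match(unspscPattern) || [];
console.log(none.length); // 0
```

The lookahead is not part of the match itself, so the closing `</div>` never ends up in the result.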

kayleeFrye_onDeck