I have a proprietary content scheme from which I need to scrape ranges of custom HTML-like tags.
Two examples of these tags are:
<college-point image>/48e1255c8bd8d1c8a6c5d263f7130853.jpg</college-point>
<college-point podcast-episode>704097</college-point>
I had an expression which worked well for finding tags with a one-word tag value, like image:
<college-point\s\w*>([^>]+)>
When I added podcast-episode, I ran into trouble getting the hyphen supported. I tried something like:
<college-point[\s\w*]([^>]+)>
but this only returns the opening of the tag, not the entire thing. What syntax should I be using to allow hyphenated tag values?
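
A minimal sketch that reproduces both behaviours on the two sample tags, assuming Python's `re` module (the regex engine actually in use is not stated, so details may differ). The names `samples`, `original`, `attempt`, `candidate` and `full_matches` are made up for illustration. The last pattern is only one candidate direction, not a confirmed answer: it puts the hyphen inside a repeated character class and matches up to the closing tag.

```python
import re

# The two sample tags from the question.
samples = [
    "<college-point image>/48e1255c8bd8d1c8a6c5d263f7130853.jpg</college-point>",
    "<college-point podcast-episode>704097</college-point>",
]

# Original pattern: \w* stops at the hyphen, so the literal '>' that
# follows cannot match and the hyphenated tag is not found at all.
original = re.compile(r"<college-point\s\w*>([^>]+)>")

# Attempted fix: [\s\w*] is a character class that matches exactly ONE
# character (whitespace, a word character, or a literal '*'). The group
# then runs to the first '>', so only the opening tag is matched.
attempt = re.compile(r"<college-point[\s\w*]([^>]+)>")

# Candidate (an assumption): allow word characters and hyphens in the
# tag value, then capture the content up to the closing tag.
candidate = re.compile(r"<college-point\s+([\w-]+)>([^<]*)</college-point>")


def full_matches(pattern, texts):
    """Return the full matched text for each input, or None if no match."""
    results = []
    for text in texts:
        match = pattern.search(text)
        results.append(match.group(0) if match else None)
    return results


for name, pattern in [("original", original), ("attempt", attempt), ("candidate", candidate)]:
    print(name, full_matches(pattern, samples))
```

With the candidate pattern, group 1 holds the tag value (for example podcast-episode) and group 2 holds the content between the tags. It assumes the content never contains a `<` character, which holds for the two samples but may not hold for the whole content scheme.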