I'm trying to extract an article ID from a webpage. I'm currently using XPath, but it returns the whole description block rather than just the ID. I don't know anything about regex, though I think regex is the solution.
This is an example of the original HTML:
<div id="description" class="panel" style="display: block; overflow: hidden;"><h2 class="open">Descripción</h2><div class="productDescription">NEUROBION INYECTABLE X 3 AMPOLLAS<br><br>Modo de Uso: Vía Intramuscular<br>Componente Activo:Vitamina B1 (Tiamina) 100 Mg, Vitamina B6 (Piridoxina) 100 Mg Y Vitamina B12 (Cianocobalamina) 1 Mg. Solucion Inyectable Con Tecnología Doble Camara. <br>INVIMA 2015M-13939-R2<br><img class="" data-src="/arquivos/RX.png?v=636054173313030000" src="/arquivos/RX.png?v=636054173313030000"></div></div>
These are two examples of what I got from Screaming Frog using XPath:
<div class="productDescription">NEUROBION 100MG/150MG CAJA X 30 TABLETAS<br><br>Modo de Uso: Vía Oral<br>Componente Activo: Vitamina B1 (Tiamina) Y Vitamina B6 (Piridoxina)<br>INVIMA 2019M-0009578-R1</div>
<div class="productDescription">NEUROBION INYECTABLE X 3 AMPOLLAS<br><br>Modo de Uso: Vía Intramuscular<br>Componente Activo:Vitamina B1 (Tiamina) 100 Mg, Vitamina B6 (Piridoxina) 100 Mg Y Vitamina B12 (Cianocobalamina) 1 Mg. Solucion Inyectable Con Tecnología Doble Camara. <br>INVIMA 2015M-13939-R2<br><img class="" data-src="/arquivos/RX.png?v=636054173313030000" src="/arquivos/RX.png?v=636054173313030000"></div>
But I just want to get this:
INVIMA 2015M-13939-R2
INVIMA 2019M-0009578-R1
This is the XPath I have so far:
//div[@id="description"]//div
Can somebody help me with the regex?
I also tried this pattern:
["'](INVIMA .*?)["']
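Note that the pattern above requires a quote character on both sides of the code, and in the sample HTML the `INVIMA ...` text sits between tags, not between quotes, so it would not match. As a point of comparison, here is a minimal Python sketch of a pattern that does match the two samples. It assumes the code is always the word `INVIMA`, one space, then a run of digits, uppercase letters, and hyphens; that shape is inferred from the two examples only and may not hold for every product. The names `INVIMA_RE` and `extract_invima` are made up for illustration.

```python
import re

# Assumption: "INVIMA", a single space, then digits / uppercase letters / hyphens
# (e.g. 2015M-13939-R2). Inferred from two samples; other formats may exist.
INVIMA_RE = re.compile(r"INVIMA [0-9A-Z-]+")


def extract_invima(html):
    """Return every INVIMA code found in an HTML fragment, in document order."""
    return INVIMA_RE.findall(html)


# Shortened versions of the two Screaming Frog results from the question.
samples = [
    '<div class="productDescription">NEUROBION 100MG/150MG CAJA X 30 TABLETAS'
    '<br><br>Modo de Uso: Vía Oral<br>INVIMA 2019M-0009578-R1</div>',
    '<div class="productDescription">NEUROBION INYECTABLE X 3 AMPOLLAS<br><br>'
    'Modo de Uso: Vía Intramuscular<br>INVIMA 2015M-13939-R2<br>'
    '<img class="" src="/arquivos/RX.png?v=636054173313030000"></div>',
]

for fragment in samples:
    print(extract_invima(fragment))
```

Because the character class excludes `<`, the match stops at the next tag (`<br>` or `</div>`) without needing a lookahead.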
)[\w -]+(?=<\/?(?:div|br)>)` See Demo: https://regex101.com/r/8QxasW/1 – Alireza Aug 11 '21 at 19:09