
I need to extract the part in the href, and only once. I also need the pattern to match only hrefs whose link text is "Launch Information Processing Workflow":

<a class="wcmListViewLink" target="_blank" href="getContent?objectStoreName=Nice&vsId=%7BE6B85994-9B93-4A3C-878A-C7BBBA39BAD8%7D&objectType=document&folderId=%7BB51627F8-D74C-4607-ADD7-AC9C125D67F9%7D">Launch Information Processing Workflow</a>

The following regex worked:

href="(.+?)%7D"

How can I make it more specific, so that it requires the "Launch Information Processing Workflow" link text?

dda
DMC

2 Answers


You forgot to add .* to match any characters between the closing " of the href and >Launch.
(E.g.: href="link" class="btn">Launch)

This one should work:

/href="(.+?)\".*?>Launch/

You can check it here: https://regex101.com/r/rN0tI5/2
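
For anyone trying the idea outside of regex101, here is a minimal Python sketch of the same approach. It is not the exact pattern above: as an assumption on my part, it swaps the lazy `.+?` and `.*?` for the negated classes `[^"]+` and `[^>]*`, so the match cannot run past the end of the attribute or the tag when several links sit on one line. The variable names are made up for illustration.

```python
import re

# Sample markup from the question, preceded by an unrelated link
# (the first link is a made-up addition to show that it is skipped).
html = (
    '<a href="other">Other</a>'
    '<a class="wcmListViewLink" target="_blank" '
    'href="getContent?objectStoreName=Nice'
    '&vsId=%7BE6B85994-9B93-4A3C-878A-C7BBBA39BAD8%7D'
    '&objectType=document'
    '&folderId=%7BB51627F8-D74C-4607-ADD7-AC9C125D67F9%7D">'
    'Launch Information Processing Workflow</a>'
)

# [^"]+ keeps the capture inside the href value;
# [^>]* skips any further attributes up to the end of the opening tag.
pattern = re.compile(
    r'href="([^"]+)"[^>]*>Launch Information Processing Workflow<'
)

match = pattern.search(html)
href = match.group(1) if match else None
print(href)
```

The capture group holds the whole href value of the matching link only; the "Other" link is rejected because its text does not follow the tag.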

Louis Barranqueiro

Parsing HTML with regular expressions is generally not recommended. Consider using the XPath Extractor instead, configured as follows:

  • Reference Name: any reasonable variable name
  • XPath Expression: //a[text()='Launch Information Processing Workflow']/@href
  • Check the Use Tidy box in case the response is not XHTML-compliant
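
To illustrate the "use a parser, not a regex" idea outside of JMeter, here is a rough Python sketch built on the standard library's `html.parser`. It is not the XPath Extractor itself, and it does not evaluate XPath; it only mimics what the expression above selects (the href of an `a` element whose text equals the wanted string). The class name `LinkByTextParser` and its attributes are hypothetical.

```python
from html.parser import HTMLParser


class LinkByTextParser(HTMLParser):
    """Collect the href of every <a> whose text equals wanted_text."""

    def __init__(self, wanted_text):
        super().__init__()
        self.wanted_text = wanted_text
        self.matches = []        # hrefs of matching links
        self._in_anchor = False  # True while inside an <a> element
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_anchor:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            text = "".join(self._text).strip()
            if text == self.wanted_text and self._href is not None:
                self.matches.append(self._href)
            self._in_anchor = False


page = (
    '<a href="other">Other</a>'
    '<a class="wcmListViewLink" target="_blank" '
    'href="getContent?objectStoreName=Nice'
    '&vsId=%7BE6B85994-9B93-4A3C-878A-C7BBBA39BAD8%7D'
    '&objectType=document'
    '&folderId=%7BB51627F8-D74C-4607-ADD7-AC9C125D67F9%7D">'
    'Launch Information Processing Workflow</a>'
)

parser = LinkByTextParser("Launch Information Processing Workflow")
parser.feed(page)
parser.close()
print(parser.matches)
```

Because the parser is tolerant of markup that is not well-formed XML (such as the unescaped `&` characters in this href), it plays a role loosely similar to the Use Tidy option.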
Community
Dmitri T