I'm trying to write a regex to scrape text for a jobs site I'm building. I'm relatively new to scraping (and coding) and have been using Parsehub to help with the former. Parsehub works well when a job field consistently corresponds to an HTML element (e.g. job_title always matches the same element, in the same position, on a page). I can use Parsehub to scrape a relevant block of text, but I'll need a regex to give Parsehub more direction when the information I need can only be distinguished by its relation to other text.
I've spent hours on the following example. I want to extract the deadline date from this text:
To Apply
Deadline for applications is the 10th January 2021. Interviews will take place in the third week of January 2021.
I've written the regex:
Deadline for applications is the\s([0-9a-zA-Z]\w*)\s([0-9a-zA-Z]\w*)\s([0-9a-zA-Z]\w*)
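To see the capture groups being read outside Parsehub, here is a minimal sketch in Python's `re` module (the choice of Python is only for illustration; Parsehub's own regex handling may differ). It runs the pattern above against the sample text and reads groups 1-3 from the match object rather than from the pattern itself.

```python
import re

# Sample text from the job posting above.
text = (
    "To Apply\n"
    "Deadline for applications is the 10th January 2021. "
    "Interviews will take place in the third week of January 2021."
)

# The same pattern, without the leading slash. A raw string (r"...")
# passes the backslashes through to the regex engine unchanged.
pattern = re.compile(
    r"Deadline for applications is the\s([0-9a-zA-Z]\w*)\s([0-9a-zA-Z]\w*)\s([0-9a-zA-Z]\w*)"
)

match = pattern.search(text)
if match:
    # group(0) is the whole match; groups 1-3 are the parenthesised parts.
    day, month, year = match.groups()
    print(day, month, year)          # 10th January 2021
    # Or join the three groups back into a single date string.
    print(" ".join(match.groups()))  # 10th January 2021
```

The groups only exist once the pattern has matched, so they are retrieved afterwards from the match result instead of being referenced inside the pattern.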
But how do I extract just groups 1-3? If I add \1 or $1 at the end, for example, I get the error "regular expression does not match the subject string".
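A likely explanation, at least in Python's regex flavour (assuming Parsehub behaves similarly, which I haven't confirmed): `\1` written inside a pattern is a backreference, meaning "match the text group 1 captured, again, right here", and `$` is an end-of-string anchor. Appending either one makes the pattern demand text that isn't in the subject string, so nothing matches. The sketch below shows both failures and then uses `\1 \2 \3` where group references usually belong: in a replacement string.

```python
import re

text = "Deadline for applications is the 10th January 2021."

base = r"Deadline for applications is the\s([0-9a-zA-Z]\w*)\s([0-9a-zA-Z]\w*)\s([0-9a-zA-Z]\w*)"

# \1 at the end is a backreference: after "2021" the engine would need
# to find "10th" (group 1's text) again, but the next character is ".".
print(re.search(base + r"\1", text))  # None

# $1 at the end means "end of string, then a literal 1", which is impossible.
print(re.search(base + r"$1", text))  # None

# In a replacement string, \1 \2 \3 refer to the captured groups.
# The trailing .* consumes the rest of the line so only the date is left.
print(re.sub(base + r".*", r"\1 \2 \3", text))  # 10th January 2021
```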
I have a way to go in learning this, but if anyone has some pointers, they'd be much appreciated. Once I understand the basic principles here, I'll be in a much better place.