I am using Screaming Frog to scrape YouTube video keywords. I know the software has a tab that captures exactly that meta information, but it only shows 160 characters, so videos with a larger number of keywords do not show up there.
I also tried CSS selectors and XPath through the software's custom extraction feature, but did not get anything.
The last thing I can think of is using a regex in the custom extraction to capture the keywords straight from the HTML page.
This is the part where the keywords appear:
<meta property="og:video:tag" content="lanshow">
<meta property="og:video:tag" content="lanshow ep04">
<meta property="og:video:tag" content="lanshow episodio 4">
<meta property="og:video:tag" content="lanshow 4">
<meta property="og:video:tag" content="directo unboxme">
<meta property="og:video:tag" content="directo tecnologia">
<meta property="og:video:tag" content="directo hardware">
<meta property="og:video:tag" content="directo preguntas y respuestas">
<meta property="og:video:tag" content="preguntas y respuestas unboxme">
They also appear enumerated one after another further down like so:
"keywords":"lanshow,lanshow ep04,lanshow episodio 4,lanshow 4,directo unboxme,directo tecnologia,directo hardware,directo preguntas y respuestas,preguntas y respuestas unboxme","c":"WEB","player_response":"{\"videoDetails\":{\"thumbnail\":{\"thumbnails...
Is there a way to capture only the keywords, using regex, capture groups or something of the sort?
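To show the kind of capture I mean for the enumerated list, here is a rough Python sketch run outside Screaming Frog. The `snippet` string is a shortened copy of the fragment above, and the pattern assumes no keyword contains an escaped double quote. I do not know whether the custom extraction handles capture groups the same way.

```python
import re

# Shortened copy of the fragment from the page source (illustrative only).
snippet = '"keywords":"lanshow,lanshow ep04,lanshow episodio 4","c":"WEB"'

# Capture everything between "keywords":" and the next double quote.
# [^"]* cannot cross a quote, so the match stops at the end of the list.
match = re.search(r'"keywords":"([^"]*)"', snippet)

# Split the captured text on commas to get one keyword per item.
keywords = match.group(1).split(",") if match else []
print(keywords)
```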
I have tried different regex combinations, but I get the whole text, and sometimes even the entire remaining HTML appears in the extraction.
This gets only the first keyword: video:tag"content=.*?>
I also tried another regex that extracted the whole HTML text after the first keyword. I need a way to tell the extractor to find the before and after delimiters and leave them out of the extraction, so that I get only what is in between (the actual keywords).
This is the before delimiter: <meta property="og:video:tag" content="
This is the after delimiter: ">
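To make the goal concrete, this is a minimal Python sketch of the behavior I am after for the meta tags: the delimiters are matched literally and only the capture group between them is returned. The `html` sample and the `extract_tags` name are just for illustration, and whether Screaming Frog returns the group in the same way is an assumption on my part.

```python
import re

# Sample lines copied from the page source (shortened for illustration).
html = '''<meta property="og:video:tag" content="lanshow">
<meta property="og:video:tag" content="lanshow ep04">
<meta property="og:video:tag" content="directo preguntas y respuestas">'''

# Literal before delimiter, a capture group for the keyword, literal after delimiter.
# [^"]* stops at the closing quote, so the match cannot run on into the rest of the page.
TAG_PATTERN = re.compile(r'<meta property="og:video:tag" content="([^"]*)">')

def extract_tags(page_html):
    """Return only the captured keyword text, without the delimiters."""
    return TAG_PATTERN.findall(page_html)

print(extract_tags(html))
```

With a single capture group, `findall` returns just the group contents, which is the "ignore the delimiters" behavior described above.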
Is there a way to do that?
Thank you.