
I want to get the source link of the first image that appears in the Bing image search results for a specified search term.

I am currently using this command, but I get no output:

curl -s "https://www.bing.com/images/search?q=cat&form=HDRSC2&first=1&tsc=ImageBasicHover" | grep -o '<a class="thumb" target="_blank" href="[^"]*'

Running only curl -s "https://www.bing.com/images/search?q=cat&form=HDRSC2&first=1&tsc=ImageBasicHover" displays the HTML code of the page.

What am I doing wrong?

Robin

1 Answer


You should generally avoid parsing HTML with regex; bobince explains why better than I can here: https://stackoverflow.com/a/1732454/1067003

For example, PHP can parse HTML with its DOMDocument API:

curl -s 'https://www.bing.com/images/search?q=cat&form=HDRSC2&first=1&tsc=ImageBasicHover' | php -r '$html = stream_get_contents(STDIN);$domd=new DOMDocument();@$domd->loadHTML($html);$xp = new DOMXPath($domd);var_dump($xp->query("//a[@data-hookid='\''pgdom'\'']")->item(0)->getAttribute("href"));'

This prints

string(34) "https://pxhere.com/en/photo/609263"

which is the source of the first image.
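The same parse-then-select idea can be sketched with any HTML parser, not only PHP's. Below is a minimal Python version using only the standard library's `html.parser`. The class name `FirstLinkFinder`, the helper `first_link`, and the sample markup are hypothetical and for illustration only; the sample is not captured from Bing, and the `data-hookid="pgdom"` selector is simply the one used in the XPath above. Note that the match does not depend on attribute order, which is one way an exact-string `grep` pattern can silently find nothing.

```python
from html.parser import HTMLParser


class FirstLinkFinder(HTMLParser):
    """Record the href of the first <a> whose given attribute has the given value."""

    def __init__(self, attr_name, attr_value):
        super().__init__()
        self.attr_name = attr_name
        self.attr_value = attr_value
        self.href = None  # stays None if no matching link is found

    def handle_starttag(self, tag, attrs):
        # Only look at <a> tags, and stop after the first match.
        if tag != "a" or self.href is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get(self.attr_name) == self.attr_value:
            self.href = attr_map.get("href")


def first_link(html, attr_name="data-hookid", attr_value="pgdom"):
    """Return the href of the first matching <a>, or None."""
    finder = FirstLinkFinder(attr_name, attr_value)
    finder.feed(html)
    return finder.href


# Hypothetical markup for illustration; not real Bing output.
SAMPLE_HTML = """
<div>
  <a href="/other" class="nav">nav</a>
  <a target="_blank" data-hookid="pgdom" href="https://example.com/photo/1">x</a>
  <a data-hookid="pgdom" href="https://example.com/photo/2">y</a>
</div>
"""

if __name__ == "__main__":
    print(first_link(SAMPLE_HTML))
```

To run it against a live page instead of the sample, one option is to pipe the curl output in and pass `sys.stdin.read()` to `first_link`.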

hanshenrik
  • Thanks for your answer, @hanshenrik. With your command I get the link of the page where the picture is, but how do I get the direct link to the picture itself, for example website.com/images/cat.jpeg? – Robin Jun 21 '23 at 21:48
  • @Robin maybe try the XPath `//a[contains(@class,'iusc')]//img` – hanshenrik Jun 22 '23 at 09:22
  • Unfortunately, I get an empty string with that command: `curl -s 'https://www.bing.com/images/search?q=cat&form=HDRSC2&first=1&tsc=ImageBasicHover' | php -r '$html = stream_get_contents(STDIN);$domd=new DOMDocument();@$domd->loadHTML($html);$xp = new DOMXPath($domd);var_dump($xp->query("//a[contains(@class,'iusc')]//img")->item(0)->getAttribute("href"));'` – Robin Jun 22 '23 at 12:34
  • Thank you very much @hanshenrik and have a nice day! – Robin Jun 22 '23 at 15:01
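
A note on the empty string in the comments above: two things in that command are worth checking. First, an `<img>` element carries its URL in `src`, not `href`, and PHP's `getAttribute` returns an empty string for a missing attribute. Second, the unescaped single quotes around `iusc` end the shell's single-quoted `php -r` argument; the answer's own command escapes them as `'\''`. The sketch below shows the `src` lookup in Python with the standard library only. The class `FirstImageInLink`, the helper `first_image_src`, and the sample markup are hypothetical, and the thread does not confirm whether Bing puts the full-size image URL or only a thumbnail in `src`.

```python
from html.parser import HTMLParser


class FirstImageInLink(HTMLParser):
    """Record the src of the first <img> nested inside an <a> with a given class."""

    def __init__(self, link_class):
        super().__init__()
        self.link_class = link_class
        self.inside_link = False
        self.src = None  # stays None if nothing matches

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a":
            # Whole-token class match; XPath contains() would match substrings too.
            classes = (attr_map.get("class") or "").split()
            self.inside_link = self.link_class in classes
        elif tag == "img" and self.inside_link and self.src is None:
            # Images expose their URL via src; they have no href attribute.
            self.src = attr_map.get("src")

    def handle_endtag(self, tag):
        if tag == "a":
            self.inside_link = False


def first_image_src(html, link_class="iusc"):
    """Return the src of the first <img> inside a matching <a>, or None."""
    finder = FirstImageInLink(link_class)
    finder.feed(html)
    return finder.src


# Hypothetical markup for illustration; not real Bing output.
SAMPLE_HTML = """
<a class="nav" href="/home"><img src="https://example.com/logo.png"></a>
<a class="iusc" href="/detail?id=1">
  <img class="mimg" src="https://example.com/images/cat.jpeg" alt="cat">
</a>
"""

if __name__ == "__main__":
    print(first_image_src(SAMPLE_HTML))
```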