1

I'm wgeting a webpage src code then using pup to grab the <meta> tag that I need. Now I want to print only the value of the content field.

In this case, the output I want is: https://example.com/my/folder/first.jpg?foo=bar

# wget page to /tmp/output.html
IMAGE_URL=$(cat /tmp/output.html | pup 'meta[property*="og:image"]')
echo $IMAGE_URL is:
<meta property="og:image" content="https://example.com/my/folder/first.jpg?foo=bar">
John Smith
  • 31
  • 1
  • 5

2 Answers2

0
wget -O /tmp/output.html --user-agent="user-agent: Whatever..." https://example.com/somewhere
IMAGE_URL=$(cat /tmp/output.html | pup --plain 'meta[property*="og:image"]' | sed -n 's/.*content=\"\([^"]*\)".*/\1/p')
John Smith
  • 31
  • 1
  • 5
0

You can use attr{content} to only get the content of the attribute.

wget -O /tmp/output.html --user-agent="user-agent: Whatever..." https://example.com/somewhere
IMAGE_URL=$(cat /tmp/output.html | pup 'meta[property*="og:image"] attr{content}'

Antoine
  • 3,880
  • 2
  • 26
  • 44