I am fetching webpage titles with curl, and some of them contain escaped characters (HEX code?), most likely coming from WordPress's esc_url(): https://developer.wordpress.org/reference/functions/esc_url/

# title=$(curl -s "$theurl" | grep -o "<title>[^<]*" | tail -c+8)
title='Questions &#038; Answers'
# Quote the expansion so the spaces in the title do not split it into several file names
touch "$title.txt"

How can I unescape (decode) these back to plain characters without having to maintain a list and replace them one by one?

&#038; = Ampersand
&#8217; = Right Single Quotation Mark
&#x2d; = Hyphen-Minus
...
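The references listed above all carry their code point in the reference itself (decimal after `&#`, hexadecimal after `&#x`), so the numeric ones can be decoded without a lookup list. The following is a minimal sketch, assuming bash 4.2 or later and a UTF-8 locale for non-ASCII characters; `decode_numeric_refs` is a hypothetical helper name, and named entities such as `&amp;` are deliberately left alone because those would still need a table.

```shell
#!/usr/bin/env bash
# Sketch: decode numeric character references (&#NNN; and &#xHH;) only.
# decode_numeric_refs is a hypothetical helper name, not a standard tool.
# Named entities such as &amp; are left untouched.
decode_numeric_refs() {
  local s=$1 out='' m ref cp hex ch
  local re='&#([xX][0-9a-fA-F]+|[0-9]+);'
  while [[ $s =~ $re ]]; do
    m=${BASH_REMATCH[0]}      # whole reference, e.g. &#038;
    ref=${BASH_REMATCH[1]}    # number part only, e.g. 038 or x2d
    if [[ $ref == [xX]* ]]; then
      cp=$((16#${ref:1}))     # hexadecimal form
    else
      cp=$((10#$ref))         # force base 10 so 038 is not read as octal
    fi
    printf -v hex '%08x' "$cp"
    printf -v ch "\\U$hex"    # bash 4.2+ turns \UXXXXXXXX into the character
    out+=${s%%"$m"*}$ch       # text before the reference, then the character
    s=${s#*"$m"}              # continue with the text after the reference
  done
  printf '%s\n' "$out$s"
}

decode_numeric_refs 'Questions &#038; Answers'   # prints: Questions & Answers
```

Note the `10#` prefix: without it, bash would treat `038` as an invalid octal number because of the leading zero.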
FFish
  • It's pretty hard in pure bash without a list; consider a one-line perl: `echo "Questions &#038; Answers" | perl -C -MHTML::Entities -pe 'decode_entities($_);'` – David Ranieri Jan 07 '23 at 17:54
  • Thanks! How come this does not work? `title='Questions &#038; Answers' titleclean=$title | perl -C -MHTML::Entities -pe 'decode_entities($_);' print $titleclean` – FFish Jan 08 '23 at 05:09
  • You need to break the perl enclosures: `title='Questions &#038; Answers' titleclean=$title | perl -MHTML::Entities -le 'print decode_entities("'"$titleclean"'")'` – David Ranieri Jan 08 '23 at 08:32
  • thanks again David, I also managed to get it with `titleclean="$(echo "$title" | perl -C -MHTML::Entities -pe 'decode_entities($_)')"` but I need to study the Perl options better. – FFish Jan 08 '23 at 10:21
  • Your snippet `titleclean=$title | perl -MHTML::Entities -le 'print decode_entities("'"$titleclean"'")'` does seem to get me a new line though... – FFish Jan 08 '23 at 10:31
  • You forgot the `title='Questions &#038; Answers'` part – David Ranieri Jan 08 '23 at 17:50
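
The thread above turns on one shell detail: `titleclean=$title | perl ...` is an assignment piped into perl, and an assignment writes nothing to standard output, so perl receives no input and `titleclean` is never set to the decoded text. Capturing the decoder's output needs command substitution, as in the working variant from the comments. The following is a minimal sketch of that pattern; `html_decode` is a hypothetical wrapper name, and the fallback to Python's `html.unescape` is an assumption added here for systems where the HTML::Entities module is not installed.

```shell
# Sketch: decode HTML entities read from stdin.
# html_decode is a hypothetical wrapper name, not an existing command.
html_decode() {
  if perl -MHTML::Entities -e 1 2>/dev/null; then
    # Same one-liner as in the comments above
    perl -C -MHTML::Entities -pe 'decode_entities($_)'
  else
    # Assumed fallback: Python's standard-library html.unescape
    python3 -c 'import html, sys; sys.stdout.write(html.unescape(sys.stdin.read()))'
  fi
}

title='Questions &#038; Answers'
# $(...) captures what the pipeline prints; a bare "var=value | cmd" does not
titleclean=$(printf '%s\n' "$title" | html_decode)
printf '%s\n' "$titleclean"   # prints: Questions & Answers
```

Command substitution also strips the trailing newline, so `"$titleclean.txt"` can be used directly as a file name afterwards.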

0 Answers