Bash: decode string with HTML-escaped numeric character codes

Asked Jan 07 '23 at 15:34

Active Jan 08 '23 at 04:42


I am curl-ing webpage titles, and some of them contain escaped characters (HTML numeric character references, some decimal and some hex), most likely coming from WordPress esc_url() https://developer.wordpress.org/reference/functions/esc_url/

```bash
# title=$(curl -s "$theurl" | grep -o "<title>[^<]*" | tail -c +8)
title='Questions &#038; Answers'
touch "$title.txt"
```
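Independently of the decoding problem, an unquoted `$title` is subject to word splitting: the shell splits it on whitespace, so `touch $title'.txt'` creates three files (`Questions`, `&#038;` and `Answers.txt`) rather than one. A small illustration, using two arrays (the names `unquoted` and `quoted` are only for illustration) to count the words each form produces:

```bash
#!/usr/bin/env bash
title='Questions &#038; Answers'

# Unquoted: the expansion is split on whitespace (default IFS), giving 3 words.
unquoted=( $title'.txt' )
# Quoted: the whole string stays a single word, spaces included.
quoted=( "$title.txt" )

printf 'unquoted: %d words\n' "${#unquoted[@]}"   # unquoted: 3 words
printf 'quoted: %d word\n' "${#quoted[@]}"        # quoted: 1 word
printf '[%s]\n' "${unquoted[@]}"                   # [Questions], [&#038;], [Answers.txt], one per line
```

Quoting the whole word, as in `touch "$title.txt"`, keeps it as a single file name.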

How can I unescape (decode) these back, without having to maintain a list like the one below and replace each entry?

&#038; = Ampersand
&#8217; = Right Single Quotation Mark
&#x2d; = Hyphen-Minus
...
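For numeric references like the ones listed above (decimal `&#NNN;` and hex `&#xHH;`), no lookup list is strictly needed, because the number is the Unicode code point of the character. A minimal pure-bash sketch, assuming bash 4.2 or later (whose `printf` understands `\U` escapes) and a UTF-8 locale for non-ASCII results such as `&#8217;`. The function name `decode_numeric_entities` is hypothetical, and named entities such as `&amp;` are deliberately left untouched:

```bash
#!/usr/bin/env bash
# Sketch: decode HTML numeric character references (&#NNN; and &#xHH;)
# without a per-entity table. Assumes bash >= 4.2 and, for non-ASCII
# output, a UTF-8 locale. Named entities (&amp;, &hellip;, ...) are NOT handled.
decode_numeric_entities() {
  local s=$1 out='' match code hex ch
  # Group 1 captures decimal digits, group 2 captures hex digits.
  local re='&#([0-9]+);|&#[xX]([0-9A-Fa-f]+);'
  while [[ $s =~ $re ]]; do
    match=${BASH_REMATCH[0]}
    if [[ -n ${BASH_REMATCH[1]} ]]; then
      code=$((10#${BASH_REMATCH[1]}))   # 10# stops "038" being read as octal
    else
      code=$((16#${BASH_REMATCH[2]}))
    fi
    printf -v hex '%08X' "$code"
    printf -v ch "\\U$hex"              # code point -> character
    out+=${s%%"$match"*}$ch             # text before the match, then the char
    s=${s#*"$match"}                    # continue scanning after the match
  done
  printf '%s\n' "$out$s"
}
```

For example, `decode_numeric_entities 'Questions &#038; Answers'` prints `Questions & Answers`. Named entities would still need a table or a tool such as Perl's HTML::Entities, as suggested in the comments.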

edited Jan 08 '23 at 04:42


FFish


It's pretty hard in pure bash without a list; consider a Perl one-liner: `echo "Questions &#038; Answers" | perl -C -MHTML::Entities -pe 'decode_entities($_);'` – David Ranieri Jan 07 '23 at 17:54
Thanks! How come this does not work? `title='Questions &#038; Answers' titleclean=$title | perl -C -MHTML::Entities -pe 'decode_entities($_);' print $titleclean` – FFish Jan 08 '23 at 05:09
You need to break the perl enclosures: `title='Questions &#038; Answers' titleclean=$title | perl -MHTML::Entities -le 'print decode_entities("'"$titleclean"'")'` – David Ranieri Jan 08 '23 at 08:32
Thanks again, David. I also managed to get it working with `titleclean="$(echo "$title" | perl -C -MHTML::Entities -pe 'decode_entities($_)')"`, but I need to study the Perl options better. – FFish Jan 08 '23 at 10:21
Your snippet `titleclean=$title | perl -MHTML::Entities -le 'print decode_entities("'"$titleclean"'")'` only seems to print an empty line, though... – FFish Jan 08 '23 at 10:31
You forgot the `title='Questions &#038; Answers'` part – David Ranieri Jan 08 '23 at 17:50
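A likely explanation for the empty line: in bash, each command of a pipeline runs in its own subshell, so in `titleclean=$title | perl ...` the assignment happens in one subshell while `"$titleclean"` is expanded in another, where it is still empty; with `-l`, perl then prints only a newline. The variant that worked uses command substitution, `titleclean=$(...)`, which assigns in the current shell. The sketch below shows both points and, as an alternative to breaking out of the Perl quotes, passes the title to perl as an argument. It assumes perl and the HTML::Entities module are installed (the module ships with the HTML::Parser distribution and may need to be installed separately).

```bash
#!/usr/bin/env bash
unset titleclean
title='Questions &#038; Answers'

# Each element of a pipeline runs in its own subshell, so the assignment on
# the left of "|" is never seen by the command on the right (or afterwards).
titleclean=$title | printf '[%s]\n' "$titleclean"   # prints: []

# Command substitution assigns in the current shell. Passing the title as an
# argument ($ARGV[0]) avoids splicing it into Perl source, where a '"' would
# end the string and '$' or '@' would be interpolated by Perl.
# Assumes perl and HTML::Entities are installed.
#   -CSA: UTF-8 for STDIN/STDOUT/STDERR and for @ARGV
#   -l:   add a newline to each print (stripped again by $(...))
titleclean=$(perl -CSA -MHTML::Entities -le 'print decode_entities($ARGV[0])' -- "$title")
printf '%s\n' "$titleclean"                          # expected: Questions & Answers
```

The `--` stops perl from treating a title that happens to start with `-` as a command-line switch.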
