
I am scraping a website with curl and parsing out what I need.

The URLs are returned with some characters encoded as ASCII codes, like:

GET v2.12/...?fields=&#123;fieldname_of_type_Tab&#125; HTTP/1.1

How can I decode this into UTF-8 characters directly from the command line (ideally with something I can pipe into), so that the result is:

GET v2.12/...?fields={fieldname_of_type_Tab} HTTP/1.1

EDIT: There are a number of solutions using sed, but the regexes they need are quite ugly. Since the answer below, which uses Perl, is very clean, I hope this question can stay open.

Goldfish

1 Answer


It's HTML entity encoding: &#123; and &#125; are the decimal numeric character references for { and }.

Decode it like this using Perl's HTML::Entities module:

$ echo 'http://domain.tld/?fields=&#123;fieldname_of_type_Tab&#125;' |
    perl -MHTML::Entities -pe 'decode_entities($_)' 

Output:

http://domain.tld/?fields={fieldname_of_type_Tab}
Gilles Quénot
    Wow, this is the most elegant solution I have come across for a command-line conversion. Bonus points for the fact that it uses standard libraries – Goldfish Feb 26 '18 at 23:38
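HTML::Entities is distributed with the HTML-Parser package on CPAN rather than with core Perl, so it may not be present on every machine. Where python3 is available, a sketch of an equivalent pipe filter built only on Python's standard library might look like this. It swaps in Python's html.unescape, which decodes both named and numeric entities; the helper name html_unescape and the &amp;limit=10 query parameter in the sample input are made up for illustration.

```shell
# Hypothetical helper name: html_unescape.
# Decodes named (&amp;) and numeric (&#123;, &#x7B;) HTML entities line by line
# using Python's standard library; PYTHONIOENCODING makes stdin/stdout UTF-8.
html_unescape() {
  PYTHONIOENCODING=utf-8 python3 -c 'import html, sys; sys.stdout.writelines(map(html.unescape, sys.stdin))'
}

echo 'http://domain.tld/?fields=&#123;fieldname_of_type_Tab&#125;&amp;limit=10' | html_unescape
# prints: http://domain.tld/?fields={fieldname_of_type_Tab}&limit=10
```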