4

I am trying to get a list of URLs after redirection using Bash scripting. For example, google.com is redirected to http://www.google.com with a 301 status. This is what I have tried:

json='[{"url":"google.com"},{"url":"microsoft.com"}]'

echo "$json" | jq -r '.[].url' | while read line; do
    curl -LSs -o /dev/null -w %{url_effective} $line 2>/dev/null
done

So, is it possible to use commands like curl inside jq when processing JSON objects? I want to add the resulting URL to the existing JSON structure, like this:

[
  {
    "url": "google.com",
    "redirection": "http://www.google.com"
  },
  {
    "url": "microsoft.com",
    "redirection": "https://www.microsoft.com"
  }
]

Thank you in advance!

oguz ismail
Srikanth Sharma

3 Answers

4

curl can make multiple transfers in a single process, and it can read command-line arguments from a file or stdin, so you don't need a loop at all. Just put that JSON into a file and run this:

jq -r '"-o /dev/null\nurl = \(.[].url)"' file |
curl -sSLK- -w'%{url_effective}\n' |
jq -R 'fromjson | map(. + {redirection: input})' file -

This way, only 3 processes are spawned for the whole task, instead of n + 2, where n is the number of URLs.
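To see what curl actually reads from stdin, the first stage can be run on its own. This is only a minimal sketch; it assumes the JSON sits in a shell variable named json (as in the question) instead of a file, and the config variable is introduced here purely for illustration:

```shell
# Same sample data as in the question.
json='[{"url":"google.com"},{"url":"microsoft.com"}]'

# Emit one "-o /dev/null" plus "url = ..." pair per array element.
# The string is produced once for every value that .[].url yields.
config=$(printf '%s\n' "$json" | jq -r '"-o /dev/null\nurl = \(.[].url)"')

# Show the config text that would be piped into `curl -K-`.
printf '%s\n' "$config"
```

Each URL gets its own -o /dev/null line because a single -o applies to only one URL.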

oguz ismail
  • PS: spawning the extra `curl` + `jq` process in my while loop solution is nothing compared to the io wait time involved in the network communication. But running multiple requests in parallel has obvious benefits (especially in combination with fewer processes) +1 – hek2mgl Aug 28 '19 at 15:37
2

I would generate one JSON object per URL with jq and slurp those objects into the final list with jq -s:

json='[{"url":"google.com"},{"url":"microsoft.com"}]'  

echo "$json" | jq -r '.[].url' | while read -r url; do
    redirect=$(curl -LSs \
                    -o /dev/null \
                    -w '%{url_effective}' \
                    "${url}" 2>/dev/null)
    jq --null-input --arg url "${url}" --arg redirect "${redirect}" \
        '{url: $url, redirection: $redirect}'
done | jq -s .

Alternative (first) solution:

You can output the URL and the effective URL as tab-separated data and create the output JSON with jq:

json='[{"url":"google.com"},{"url":"microsoft.com"}]'

echo "$json" | jq -r '.[].url' | while read line; do
    prefix="${line}\t"
    curl -LSs -o /dev/null -w "${prefix}"'%{url_effective}'"\n" "$line" 2>/dev/null
done | jq -r --raw-input 'split("\t")|{"url":.[0],"redirection":.[1]}'

Both solutions generate valid JSON, regardless of which characters the URL or effective URL contains.
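The tab-separated variant emits one object per line. If a single array like the one in the question is wanted, jq can collect the lines itself. The following is only a sketch under assumptions: the tsv variable stands in for the loop output, and its sample pairs are copied from the expected output in the question, not fetched with curl:

```shell
# Stand-in for the loop output: one "url<TAB>effective url" pair per line.
tsv=$(printf '%s\t%s\n' \
    'google.com' 'http://www.google.com' \
    'microsoft.com' 'https://www.microsoft.com')

# -R reads raw lines, -n keeps jq from consuming the first line on its own,
# and [inputs | ...] gathers every line into one array. -c prints it compactly.
result=$(printf '%s\n' "$tsv" |
    jq -Rnc '[inputs | split("\t") | {url: .[0], redirection: .[1]}]')

printf '%s\n' "$result"
```

Dropping -c gives the pretty-printed form shown in the question.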

hek2mgl
1

Trying to keep this in JSON all the way is pretty cumbersome. I would simply try to make Bash construct a new valid JSON fragment inside the loop.

So in other words, if $url is the URL and $redirect is where it redirects to, you can do something like

printf '{"url": "%s", "redirection": "%s"}\n' "$url" "$redirect"

to produce JSON output from these strings. Tying it all together:

jq -r '.[].url' <<<"$json" |
while read -r url; do
    printf '{"url": "%s", "redirection": "%s"}\n' \
        "$url" "$(curl -LSs -o /dev/null -w '%{url_effective}' "$url")"
done |
jq -s

This is still pretty brittle; in particular, if either of the printf input strings could contain a literal double quote, it must be escaped properly.
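The escaping problem can be reproduced without any network access. In this sketch the redirect value is a made-up string containing double quotes, and the names broken and safe are illustrative only. It compares the printf approach with passing the values through jq --arg, which is one possible way to get the escaping done:

```shell
# Hypothetical value containing literal double quotes.
redirect='http://example.com/?q="x"'

# printf pastes the string in unchanged, so the embedded quotes
# terminate the JSON string early.
broken=$(printf '{"url": "%s", "redirection": "%s"}\n' 'example.com' "$redirect")

# jq --arg receives the value as data and escapes it when serializing.
safe=$(jq -nc --arg url 'example.com' --arg redirection "$redirect" \
    '{url: $url, redirection: $redirection}')

# jq rejects the printf output but accepts its own.
printf '%s\n' "$broken" | jq . >/dev/null 2>&1 || echo "printf output is not valid JSON"
printf '%s\n' "$safe"
```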

tripleee
  • You don't advocate constructing the JSON manually from random input from the internet, right? :) – hek2mgl Aug 28 '19 at 10:16
  • I don't particularly like it but the options seem to be limited. See also e.g. https://stackoverflow.com/questions/43192556/using-jq-with-bash-to-run-command-for-each-object-in-array – tripleee Aug 28 '19 at 10:19
  • you could output tab separated strings and pipe that to `jq --raw-input` to build the output json – hek2mgl Aug 28 '19 at 10:22
  • Something like this? https://stackoverflow.com/questions/29663187/csv-to-json-using-jq doesn't exactly seem more straightforward. – tripleee Aug 28 '19 at 10:33
  • Added an answer. (Can be deleted if you want to adapt to it, just better to show than in comments) – hek2mgl Aug 28 '19 at 10:35
  • @srikhanth Your suggested edit introduced several errors, so I had to reject it; but thanks for the suggestion to switch to `jq -s` at the end. – tripleee Aug 28 '19 at 10:57
  • no probs. I just wanted to make sure that it runs well for others with mere copy paste, without any issues. thanks for ur change which works. – Srikanth Sharma Aug 30 '19 at 08:53