
In my script, I have two HTTP requests. I would like to reuse the connection, so, for example, what I do is:

curl -v 'http://example.com?id=1&key1=value1' 'http://example.com?id=1&key2=value2'

Is there any way to store the output of each HTTP request in two different variables? I have been searching but haven't found a solution yet.

I understand I can do the following to store the output in two different files:

curl -v 'http://example.com?id=1&key1=value1' -o output1 'http://example.com?id=1&key2=value2' -o output2

Edit: here is my use case.

I have a cron job that runs the GNU parallel command below every few minutes. 'get_data.sh' is run 2000 times, because there are 2000 rows in input.csv. I would like to avoid using temp files, to get the best performance.

parallel \
  -a input.csv \
  --jobs 0 \
  --timeout "$parallel_timeout" \
  "get_data.sh {}"

In get_data.sh:

id=$1
curl -v "http://example.com?id=${id}&key1=value1" -o output1 \
"http://example.com?id=${id}&key2=value2" -o output2

stat1=$(cat output1 | sed '' | cut ..)
stat2=$(cat output2 | awk '')
Jerry
  • What are you going to do next with the variables? – Mark Setchell Mar 23 '20 at 17:50
  • Using commands like sed, awk, cut to get the data I care about – Jerry Mar 23 '20 at 18:42
  • And how many hundred shell variables do you have? – Mark Setchell Mar 23 '20 at 18:52
  • The script 'get_data.sh' will be run 2000 times by parallel actually. I edited my question. I hope that explains my use case better. Please let me know if you need more info : ) – Jerry Mar 24 '20 at 00:12
  • It seems to me you are running maybe 8 or more processes per line of your file (`bash`, `curl`, `awk`, `sed`, `cat` etc) making 16,000+ processes. I can't help thinking you would be better off using **Python** and multi-threading. Failing that, write your temporary files in `/tmp` which is a RAM-based filesystem and should be faster. – Mark Setchell Mar 24 '20 at 08:49
  • @MarkSetchell Thank you! I will take Python into consideration. /tmp RAM-based filesystem definitely helps. I will use that for now. – Jerry Apr 09 '20 at 02:55

2 Answers


You are looking for parset. It is part of env_parallel, which is part of the GNU Parallel package (https://www.gnu.org/software/parallel/parset.html):

parset myarr \
  -a input.csv \
  --jobs 0 \
  --timeout "$parallel_timeout" \
  get_data.sh {}

echo "${myarr[3]}"

You can have parset run all combinations, just like you would with GNU Parallel:

echo www.google.com > input.txt
echo www.bing.com >> input.txt

# Search for both foo and bar on all sites
parset output curl https://{1}/?search={2} :::: input.txt ::: foo bar

echo "${output[1]}"
echo "${output[2]}"

If you are doing different processing for foo and bar you can make functions and run those:

# make all new functions, arrays, variables, and aliases defined after this
# available to env_parset
env_parallel --session

foofunc() {
  id="$1"
  curl -v "http://example.com?id=${id}&key1=value1" | sed '' | cut -f10
}

barfunc() {
  id="$1"
  curl -v "http://example.com?id=${id}&key2=value2" | awk '{print}'
}

# Run both foofunc and barfunc on all sites
env_parset output {1} {2} ::: foofunc barfunc :::: input.txt

echo "${output[1]}"
echo "${output[2]}"
env_parallel --end-session

`--session`/`--end-session` and `env_parset` are needed if you do not want to `export -f` the functions and variables that you use in the functions.

GNU Parallel uses temp files. If your command runs fast, these temp files never touch the disk before they are deleted; instead they stay in the disk cache in RAM. You can even force them to stay in RAM by pointing --tmpdir to a ramdisk:

mkdir /dev/shm/mydir
parset output --tmpdir /dev/shm/mydir ...
Ole Tange
  • Yeah, that's very helpful for me. I can store the output of get_data.sh into an array. Can you explain whether parset works for `curl -v 'http://example.com?id=1&key1=value1' -o output1 'http://example.com?id=1&key2=value2' -o output2`? Instead of two tmp files output1 and output2, can I save the result in two variables or in an array? – Jerry Apr 09 '20 at 03:05

Ok, here is some inspiration:

id=$1
output1=$(curl -v "http://example.com?id=${id}&key1=value1")
output2=$(curl -v "http://example.com?id=${id}&key2=value2")

stat1=$(echo "$output1" | sed '' | cut ..)
stat2=$(echo "$output2" | awk '')

This way you avoid writing stuff to disk.

Jonas