
I am doing web scraping with bash. I have these URLs saved in a file called URL.txt.

?daypartId=1&catId=1
?daypartId=1&catId=11
?daypartId=1&catId=2

I want to read these URLs into an array in another file, main.sh, which appends each one to the base URL https://www.mcdelivery.com.pk/pk/browse/menu.html**(append here)**. Every entry in URL.txt should be appended to the end of the base URL, one at a time.

I have come up with code that reads the URLs from URL.txt, but it fails to append them to the base URL one by one.

#!/bin/bash
ARRAY=()
while read -r LINE
do
    ARRAY+=("$LINE")
done < URL.txt

for LINE in "${ARRAY[@]}"
do    
    echo $LINE
    curl https://www.mcdelivery.com.pk/pk/browse/menu.html$LINE | grep -o '<span class="starting-price">.*</span>' | sed 's/<[^>]\+>//g' >> price.txt 
done  

I just need help with the loop so that I can append each URL from URL.txt to the end of the base URL in main.sh.

mtnezm
  • Are you asking how to append a string and a variable in bash? Does [this post](https://stackoverflow.com/questions/4181703/how-to-concatenate-string-variables-in-bash) answer your question? – that other guy Jun 06 '20 at 18:35
  • No, I actually want to append the URL from another file to the end of the base url so that it can navigate to the website and fetch the tags that I am giving it. –  Jun 07 '20 at 07:34
  • `ARRAY=(); while read -r LINE; do ARRAY+=("$LINE"); done < URL.txt; for LINE in "${ARRAY[@]}"; do echo $LINE; curl https://www.mcdelivery.com.pk/pk/browse/menu.html$LINE | grep -o '<span class="starting-price">.*</span>' | sed 's/<[^>]\+>//g' >> price.txt; done` I came up with this code, but the output repeats itself, as if it only returns the output of the main page. Can you please spot the error? –  Jun 07 '20 at 09:11
  • @alecxs When I try your code, it gives an error at the URL variable: `line 14: https://www.mcdelivery.com.pk/pk/browse/menu.html: No such file or directory`. What am I doing wrong here? –  Jun 07 '20 at 09:21
  • @alecxs I have multiple URLs in the array, and in the loop I append each URL from the array to the base URL. Looking forward to your answer. –  Jun 07 '20 at 10:30
  • Your code works fine for me; just add `[ "$LINE" ] && curl` (to skip empty lines in URL.txt) – alecxs Jun 07 '20 at 10:47
  • @alecxs My code only gives the output from one page, and it keeps repeating the same output: ```Rs 398 Rs 487 Rs 841 Rs 752 Rs 398 Rs 398 Rs 487 Rs 841 Rs 752 Rs 398``` –  Jun 07 '20 at 11:10
  • Does this answer your question? [Reading input files by line using read command in shell scripting skips last line](https://stackoverflow.com/questions/17268113/reading-input-files-by-line-using-read-command-in-shell-scripting-skips-last-lin) – alecxs Jun 07 '20 at 11:13
  • I removed the sed command to see whether the output differs, but it stays the same; the only difference is that the HTML tags are included. I used sed to strip the HTML tags. –  Jun 07 '20 at 11:26
  • @alecxs I have updated the code in my question, please review it. I am really stuck on this problem. The output keeps repeating itself. –  Jun 07 '20 at 12:01
  • Can you provide an example of the expected output? – mtnezm Jun 07 '20 at 13:17
  • The problem is solved; there was an error in the URLs. Thanks, everyone! –  Jun 07 '20 at 14:50

1 Answer


I can't help with the `grep | sed` part because I don't know the expected output.

This example demonstrates why the URL is passed to curl without the URI appended: empty lines in the input file end up as empty array elements.

#!/bin/bash

# just for demo
> URI.txt
URI='?daypartId=1&catId='
URL=https://www.mcdelivery.com.pk/pk/browse/menu.html

# just for demo
for id in 1 11 2
  do
    echo -e "${URI}${id}" | tee -a URI.txt
    # reason why it fails
    echo -e "\n\n\n" >> URI.txt
done

ARRAY=()
while read -r LINE || [[ -n $LINE ]]
do
    ## how to prevent
    #[ "$LINE" ] && \
    ARRAY+=("$LINE")
done < URI.txt

for LINE in "${ARRAY[@]}"
  do
    # just for demo
    echo -e "LINE='$LINE'"
    # skip empty lines
    [ "$LINE" ] && curl "${URL}${LINE}" | grep -o '<span class="starting-price">.*</span>' | sed 's/<[^>]\+>//g' >> price.txt 
done

exit 0
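
The same idea can be sketched more compactly by filtering out empty lines while reading the file, so that only complete URLs are ever produced. This is a minimal sketch, not a drop-in replacement: `build_urls` is a hypothetical helper name, and it assumes the input file holds one query string per line, as in the question.

```shell
#!/bin/bash
# Base URL taken from the question.
BASE_URL='https://www.mcdelivery.com.pk/pk/browse/menu.html'

# build_urls (hypothetical helper): print one full URL for every
# non-empty line of the file given as the first argument.
build_urls() {
    local file=$1 line
    # "|| [[ -n $line ]]" also processes a last line without a trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        # Skip empty lines so the bare base URL is never emitted.
        [[ -z $line ]] && continue
        printf '%s%s\n' "$BASE_URL" "$line"
    done < "$file"
}

# Possible usage with the pipeline from the question (needs network access):
# build_urls URL.txt | while IFS= read -r url; do
#     curl -s "$url" | grep -o '<span class="starting-price">.*</span>' | sed 's/<[^>]\+>//g'
# done >> price.txt
```

Keeping the URL construction apart from the `curl` call makes it easy to print the generated URLs first and check them by eye before any request is sent.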
alecxs