I am doing web scraping with bash. I have these URLs, saved in a file URL.txt:

?daypartId=1&catId=1
?daypartId=1&catId=11
?daypartId=1&catId=2

I want to pass these URLs to an array in another file and append each one, one by one, to the end of the base URL https://www.mcdelivery.com.pk/pk/browse/menu.html.
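The task above can be sketched as follows. This is a minimal sketch, not the asker's script: it recreates URL.txt inline with the three suffixes from the question so it runs standalone, and it assumes bash 4+ for `mapfile`.

```shell
#!/usr/bin/env bash
# Recreate URL.txt with the suffixes from the question, so the snippet is self-contained.
printf '%s\n' '?daypartId=1&catId=1' '?daypartId=1&catId=11' '?daypartId=1&catId=2' > URL.txt

base='https://www.mcdelivery.com.pk/pk/browse/menu.html'

# Read every line of URL.txt into an array (bash 4+).
mapfile -t suffixes < URL.txt

# Append each suffix to the base URL.
urls=()
for s in "${suffixes[@]}"; do
    urls+=("${base}${s}")
done

printf '%s\n' "${urls[@]}"
```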

  • Show us what you tried so far – Gilles Quénot Jun 06 '20 at 01:17
  • You may want to look into using Python for this – Mike Q Jun 06 '20 at 03:07
  • `while IFS= read -r line ;do echo $line done < "${text1.txt}" # while [ $text1.txt -lt 2 ] # do curl https://www.mcdelivery.com.pk/pk/browse/menu.html${line} | grep -o '.*' | sed 's/<[^>]\+>//g' >> 123.txt` –  Jun 06 '20 at 11:49
  • I want a loop that can iterate over the url one by one and fetch the data through curl. –  Jun 06 '20 at 11:50
  • You shouldn't open duplicate questions: https://stackoverflow.com/questions/62235280/get-data-from-one-file-to-another-bash-web-scraping – Sorin Jun 06 '20 at 18:23
  • Does this answer your question? [Get data from one file to another (Bash) - Web Scraping](https://stackoverflow.com/questions/62235280/get-data-from-one-file-to-another-bash-web-scraping) – Sorin Jun 06 '20 at 18:23
  • No, I just want a loop that iterates over the URLs and appends them to the base URL –  Jun 07 '20 at 07:36
  • `ARRAY=(); while read -r LINE; do ARRAY+=("$LINE"); done < URL.txt; for LINE in "${ARRAY[@]}"; do echo $LINE; curl https://www.mcdelivery.com.pk/pk/browse/menu.html$LINE | grep -o '.*' | sed 's/<[^>]\+>//g' >> price.txt; done` I have come up with this code, but the output repeats itself; it only gives the output of the main page. Can you please spot the error? –  Jun 07 '20 at 09:12

1 Answer

You will need a way to read each line:

while IFS= read -r line; do
        echo "$line"
done < "${file}"

Then, inside that file-reading loop, append the $line you have just read to the base URL and fetch it:

curl "http://example.com${line}"
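Putting the two pieces together, here is one way the loop could look. This is a sketch, not a tested scraper: URL.txt is recreated inline with the question's suffixes, and `echo` stands in for `curl` so the constructed URLs can be verified before any network call. Note the quoting; unquoted, the shell would word-split the expanded URL.

```shell
#!/usr/bin/env bash
base='https://www.mcdelivery.com.pk/pk/browse/menu.html'

# Sample suffixes from the question, so the loop has input.
printf '%s\n' '?daypartId=1&catId=1' '?daypartId=1&catId=11' '?daypartId=1&catId=2' > URL.txt

while IFS= read -r line; do
    # Print the URL that would be fetched; quote so '&' and spaces survive.
    echo "${base}${line}"
    # curl -s "${base}${line}" >> pages.html   # swap in once the URLs look right
done < URL.txt > built_urls.txt

cat built_urls.txt
```

Checking the output of `built_urls.txt` first makes it easy to spot a malformed URL before hammering the server with requests.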
Mike Q
  • `while IFS= read -r line ;do echo $line done < "${text1.txt}" while [ $text1.txt -lt 2 ] do curl https://www.mcdelivery.com.pk/pk/browse/menu.html${line} | grep -o '.*' | sed 's/<[^>]\+>//g' >> 123.txt` –  Jun 06 '20 at 11:47
  • I am trying to do this. I am new to bash. Can you help me with the array to loop over the iterations, so that we can append them in front of the URL? Thanks –  Jun 06 '20 at 11:48
  • I am trying to loop over the content that is in text1.txt. –  Jun 06 '20 at 15:19
  • `ARRAY=(); while read -r LINE; do ARRAY+=("$LINE"); done < URL.txt; for LINE in "${ARRAY[@]}"; do echo $LINE; curl https://www.mcdelivery.com.pk/pk/browse/menu.html$LINE | grep -o '.*' | sed 's/<[^>]\+>//g' >> price.txt; done` I have come up with this code, but the output repeats itself; it only gives the output of the main page. Can you please spot the error? –  Jun 07 '20 at 09:12
  • Try it like this and just print the output to the screen to see what is going on; I think there may be an issue with your sed or grep, but I haven't checked: `typeset url="https://www.mcdelivery.com.pk/pk/browse/menu.html" while IFS= read -r line ;do curl "${url}${line}" | grep -o '.*' | sed 's/<[^>]\+>//g' done < "URL.txt"` – Mike Q Jun 10 '20 at 14:44
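The `sed 's/<[^>]\+>//g'` used throughout these comments strips HTML tags (`\+` is a GNU sed extension to basic regular expressions). Following Mike Q's advice to check the pipeline in isolation, it can be exercised on a local sample with no network call. The HTML fragment below is made up for illustration; it is not the actual markup of the McDelivery page.

```shell
#!/usr/bin/env bash
# Hypothetical HTML fragment standing in for a curl response.
html='<div class="price"><span>Rs. 500</span></div>'

# Remove every <...> tag, leaving only the text content (GNU sed).
stripped=$(echo "$html" | sed 's/<[^>]\+>//g')
echo "$stripped"
```

If this step produces the expected text, the remaining suspect is the `grep -o` pattern or the URL construction, which narrows the debugging considerably.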