I am doing web scraping with bash. I have these URLs, saved in a file URL.txt:

?daypartId=1&catId=1
?daypartId=1&catId=11
?daypartId=1&catId=2

I want to pass these URLs to an array in another file and append each one, one by one, to the end of the base URL https://www.mcdelivery.com.pk/pk/browse/menu.html.
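The task above can be sketched as follows. This is a minimal sketch, not the asker's script: it recreates URL.txt inline with the three suffixes from the question so it runs standalone, and it assumes bash 4+ for `mapfile`.

```shell
#!/usr/bin/env bash
# Recreate URL.txt with the suffixes from the question, so the snippet is self-contained.
printf '%s\n' '?daypartId=1&catId=1' '?daypartId=1&catId=11' '?daypartId=1&catId=2' > URL.txt

base='https://www.mcdelivery.com.pk/pk/browse/menu.html'

# Read every line of URL.txt into an array (bash 4+).
mapfile -t suffixes < URL.txt

# Append each suffix to the base URL.
urls=()
for s in "${suffixes[@]}"; do
    urls+=("${base}${s}")
done

printf '%s\n' "${urls[@]}"
```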

  • Show us what you tried so far – Gilles Quénot Jun 06 '20 at 01:17
  • You may want to look into using Python for this – Mike Q Jun 06 '20 at 03:07
  • `while IFS= read -r line ;do echo $line done < "${text1.txt}" # while [ $text1.txt -lt 2 ] # do curl https://www.mcdelivery.com.pk/pk/browse/menu.html${line} | grep -o '.*' | sed 's/<[^>]\+>//g' >> 123.txt` –  Jun 06 '20 at 11:49
  • I want a loop that can iterate over the url one by one and fetch the data through curl. –  Jun 06 '20 at 11:50
  • You shouldn't open duplicate questions: https://stackoverflow.com/questions/62235280/get-data-from-one-file-to-another-bash-web-scraping – Sorin Jun 06 '20 at 18:23
  • Does this answer your question? [Get data from one file to another (Bash) - Web Scraping](https://stackoverflow.com/questions/62235280/get-data-from-one-file-to-another-bash-web-scraping) – Sorin Jun 06 '20 at 18:23
  • No, I just want a loop that iterates over the URLs and appends them to the base URL –  Jun 07 '20 at 07:36
  • `ARRAY=(); while read -r LINE; do ARRAY+=("$LINE"); done < URL.txt; for LINE in "${ARRAY[@]}"; do echo $LINE; curl https://www.mcdelivery.com.pk/pk/browse/menu.html$LINE | grep -o '.*' | sed 's/<[^>]\+>//g' >> price.txt; done` I have come up with this code, but the output repeats itself; it only gives the output of the main page. Can you please spot the error? –  Jun 07 '20 at 09:12

1 Answer

You will need a way to read each line:

while IFS= read -r line; do
        echo "$line"
done < "${file}"

Then, inside that file-reading loop, append the $line you have just read to the base URL and fetch it:

curl "http://example.com${line}"
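Putting the two pieces together, here is one way the loop could look. This is a sketch, not a tested scraper: URL.txt is recreated inline with the question's suffixes, and `echo` stands in for `curl` so the constructed URLs can be verified before any network call. Note the quoting; unquoted, the shell would word-split the expanded URL.

```shell
#!/usr/bin/env bash
base='https://www.mcdelivery.com.pk/pk/browse/menu.html'

# Sample suffixes from the question, so the loop has input.
printf '%s\n' '?daypartId=1&catId=1' '?daypartId=1&catId=11' '?daypartId=1&catId=2' > URL.txt

while IFS= read -r line; do
    # Print the URL that would be fetched; quote so '&' and spaces survive.
    echo "${base}${line}"
    # curl -s "${base}${line}" >> pages.html   # swap in once the URLs look right
done < URL.txt > built_urls.txt

cat built_urls.txt
```

Checking the output of `built_urls.txt` first makes it easy to spot a malformed URL before hammering the server with requests.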
Mike Q
  • `while IFS= read -r line ;do echo $line done < "${text1.txt}" while [ $text1.txt -lt 2 ] do curl https://www.mcdelivery.com.pk/pk/browse/menu.html${line} | grep -o '.*' | sed 's/<[^>]\+>//g' >> 123.txt` –  Jun 06 '20 at 11:47
  • I am trying to do this. I am new to bash. Can you help me with the array to loop over the iterations, so that we can append them in front of the URL? Thanks –  Jun 06 '20 at 11:48
  • I am trying to loop over the content that is in text1.txt. –  Jun 06 '20 at 15:19
  • `ARRAY=(); while read -r LINE; do ARRAY+=("$LINE"); done < URL.txt; for LINE in "${ARRAY[@]}"; do echo $LINE; curl https://www.mcdelivery.com.pk/pk/browse/menu.html$LINE | grep -o '.*' | sed 's/<[^>]\+>//g' >> price.txt; done` I have come up with this code, but the output repeats itself; it only gives the output of the main page. Can you please spot the error? –  Jun 07 '20 at 09:12
  • Try it like this and just print the output to the screen to see what is going on; I think there may be an issue with your sed or grep, but I haven't checked: `typeset url="https://www.mcdelivery.com.pk/pk/browse/menu.html" while IFS= read -r line ;do curl "${url}${line}" | grep -o '.*' | sed 's/<[^>]\+>//g' done < "URL.txt"` – Mike Q Jun 10 '20 at 14:44
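The `sed 's/<[^>]\+>//g'` used throughout these comments strips HTML tags (`\+` is a GNU sed extension to basic regular expressions). Following Mike Q's advice to check the pipeline in isolation, it can be exercised on a local sample with no network call. The HTML fragment below is made up for illustration; it is not the actual markup of the McDelivery page.

```shell
#!/usr/bin/env bash
# Hypothetical HTML fragment standing in for a curl response.
html='<div class="price"><span>Rs. 500</span></div>'

# Remove every <...> tag, leaving only the text content (GNU sed).
stripped=$(echo "$html" | sed 's/<[^>]\+>//g')
echo "$stripped"
```

If this step produces the expected text, the remaining suspect is the `grep -o` pattern or the URL construction, which narrows the debugging considerably.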