
I want to read the query strings from URL.txt and append each one to the base URL https://www.mcdelivery.com.pk/pk/browse/menu.html, which is used in another file, menu.sh.

URL.txt contains:

?daypartId=1&catId=1
?daypartId=1&catId=2
?daypartId=1&catId=11
?daypartId=1&catId=10
?daypartId=1&catId=6
?daypartId=1&catId=4
?daypartId=1&catId=14
?daypartId=1&catId=5
?daypartId=1&catId=3
?daypartId=1&catId=8

Each request should go to the base URL plus one line from URL.txt, for example https://www.mcdelivery.com.pk/pk/browse/menu.html?daypartId=1&catId=11

I have come up with the code below, but I only get the prices from the first page, and the same values keep repeating until the loop ends.

ARRAY=()
while read -r LINE
do
    ARRAY+=("$LINE")
done < URL.txt
for LINE in "${ARRAY[@]}"
do
    echo $LINE
    curl https://www.mcdelivery.com.pk/pk/browse/menu.html$LINE | grep -o '<span class="starting-price">.*</span>' | sed 's/<[^>]\+>//g' >> price.txt
done

Output that I am getting

Rs 398
Rs 487
Rs 841
Rs 752
Rs 398
Rs 398
Rs 487
Rs 841
Rs 752
....

I want to get the prices from each page and store them in price.txt.

TylerH
  • Probably, you only need to quote the URL. – Poshi Jun 07 '20 at 11:02
  • Can you please tell me how? – Jun 07 '20 at 11:06
  • Quoting means writing `"$LINE"` instead of `$LINE`; see also https://stackoverflow.com/q/29378566/6770384. However, I don't think that causes the problem you describe ("I only get the price from the first page"). – Socowi Jun 07 '20 at 12:17
  • I cannot reproduce your problem. First of all, all URLs return the same page for me, no matter which catId I choose. Second, grepping these pages always returns `McArabia with Drink` and similar product names, but never anything like `Rs`, `398`, or `487`. – Socowi Jun 07 '20 at 12:27
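
A minimal sketch of the quoting suggested in the comments, combined with the "base URL + line from URL.txt" step from the question. The helper name `build_urls` and the temporary demo file are hypothetical, introduced only for illustration; the base URL and the query strings are taken from the question. As noted above, quoting alone may not explain the repeated prices, so treat this as a tidy way to build the URLs rather than a confirmed fix.

```shell
#!/bin/bash
# Base URL taken from the question.
BASE_URL='https://www.mcdelivery.com.pk/pk/browse/menu.html'

# build_urls FILE (hypothetical helper):
# print BASE_URL followed by each non-empty line of FILE, one URL per line.
build_urls() {
    local line
    # IFS= and -r keep each line exactly as written in the file;
    # the || test also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        if [ -n "$line" ]; then
            printf '%s%s\n' "$BASE_URL" "$line"
        fi
    done < "$1"
}

# Offline demo with a temporary file standing in for URL.txt.
demo_file=$(mktemp)
printf '%s\n' '?daypartId=1&catId=1' '?daypartId=1&catId=11' > "$demo_file"
build_urls "$demo_file"
rm -f "$demo_file"
```

The demo prints the two full URLs for catId=1 and catId=11. To fetch the pages, the output could be fed to a loop such as `build_urls URL.txt | while IFS= read -r url; do curl -s "$url"; done`, with `"$url"` quoted so that `?` and `&` reach curl unchanged.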

2 Answers


Please don't use regular expressions to parse HTML. Use a real HTML parser / web scraper like xidel instead.
In fact, there's no need for a Bash script at all; xidel can do everything you want.

Parse html of "★What's New★"-menu-item and string-join price + product-name:

$ xidel -s "https://www.mcdelivery.com.pk/pk/browse/menu.html?daypartId=1&catId=1" -e '
  //div[ends-with(@class,"panel-product")]/join(
    (.//span[@class="starting-price"],.//h5),
    " - "
  )
'
Rs 288 - Cappuccino with Milk Chocolate Cookie
Rs 288 - Cappuccino with Double Chocolate Cookie
Rs 288 - Latte with Milk Chocolate Cookie
[...]
Rs 239 - Salted Caramel Shake

List all menu-items and string-join url + title:

$ xidel -s https://www.mcdelivery.com.pk/pk/browse/menu.html -e '
  //ul[@class="secondary-menu"]//a/join((resolve-uri(@href),span)," - ")
'
https://www.mcdelivery.com.pk/pk/browse/menu.html?daypartId=1&catId=12 - Deals
https://www.mcdelivery.com.pk/pk/browse/menu.html?daypartId=1&catId=1 - ★What's New★
https://www.mcdelivery.com.pk/pk/browse/menu.html?daypartId=1&catId=2 - Ala carte & Value Meals
[...]
https://www.mcdelivery.com.pk/pk/browse/menu.html?daypartId=1&catId=8 - Snack Time

For every menu-item, string-join url + title, open url / parse html and string-join price + product-name:

$ xidel -s https://www.mcdelivery.com.pk/pk/browse/menu.html -e '
  //ul[@class="secondary-menu"]//a/(
    join((resolve-uri(@href),span)," - "),
    doc(@href)//div[ends-with(@class,"panel-product")]/join(
      (.//span[@class="starting-price"],.//h5),
      " - "
    )
  )
'
https://www.mcdelivery.com.pk/pk/browse/menu.html?daypartId=1&catId=12 - Deals
Rs 487 - Grand Chicken Spicy with Drink
Rs 398 - Big Mac + Regular Drink
https://www.mcdelivery.com.pk/pk/browse/menu.html?daypartId=1&catId=1 - ★What's New★
Rs 288 - Cappuccino with Milk Chocolate Cookie
Rs 288 - Cappuccino with Double Chocolate Cookie
Rs 288 - Latte with Milk Chocolate Cookie
Rs 288 - Latte with Double Chocolate Cookie
Rs 159 - McFizz Guava
Rs 195 - Date Pie
Rs 416 - Spicy McCrispy Deluxe - Regular Meal
Rs 416 - McChicken - Regular Meal
Rs 239 - Curly Fries
Rs 239 - Salted Caramel Shake
https://www.mcdelivery.com.pk/pk/browse/menu.html?daypartId=1&catId=2 - Ala carte & Value Meals
Rs 257 - Chicken Burger with Cheese
Rs 265 - Value McArabia Chicken
Rs 265 - Mini McRoyale
[...]
https://www.mcdelivery.com.pk/pk/browse/menu.html?daypartId=1&catId=8 - Snack Time
Rs 301 - Spicy Chicken Burger
Rs 301 - 4pcs McNuggets
Rs 301 - Fries & Drink
Rs 150 - Apple Pie with Tea
Reino
#!/bin/bash
BASE_URL='https://www.mcdelivery.com.pk/pk/browse/menu.html'

# Collect the query string of every menu link into URL.txt
# (\n in the sed replacement and sed -i assume GNU sed).
curl -sL "$BASE_URL" | grep -o '<li class="secondary-menu-item ">.*</li>' | sed 's/href=/\nhref=/g' | \
grep 'href="' | \
sed 's/.*href="//;s/".*//' > URL.txt
# Decode HTML-escaped ampersands (&amp; becomes &).
sed -i 's/&amp;/\&/g' URL.txt

# Read URL.txt into an array, one query string per element.
ARRAY=()
while IFS= read -r LINE
do
    ARRAY+=("$LINE")
done < URL.txt

for LINE in "${ARRAY[@]}"
do
    echo "$LINE"
    # Fetch each page once, then extract names and prices from the same copy.
    PAGE=$(curl -sL "$BASE_URL$LINE")
    printf '%s\n' "$PAGE" | grep -o '<h5 class="product-title">.*</h5>' | sed 's/<[^>]\+>//g' >> name.txt
    printf '%s\n' "$PAGE" | grep -o '<span class="starting-price">.*</span>' | sed 's/<[^>]\+>//g' >> price.txt
done
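
The link-extraction step of the script above can be tried offline. The sketch below is a variation, not the answer's exact pipeline: it matches each `href="..."` attribute directly with `grep -o` instead of splitting lines with sed, and the helper name `extract_hrefs` and the sample HTML are made up for illustration (the markup only imitates the `secondary-menu-item` list the script greps for).

```shell
#!/bin/bash
# extract_hrefs (hypothetical helper):
# read HTML on stdin and print the value of every href="..." attribute,
# one per line, with &amp; decoded to &.
extract_hrefs() {
    grep -o 'href="[^"]*"' \
        | sed -e 's/^href="//' -e 's/"$//' -e 's/&amp;/\&/g'
}

# Made-up sample resembling the menu markup; not copied from the site.
sample='<li class="secondary-menu-item "><a href="?daypartId=1&amp;catId=1">New</a></li><li class="secondary-menu-item "><a href="?daypartId=1&amp;catId=2">Meals</a></li>'

printf '%s\n' "$sample" | extract_hrefs
```

On the sample this prints `?daypartId=1&catId=1` and `?daypartId=1&catId=2` on separate lines, which is the format URL.txt is expected to have. Matching `[^"]*` keeps each attribute separate even when several links share one line, which is why no line splitting is needed here.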

TylerH