I am trying to pass a website as a URL parameter. It works if the website does not have a "/" in it. For example, http://192.168.1.156:2434/www.cookinglight.com scrapes Cooking Light for all the images on its page; however, if I pass in http://192.168.1.156:2434/https://www.cookinglight.com/recipes/chicken-apple-butternut-squash-soup, then I get an invalid response. Here is my current code:

import json
from flask import Flask, render_template

from imagescraper import image_scraper

app = Flask(__name__)

@app.route("/", methods = ['GET'])
def home():
    return render_template('index.html')

@app.route("/<site>", methods = ['GET'])
def get_image(site):
    return json.dumps(image_scraper(site))


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=2434, debug=True)

# imagescraper.py
import requests
from bs4 import BeautifulSoup


def image_scraper(site):
    """Scrape the user-supplied URL for all images on a website.
    :param site: http url, e.g. https://www.cookinglight.com
    :return: dictionary; key: alt text, value: source link"""
    search = site.strip()
    search = search.replace(' ', '+')

    website = 'https://' + search
    response = requests.get(website)

    soup = BeautifulSoup(response.text, 'html.parser')
    img_tags = soup.find_all('img')
    # create dictionary to add image alt tag and source link
    images = {}
    for img in img_tags:
        try:
            name = img['alt']
            link = img['src']
            images[name] = link
        except KeyError:
            # skip images that have no alt or src attribute
            pass
    return images

I tried urllib but did not have any success. Any help would be greatly appreciated! I am a student, so I am still learning!

UPDATE:

I believe this is the issue, as described in this Stack Overflow post:

Need to allow encoded slashes on Apache

jaclynpgh
  • URL-encode the website URL. – Barmar Nov 04 '21 at 16:06
  • If you send a URL containing `/`, then Flask tries to find a route like `route("/<arg1>/<arg2>/<arg3>")`, and that causes the problem. You have to convert `/` to `%hex` codes, OR send it as `/?site=your_url` and then get it as `request.args["site"]`. – furas Nov 04 '21 at 17:01
  • If you get an `invalid response`, then show this response in the question (not in comments) as text. We will not run the code to see the `invalid response`, and we can't read your mind. – furas Nov 04 '21 at 17:03
  • Always put the full error message (starting at the word "Traceback") in the question (not in a comment) as text (not a screenshot, not a link to an external portal). It contains other useful information. – furas Nov 04 '21 at 17:03
  • This is the error: Not Found. The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again. – jaclynpgh Nov 04 '21 at 17:28
  • And when inspected: www.cookinglight.com:1 Failed to load resource: the server responded with a status of 404 (NOT FOUND) – jaclynpgh Nov 04 '21 at 17:31
  • There is no traceback; it just says it's not found, as above. When I run it without `/`, for example www.cookinglight.com, it parses correctly, but if I put in https://www.cookinglight.com/recipes/salted-caramel-apple-pie then I get that error. – jaclynpgh Nov 04 '21 at 17:42
  • As I said, if you run it with `/`, then it tries to find `route("/<arg1>/<arg2>/<arg3>")` and it can't find it. You would have to use a different `route()` to catch it. `<site>` can match only `www.domain.com`, and a URL with `/` doesn't match `<site>`. – furas Nov 04 '21 at 20:43
  • It is NOT an issue; it is standard behaviour in `Flask`. In `Flask` the char `/` has a special meaning in a URL: it separates arguments, as in `/<arg1>/<arg2>/<arg3>`. And your link shows a problem with `Apache`, but you run `Flask` without `Apache`. You HAVE TO use `<path:site>` to treat `/` as a normal char. – furas Nov 04 '21 at 22:24

1 Answer

Flask uses / as a separator between arguments in a URL, so you can create route("/<arg1>/<arg2>/<arg3>") (or, as is popular in blogs, route("/<year>/<month>/<day>")) and get the values in the variables arg1, arg2, arg3. When you use your URL containing /, Flask treats it the same way: as several segments, for which it looks for a route like route("/<arg1>/<arg2>/<arg3>"). It can't find one, so it returns a 404 error.

route("/<site>") can match only a string without /. site is only a variable name; it doesn't mean that Flask will treat the value as a URL containing /.

If you want to use / as part of a single argument, not as a separator between arguments, then you need <path:site>.

from flask import Flask

app = Flask(__name__)

@app.route("/")
def home():
    return "Hello World"

@app.route("/<path:site>")
def get_image(site):
    return f"OK: {site}"

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=2434)#, debug=True)
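
The difference between the two converters can be checked without starting a server. The following is a minimal sketch using Flask's built-in test client; the `/plain/` and `/full/` prefixes and the view names are hypothetical, chosen only so that both rules can live in one app.

```python
from flask import Flask

app = Flask(__name__)


# Default (string) converter: matches a single segment, never a "/".
@app.route("/plain/<site>")
def plain(site):
    return f"plain: {site}"


# path converter: matches the rest of the URL, slashes included.
@app.route("/full/<path:site>")
def full(site):
    return f"full: {site}"


client = app.test_client()

# A value without "/" matches the default converter.
no_slash = client.get("/plain/www.cookinglight.com")

# The same rule no longer matches once the value contains "/".
with_slash = client.get("/plain/www.cookinglight.com/recipes/soup")

# The path converter receives the whole remainder as one argument.
full_match = client.get("/full/www.cookinglight.com/recipes/soup")

print(no_slash.status_code, with_slash.status_code, full_match.status_code)
print(full_match.get_data(as_text=True))
```

The second request is the one that reproduces the "Not Found" page from the question; the third shows the same value arriving intact in `site`.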

See also Variable Rules in the Flask documentation.
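
The comments on the question also mention an alternative: send the target as a query-string value and read it with `request.args`. A rough sketch of that approach follows, assuming a hypothetical `/scrape` endpoint; the call to `image_scraper` is left out so the example stays self-contained.

```python
from urllib.parse import urlencode

from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/scrape")  # hypothetical endpoint name
def scrape():
    # request.args holds the decoded query-string values.
    site = request.args.get("site")
    if not site:
        return jsonify(error="missing 'site' query parameter"), 400
    # The real app would call image_scraper(site) here.
    return jsonify(site=site)


target = "https://www.cookinglight.com/recipes/chicken-apple-butternut-squash-soup"

# urlencode percent-encodes ":" and "/" so the value is safe in a query string.
query = urlencode({"site": target})

client = app.test_client()
response = client.get("/scrape?" + query)
missing = client.get("/scrape")

print(query)
print(response.get_json())
```

With this layout the route itself never contains the target URL, so the `/` characters in it play no part in route matching.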


EDIT:

This has nothing to do with the Apache issue you linked. Flask was deliberately designed to use / as a special character that separates values.
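
One follow-on detail, assuming the route is switched to `<path:site>`: `image_scraper` in the question always prepends `https://`, so a value that already starts with a scheme would end up with a doubled prefix. A minimal sketch of a hypothetical helper, `ensure_scheme`, that adds the scheme only when it is missing:

```python
def ensure_scheme(site: str, default_scheme: str = "https") -> str:
    """Return site with a scheme, adding default_scheme only if none is present.

    Hypothetical helper, not part of the original code.
    """
    site = site.strip()
    if site.startswith(("http://", "https://")):
        return site
    return f"{default_scheme}://{site}"


print(ensure_scheme("www.cookinglight.com"))
print(ensure_scheme("https://www.cookinglight.com/recipes/salted-caramel-apple-pie"))
```

In the question's code this could stand in for the line `website = 'https://' + search`.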

furas
  • Thanks, but I think this is the underlying issue: https://stackoverflow.com/questions/4390436/need-to-allow-encoded-slashes-on-apache – jaclynpgh Nov 04 '21 at 22:01
  • It is not the underlying issue; Flask was created to use `/` as a separator between arguments. – furas Nov 04 '21 at 22:15
  • If you use `https://www.cookinglight.com/recipes/chicken-apple-butternut-squash-soup`, then `flask` treats it as 5 arguments: `"https:"`, `""`, `www.cookinglight.com`, `recipes`, `chicken-apple-butternut-squash-soup`. That doesn't match `/<site>`, which expects only one argument without `/`. `flask` has `<path:site>` to get it as a single argument with all the `/`. – furas Nov 04 '21 at 22:19