I once wrote a Python script that scrapes a webpage, searches it for a particular text, and reports the number of times that text appears on the page.

Now I want to turn it into a web app.

My app will take two string variables: a date (which I will split into day, month, and year) and a name.

The first variable will be used to generate the unique web URL (it's a date-based list), and the page at that URL will be parsed to collect information using text search (if possible with RegEx rather than simple Python search functions).

Now, I want to set up a webpage with two input elements (for a date and a name). I want these two values to be passed to the script, and the output to be shown in the web page (on the same page or on a new page).

Simple.

With my limited knowledge, I think both Flask and Django will be too heavy for this.

How do you think I could do it?

EDIT: Here's my code (which I essentially thought out and pieced together from different places):

# KATscrape is a script that Basil Ajith (https://twitter.com/basilajith) wrote way back 
# in 2016-2017 in order to search and parse the KAT
# cause lists. Now, it is being re-written to be hosted
# as a web application online.

# Parsing web page learnt from:
# https://stackoverflow.com/questions/25067580/passing-web-data-into-beautiful-soup-empty-list#25068054
# Developed by https://stackoverflow.com/users/2141635/padraic-cunningham

# Printing List without quotes learnt from:
# https://stackoverflow.com/questions/11178061/print-list-without-brackets-in-a-single-row#11178075
# Developed by https://stackoverflow.com/users/1172428/fatalerror
# Edited by https://stackoverflow.com/users/6451573/jean-fran%c3%a7ois-fabre

# Finding occurrences of advocate's name in the cause list learnt from:
# https://stackoverflow.com/questions/17268958/finding-occurrences-of-a-word-in-a-string-in-python-3#17268979
# Developed by https://stackoverflow.com/users/148870/amber
# Dependencies
from sys import argv
from bs4 import BeautifulSoup
import requests
import re   # I don't know pandas (neither do I know RegEx much); but I think RegEx would serve our purpose.

filename, date, adv_name = argv


# Short Lists
court_numbers = ["1", "7", "8", "4"]

# The parser function
def katscrape():
    # The date is expected in a DD/MM/YYYY-style layout (two-digit day and month)
    day = date[0:2]
    month = date[3:5]
    year = date[6:10]
    base_url = "http://keralaadministrativetribunal.gov.in/ciskat/pages/cause_list_home.php?type=search&dte=%s/%s/%s&court=%s"

    # Starting to parse
    for i in court_numbers:
        cl_current = base_url % (day, month, year, i)
        the_page = requests.get(cl_current)
        soup = BeautifulSoup(the_page.content, "lxml")

        da_stuff = str(soup)
        judges_list = ["Mr. Justice T.R. Ramachandran Nair", 
                    "Mr. V. Somasundaran", "Mr. V.Rajendran", "Mr. Rajesh Dewan", "Mr. Benny Gervacis"]
        
        sitting=[]
        for x in judges_list:
            if x in da_stuff:
                sitting.append(x)
        
        # Printing court number and presiding members.
        print("Court No. %s:" % i)
        print("Presiding: ", (", ".join(sitting)), " \n")

        # Checking for advocate's name in the cause list:
        count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(adv_name), da_stuff))
        print("%s has %d matters in this court.\n" % (adv_name, count))


print("Matters for %s on %s:" % (adv_name, date) + "\n")
katscrape()
Basil Ajith
  • 1
    Flask may actually be the perfect lightweight solution for this, as you can spin up a prototype in minutes. Django, I agree, would be too heavy and opinionated for this. Alternatively, if you know some JavaScript, you could use Flask (API backend) and React (JS frontend framework) to build something a bit more decoupled. Look up the Flask Mega-Tutorial by Miguel Grinberg. – mmenschig Mar 19 '20 at 15:30
  • 1
    Thanks man, for your comment. Btw, where did you get that T-Shirt that you are wearing in your profile picture? Could I have the link to that? :-D – Basil Ajith Mar 19 '20 at 15:37
  • 1
    Unfortunately I don't have the T-shirt anymore and I can't find it online anymore. I do know it was sold by Atlassian though, so maybe reach out to them and kindly ask? :) – mmenschig Mar 19 '20 at 15:43
  • 1
    I've left an answer for you below. I hope you understand and it works for you. – mmenschig Mar 29 '20 at 16:04

1 Answer

Not sure if you're still looking for an answer here, but I want to give you a minimal working example.

Imagine your project directory:

my_project
|-- app.py (your flask server)
|
|-- api
    |-- services
        `--my_script.py

my_script.py

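from datetime import datetime


def split_date(date_string):
    """Split a 'YYYY-MM-DD' string into its year, month, and day."""
    # strptime raises ValueError if the string does not match the format
    dt = datetime.strptime(date_string, '%Y-%m-%d')
    date_object = {
        "year": dt.year,
        "month": dt.month,
        "day": dt.day
    }

    return date_object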
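
The script in the question slices its date as day, month, year, so its input is probably DD/MM/YYYY rather than YYYY-MM-DD (that is my assumption from the slicing, not something the question states). A minimal sketch of the same idea adapted to that format, building the cause-list URL from the question's `base_url`; the function name `build_cause_list_url` is hypothetical:

```python
from datetime import datetime

# URL pattern copied from the question's script
BASE_URL = ("http://keralaadministrativetribunal.gov.in/ciskat/pages/"
            "cause_list_home.php?type=search&dte=%s/%s/%s&court=%s")


def build_cause_list_url(date_string, court_number):
    """Build the cause-list URL for one court from a DD/MM/YYYY date string.

    Assumes the DD/MM/YYYY input format; strptime raises ValueError otherwise.
    """
    dt = datetime.strptime(date_string, "%d/%m/%Y")
    # strftime keeps day and month zero-padded, as in the original slicing
    return BASE_URL % (dt.strftime("%d"), dt.strftime("%m"),
                       dt.strftime("%Y"), court_number)


print(build_cause_list_url("19/03/2020", "1"))
```

Parsing with `strptime` instead of slicing means a malformed date fails loudly with a `ValueError`, which a web app can catch and turn into an error message.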

app.py (the Flask backend)

from flask import Flask, request, jsonify

from api.services.my_script import split_date

app = Flask(__name__)

@app.route('/api/get-date-object', methods=['GET', 'POST'])
def get_date_object():
    if request.method == 'GET':
        return jsonify({ "error": True, "msg": "Must perform POST request to this endpoint"})

    if request.method == 'POST':
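        # silent=True returns None (instead of raising) when the body is missing or is not valid JSON
        request_json = request.get_json(silent=True) or {}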

        if "date_string" in request_json:
            print(request_json)
            date_object = split_date(request_json["date_string"])
            print(date_object)
            return jsonify(date_object)
        else:
            return jsonify({ "error": True, "msg": "Must provide a date string in the format 'YYYY-MM-DD'."})


if __name__ == '__main__':
    # Start the development server only when this file is run directly
    app.run(debug=True)

Now we test this API endpoint via curl. These are the requests you'll eventually want to send from the browser, either via plain HTML or a JS framework like Angular, React, or Vue.

Expecting error response, because we are performing a GET request:

λ curl -s -X GET http://localhost:5000/api/get-date-object
{
  "error": true,
  "msg": "Must perform POST request to this endpoint"
}

Expecting an error, because we are not passing a "date_string" key in the json object:

λ curl -s -X POST -H "Content-Type: application/json" -d "{\"name\": \"Graced Lamb\"}" http://localhost:5000/api/get-date-object
{
  "error": true,
  "msg": "Must provide a date string in the format 'YYYY-MM-DD'."
}

Expecting success: the response is an object containing the individual components of the date provided as "YYYY-MM-DD":

λ curl -s -X POST -H "Content-Type: application/json" -d "{\"date_string\": \"2020-02-01\", \"name\": \"Graced Lamb\"}" http://localhost:5000/api/get-date-object
{
  "day": 1,
  "month": 2,
  "year": 2020
}
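
If curl is not handy, the same three checks can be run from Python with Flask's built-in test client, which calls the endpoint without starting a server. This is only a sketch: the endpoint is re-declared in condensed form so the snippet runs on its own, and the variable names are mine.

```python
from datetime import datetime

from flask import Flask, request, jsonify

app = Flask(__name__)


@app.route("/api/get-date-object", methods=["GET", "POST"])
def get_date_object():
    # Condensed copy of the endpoint from app.py so this sketch is self-contained
    if request.method == "GET":
        return jsonify({"error": True, "msg": "Must perform POST request to this endpoint"})
    payload = request.get_json(silent=True) or {}
    if "date_string" not in payload:
        return jsonify({"error": True, "msg": "Must provide a date string in the format 'YYYY-MM-DD'."})
    dt = datetime.strptime(payload["date_string"], "%Y-%m-%d")
    return jsonify({"year": dt.year, "month": dt.month, "day": dt.day})


client = app.test_client()

# 1. GET request: error expected
get_reply = client.get("/api/get-date-object").get_json()

# 2. POST without "date_string": error expected
missing_reply = client.post("/api/get-date-object",
                            json={"name": "Graced Lamb"}).get_json()

# 3. POST with a valid date: the split date expected
ok_reply = client.post("/api/get-date-object",
                       json={"date_string": "2020-02-01", "name": "Graced Lamb"}).get_json()

print(get_reply)
print(missing_reply)
print(ok_reply)
```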

I had to escape the double quotes inside the -d parameter because I am on Windows this time. I hope this helps. If you'd like the code for this, let me know and I can create a public GitHub repo for you to clone/download.
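
For the plain-HTML route, here is a rough sketch of a single page with the two inputs that posts back to itself and shows the result on the same page. Everything in it is illustrative: the inline template, `count_name`, and `run_script` are hypothetical names, and `run_script` searches a hard-coded sample string where the real scraper would fetch and search the cause list. The counting regex is the one from the question.

```python
import re

from flask import Flask, request, render_template_string

app = Flask(__name__)

# Hypothetical inline template; a real project would keep it in templates/
PAGE = """
<form method="post">
  <input type="date" name="date" required>
  <input type="text" name="name" placeholder="Name" required>
  <button type="submit">Search</button>
</form>
{% if result is not none %}
  <p>{{ name }} appears {{ result }} time(s) on {{ date }}.</p>
{% endif %}
"""


def count_name(name, text):
    """Count whole-word occurrences of name in text (regex from the question)."""
    return sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(name), text))


def run_script(date, name):
    # Placeholder: the real script would build the URL from `date`,
    # download the page, and search that text instead of this sample.
    sample_text = "Adv. Kumar, Adv. Nair, Adv. Kumar"
    return count_name(name, sample_text)


@app.route("/", methods=["GET", "POST"])
def index():
    result = date = name = None
    if request.method == "POST":
        # The form fields arrive in request.form, keyed by their name attributes
        date = request.form.get("date", "")
        name = request.form.get("name", "")
        result = run_script(date, name)
    return render_template_string(PAGE, result=result, date=date, name=name)
```

Saved as app.py, this can be started with `flask run` from the same directory. A browser's date input submits its value as YYYY-MM-DD, so the script would need to convert it if the target site expects another format.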

mmenschig
  • Thank you bro. I don't know whether this will work for me. I need to study this and understand this. I am not a professional programmer. I am a self-taught hobbyist. I will let you know. I will paste my code above and I expect you will get a clearer understanding as to what I need. – Basil Ajith Apr 08 '20 at 13:57
  • 1
    Hi gracedlamb, I'm also a self taught programmer, there's no shame in this. Patience is a virtue, don't try to learn too many things at once, it's tempting and easy to fall into the trap. – mmenschig Apr 09 '20 at 14:22
  • mmenschig, where should the curl commands go? In the app.py file? – Basil Ajith Apr 10 '20 at 19:37
  • Curl is a tool for sending HTTP requests. It ships with Unix-like systems such as macOS and Linux distributions like CentOS. If you're on Windows, I suggest downloading Postman and running the HTTP request from there. To answer the question, the curl command goes into your terminal. If you're on a Mac, try typing `man curl` to pull up the manual page. – mmenschig Apr 10 '20 at 19:42
  • I have used curl before (for installing packages). I am on a Linux machine. I did not read the whole thing properly. I get it now. It was for testing, right. I had my initial question incomplete. The two variables would come from tags of a web page and are passed on as arguments to a Python script and the result of that script will be printed on a web page. – Basil Ajith Apr 11 '20 at 09:54