I need to run two Python 3 scripts simultaneously. The first script (app1.py) provides information to the second script (app2.py). Both need to run at the same time, ideally launched from a single script.

Script 1 is a bs4-based scraping script that runs in an infinite loop. Script 2 is a Flask web app that displays information from script 1. Is it possible to run script 1 without importing it? Importing causes issues because script 1 runs in an infinite loop.

How do I run both scripts together from a single script?

arthem

1 Answer

Design

First, before adding complexity (particularly concurrent programming), ask yourself: do I really need to do this? Could the Flask app trigger a new scrape when a request comes in?

Concurrency

When tasks need to run side by side in Python, there are three main ways to do it:

  1. multithreading
  2. multiprocessing
  3. asyncio

Processes are separate as far as the operating system is concerned, and each process contains one or more threads. asyncio is a different way of thinking about this: it runs cooperative tasks on an event loop, usually within a single thread, which lets you mostly forget about the OS.

Python (specifically CPython) has a feature called the Global Interpreter Lock (GIL), which means only one thread can execute Python bytecode at a time within a process. So if your application uses multithreading, one thread has to wait while another runs Python code. This limit only applies to executing bytecode: for IO-bound work like a Flask server, threads spend enough time waiting on IO (during which the GIL is released) that multithreading can still work well.
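
To illustrate the IO-bound case, here is a minimal sketch using the standard `threading` module. The names `latest`, `scrape_forever` and `read_latest` are made up for illustration, and `time.sleep` stands in for waiting on the network; a real Flask view would call something like `read_latest()`.

```python
import threading
import time

# Hypothetical shared state: the "scraper" writes, the "web app" reads.
latest = {"count": 0}
latest_lock = threading.Lock()
stop_event = threading.Event()


def scrape_forever(interval=0.01):
    """Stand-in for the scraping loop; sleeping mimics waiting on the network."""
    while not stop_event.is_set():
        with latest_lock:
            latest["count"] += 1
        time.sleep(interval)  # the GIL is released while sleeping or doing IO


def read_latest():
    """What a web view function might call to get a copy of the newest data."""
    with latest_lock:
        return dict(latest)


def run_demo(duration=0.2):
    """Run the scraper in a background thread for a short time, then stop it."""
    stop_event.clear()
    worker = threading.Thread(target=scrape_forever, daemon=True)
    worker.start()
    time.sleep(duration)  # the main thread is free to do other work here
    snapshot = read_latest()
    stop_event.set()
    worker.join()
    return snapshot
```

The lock keeps reads and writes of the shared dictionary consistent, and the `Event` gives the loop a clean way to stop instead of running forever.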

Why go for multiprocessing?

A lot of work has gone into making the multithreading and multiprocessing interfaces very similar, so multiprocessing adds very little complexity. If you want to be sure the scraper isn't clogging up your server, it might be easiest just to use multiprocessing.

Why go for multithreading?

The downside of multiprocessing is that Python has to pickle data to pass it between your processes, since they can't share memory the way threads can. This is slow compared to multithreading, but it's still pretty fast for reasonable amounts of data. Remember "premature optimisation is the root of all evil": profile your code before and after optimising to decide whether it was worth it.

Why go for asyncio?

asyncio was added to Python with the aim of "making writing explicitly asynchronous, concurrent Python code easier and more Pythonic"; some people would disagree that it succeeded. I think you are best off trying it and seeing whether it works for you. From the sounds of it, your application isn't large enough to really benefit from the massive concurrency that asyncio allows.

Personally I would choose multiprocessing for this kind of thing.
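
As a rough sketch of the multiprocessing route, the scraper below runs in its own process and hands results to the parent through a `multiprocessing.Queue`. The function names and the dictionary it sends are assumptions for illustration; in the real application the parent side would be the Flask app reading from the queue.

```python
import multiprocessing as mp
import time


def scrape_loop(out_queue, stop_event, interval=0.05):
    """Stand-in for app1.py: produce a result, hand it over, repeat."""
    n = 0
    while not stop_event.is_set():
        n += 1
        out_queue.put({"scrape_number": n})  # pickled and sent to the parent
        time.sleep(interval)


def main():
    out_queue = mp.Queue()
    stop_event = mp.Event()
    scraper = mp.Process(
        target=scrape_loop, args=(out_queue, stop_event), daemon=True
    )
    scraper.start()

    # Stand-in for app2.py: the web app would read from the queue here.
    for _ in range(3):
        print(out_queue.get(timeout=5))

    stop_event.set()         # ask the scraper to leave its loop
    scraper.join(timeout=5)  # wait for the child process to exit


# The guard matters here: with some start methods the child process
# re-imports this file, and main() must not run again during that import.
if __name__ == "__main__":
    main()
```

Only picklable objects can go through the queue, which is the cost mentioned above; for small dictionaries of scraped values it is unlikely to be noticeable.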

Imports

It is generally not desirable for `import my_script_which_loops` to hang forever. Instead, you will often see something like the following:

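# my_script_which_loops.py

def main():
    # The infinite loop lives inside a function, so importing the module does not start it.
    while True:
        print("I am scraping the thing!")

# True only when this file is run directly, not when it is imported.
if __name__ == "__main__":
    main()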

This means that if you run `python my_script_which_loops.py` then you will scrape the thing as intended. However, if the script isn't the main script (that is, another script imports it) then importing it won't hang. Please see here for more info.
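
To see the difference between running and importing, the following sketch writes a small guarded module to a temporary directory and then uses it both ways. The module name `guarded_scraper` is hypothetical, and a bounded loop replaces `while True` so that the demonstration terminates.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical module source: same shape as the script above, bounded loop.
MODULE_SOURCE = '''
def main():
    for i in range(3):
        print("I am scraping the thing!", i)

print("__name__ is", repr(__name__))

if __name__ == "__main__":
    main()
'''


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "guarded_scraper.py"
        path.write_text(MODULE_SOURCE)

        # Run as a script: __name__ is "__main__", so main() executes.
        as_script = subprocess.run(
            [sys.executable, str(path)],
            capture_output=True, text=True, check=True,
        ).stdout

        # Import as a module: __name__ is "guarded_scraper", so main() is skipped.
        as_import = subprocess.run(
            [sys.executable, "-c", "import guarded_scraper"],
            cwd=tmp, capture_output=True, text=True, check=True,
        ).stdout

    return as_script, as_import
```

Calling `demo()` returns the output of both runs: the first contains the scraping messages, the second only the line reporting the module's `__name__`.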

Edwin Shepherd
  • Nice writeup. My issue is that it keeps hanging. I'm currently using the command `os.system('python3 scraper.py')`, but I doubt that it's the best way to do things. Also, could you clarify "however if the script isn't the main script then importing it won't hang", please? – arthem Jun 12 '20 at 19:43
  • The link provides a lot of depth, but as an overview: `__name__` is a special variable that is set differently depending on how the file is used. If you run `x.py`, which contains `import y`, then `__name__` is `"__main__"` in `x.py` and `"y"` in `y.py`. It took me a little while to get my head around this. – Edwin Shepherd Jun 12 '20 at 19:49
  • You are correct that `os.system` is not the best way. Have a look at some of the examples in the linked documentation; I may edit the answer with a suggested way later if I have time. – Edwin Shepherd Jun 12 '20 at 19:51
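
One possible alternative to `os.system`, sketched here under assumptions rather than taken from the answer: `os.system` blocks until the command finishes, which never happens for an endless scraper, whereas `subprocess.Popen` returns immediately. The helper names `start_all`, `stop_all` and `run_together` are made up for illustration.

```python
import subprocess
import sys


def start_all(commands):
    """Start every command without waiting; return the Popen handles."""
    return [subprocess.Popen(cmd) for cmd in commands]


def stop_all(processes, timeout=5):
    """Ask each still-running child to terminate, then wait for it to exit."""
    for proc in processes:
        if proc.poll() is None:
            proc.terminate()
    for proc in processes:
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()  # last resort if the child ignores terminate()
            proc.wait()


def run_together(commands):
    """Start all commands, block until they exit (or Ctrl+C), then clean up."""
    processes = start_all(commands)
    try:
        for proc in processes:
            proc.wait()
    except KeyboardInterrupt:
        pass  # fall through to cleanup
    finally:
        stop_all(processes)
    return [proc.returncode for proc in processes]
```

From a launcher script sitting next to the two files, something like `run_together([[sys.executable, "app1.py"], [sys.executable, "app2.py"]])` would start both and keep them running until interrupted. The two scripts would still need some way to exchange data, such as a file, a database or a socket.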