Exchanging data between Python and React in real time with Electron JS

Question

I have to implement a web scraping tool, and I chose to build it with Electron JS, React, and Python.

I was able to integrate Python and React using python-shell in Electron, as follows.

React Code

import React from 'react';
const path = require('path');
const { PythonShell } = require('python-shell');

const city = 'XYZ';
const options = {
  scriptPath: path.join(__dirname, '../../python/'),
  args: [city],
};

class App extends React.Component {
  constructor(props) {
    super(props);
    // Initialize state so that this.state.test in render() does not throw
    this.state = { test: '' };
  }

  componentDidMount() {
    // Executes the Python script on python3
    const shell = new PythonShell('main.py', options);

    // Fires once for each line the script writes to stdout
    shell.on('message', (message) => {
      console.log('message', message);
    });
  }

  render() {
    return (
      <div className="header">
        <h1>Hello, World, {this.state.test}</h1>
      </div>
    );
  }
}

export default App;

Python Code

import sys
import requests
from bs4 import BeautifulSoup
from queue import Queue, Empty
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urljoin, urlparse

city = sys.argv[1]

class MultiThreadScraper:

    def __init__(self, base_url):
        self.base_url = base_url
        self.root_url = '{}://{}'.format(urlparse(self.base_url).scheme, urlparse(self.base_url).netloc)
        self.pool = ThreadPoolExecutor(max_workers=5)
        self.scraped_pages = set()
        self.to_crawl = Queue()
        self.to_crawl.put(self.base_url)

    def parse_links(self, html):
        # Queue every same-site link that has not been scraped yet
        soup = BeautifulSoup(html, 'html.parser')
        links = soup.find_all('a', href=True)
        for link in links:
            url = link['href']
            if url.startswith('/') or url.startswith(self.root_url):
                url = urljoin(self.root_url, url)
                if url not in self.scraped_pages:
                    self.to_crawl.put(url)

    def scrape_info(self, html):
        # Placeholder for the actual data extraction
        return

    def post_scrape_callback(self, res):
        # Runs when a worker finishes; res is the completed Future
        result = res.result()
        if result and result.status_code == 200:
            self.parse_links(result.text)
            self.scrape_info(result.text)

    def scrape_page(self, url):
        try:
            res = requests.get(url, timeout=(3, 30))
            return res
        except requests.RequestException:
            return

    def run_scraper(self):
        while True:
            try:
                # Stop once no new URL has arrived for 60 seconds
                target_url = self.to_crawl.get(timeout=60)
                if target_url not in self.scraped_pages:
                    print("Scraping URL: {}".format(target_url))
                    self.scraped_pages.add(target_url)
                    job = self.pool.submit(self.scrape_page, target_url)
                    job.add_done_callback(self.post_scrape_callback)
            except Empty:
                return
            except Exception as e:
                print(e)
                continue


if __name__ == '__main__':
    s = MultiThreadScraper("http://websosite.com")
    s.run_scraper()

I get all the scraped URLs in React only after the Python script has finished executing, but I want each URL to appear in the React front end in real time.

The following React code executes the Python script and gives the final result:

var shell = new PythonShell('main.py', options); //executes python script on python3

This React code receives each message that the Python script emits with a simple 'print' statement:

shell.on('message', function (message) {
  console.log(message);
});

Is there any way to get the result in real time while executing the python code?

Mahesh Reddy · Answer 1 · 2020-08-13T12:49:37.597

0

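Call sys.stdout.flush() after each print statement in the Python script. When stdout is a pipe rather than a terminal, Python buffers the output, so without a flush the lines reach the 'message' handler only when the buffer fills or the script exits.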
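
As a minimal sketch of this advice, the print call in run_scraper could go through a small helper that writes one line and flushes it immediately. The helper name report_url and its stream parameter are hypothetical, introduced only for illustration; print(..., flush=True) is the built-in shorthand for printing and then flushing the stream.

```python
import sys


def report_url(url, stream=None):
    """Write one progress line and flush it right away.

    report_url and stream are illustrative names, not part of the
    original script. stream defaults to sys.stdout.
    """
    stream = sys.stdout if stream is None else stream
    line = "Scraping URL: {}".format(url)
    # flush=True has the same effect as calling stream.flush() after the print
    print(line, file=stream, flush=True)
    return line
```

Under these assumptions, replacing the print in run_scraper with report_url(target_url) should make each URL arrive in the 'message' handler as soon as it is queued for scraping.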

Ref: usage of sys.stdout.flush()
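
To see the effect in isolation, the following self-contained sketch plays the role of the Electron side with Python's subprocess module standing in for python-shell (a substitution made only so that the example runs on its own). The child flushes one line and then waits, and the parent reads that line while the child is still running. The names CHILD_SOURCE and read_first_line_while_running are hypothetical.

```python
import subprocess
import sys

# Child script: prints one flushed line, then waits for a line on stdin
# before printing a final line and exiting.
CHILD_SOURCE = r'''
import sys
print("Scraping URL: page-1", flush=True)
sys.stdin.readline()
print("done", flush=True)
'''


def read_first_line_while_running():
    """Return (first_line, child_still_running, remaining_output)."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    # Returns as soon as the child has flushed its first line
    first_line = child.stdout.readline().strip()
    # The child is blocked on stdin at this point, so it has not exited
    still_running = child.poll() is None
    # Send the newline the child is waiting for and collect the rest
    remaining, _ = child.communicate("\n")
    return first_line, still_running, remaining.strip()
```

If flush=True were removed from the first print in the child, the line would typically stay in the child's buffer and the parent's readline() would block, which is the same symptom as messages arriving only at the end.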

edited Aug 13 '20 at 12:49

answered Jul 07 '20 at 11:40
