How to ingest live data and make predictions on them in Flask

Question

I have a Flask app that, when a user triggers an endpoint, pings an external API, gets live data, transforms that data, runs it through a linear regression model, and returns the prediction to the user.

Instead, I'd like one internal function to continuously retrieve the newest data from the live stream, transform it, and make predictions, so that whenever the user sends a request to a separate endpoint, it just returns the most recent predictions.

What is the best way to go about this? Currently my setup is something similar to this:

from flask import Flask, jsonify, request
import pandas as pd
import requests

app = Flask(__name__)

def get_data(name):
    # Fetch the raw JSON for one pokemon (PokeAPI accepts a name or a numeric ID)
    response = requests.get('https://pokeapi.co/api/v2/pokemon/{}'.format(name), timeout=10)
    response.raise_for_status()
    return response.json()

def transform(name, data):
    # One row per ability; each 'ability' entry has 'name' and 'url' keys
    rows = [
        {'skill_name': entry['ability']['name'],
         'url': entry['ability']['url'],
         'name': name}
        for entry in data['abilities']
    ]
    # DataFrame.append was removed in pandas 2.0, so build the frame in one go
    return pd.DataFrame(rows, columns=['skill_name', 'url', 'name'])

def make_prediction(transformed_data):
    # Placeholder for some sklearn model, e.g. model.predict(features);
    # it should return something JSON-serializable
    return None

@app.route('/getinfo', methods=['POST'])
def get_info():
    name = request.args.get('name')  # a pokemon name or ID, kept as a string
    raw_data = get_data(name)
    transformed_df = transform(name, raw_data)
    return jsonify(make_prediction(transformed_df))

As you can see, the frequency of predictions is currently completely dependent on the frequency that the /getinfo endpoint is pinged.

But now imagine a hypothetical world in which the information about a particular pokemon changes every few seconds, and I want to keep loading the newest data irrespective of how often an external user sends requests, so that when a user does send a request, the latest predictions are already on hand. It would be like a stream of predictions generated internally and pushed onto a stack, with the latest one returned to the user off the top of the stack when a request arrives. How would I go about doing this?

The main problem I'm facing is how to have two functions running concurrently inside a Flask app: one that runs in an endless loop at a fixed interval to get new data and generate predictions, and another that fetches the latest prediction and returns it to the user only when it's called.
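The "stack" of predictions described above can be sketched with `collections.deque` and its `maxlen` argument, which discards the oldest items automatically so memory stays bounded. The maximum length of 5 and the helper names `push_prediction` and `latest_prediction` are illustrative assumptions. The Python docs describe deque appends as thread-safe, which suits one producer thread and many readers.

```python
from collections import deque

# Keep only the 5 most recent predictions (an arbitrary illustrative size);
# older ones fall off the left end automatically.
recent_predictions = deque(maxlen=5)

def push_prediction(prediction):
    """Called by the producer each time a new prediction is made."""
    recent_predictions.append(prediction)

def latest_prediction():
    """Return the top of the 'stack' (newest item), or None if nothing yet."""
    return recent_predictions[-1] if recent_predictions else None

for p in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]:
    push_prediction(p)

print(latest_prediction())       # 0.7
print(list(recent_predictions))  # [0.3, 0.4, 0.5, 0.6, 0.7]
```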

score 0 · Answer 1 · answered Dec 21 '21 at 18:58

When the application starts, I would start a second thread using the threading module. That thread runs a loop that requests and processes new data every x seconds and then stores the new results in a global variable.
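A minimal sketch of that background loop follows. It assumes a hypothetical `fetch_and_predict()` standing in for the `get_data`, `transform`, and `make_prediction` chain from the question (here it just returns a random number). A `threading.Event` serves as an interruptible sleep so the loop can be stopped cleanly, and `daemon=True` keeps the thread from blocking interpreter shutdown.

```python
import random
import threading
import time

latest_result = None            # most recent prediction, read by request handlers
stop_event = threading.Event()  # set it to ask the worker loop to exit

def fetch_and_predict():
    """Hypothetical stand-in for get_data -> transform -> make_prediction."""
    return random.random()

def refresh_loop(interval_seconds):
    global latest_result
    while not stop_event.is_set():
        # Rebinding a global name is a single step in CPython, so a reader
        # gets either the previous result object or the new one.
        latest_result = fetch_and_predict()
        # Sleeps for interval_seconds, but returns early if stop_event is set.
        stop_event.wait(interval_seconds)

def start_refresher(interval_seconds=5.0):
    worker = threading.Thread(target=refresh_loop, args=(interval_seconds,), daemon=True)
    worker.start()
    return worker

# Demo: refresh every 50 ms, let it run briefly, then stop it.
worker = start_refresher(interval_seconds=0.05)
time.sleep(0.2)
print("latest prediction:", latest_result)
stop_event.set()
worker.join()
```

A plain global is enough while the result is one object; once several related values must change together, a lock becomes necessary.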

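Because Flask's development server handles each request in its own thread by default, you may need to guard that global variable with a lock (a mutex) so the background thread and the request threads don't read and write it at the same time. If the app instead runs as multiple processes (for example, several WSGI server workers), each process has its own copy of the global variable and would need its own background thread; a lock cannot share results across processes.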
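A lock does not hold the result itself; it guards the variable that does. One way to package that is a small holder class, sketched here as a hypothetical `LatestPrediction` that stores the prediction together with the time it was produced. Because both fields are written and read under the same lock, a reader never sees a new value paired with a stale timestamp. This coordinates threads within a single process only.

```python
import threading
import time

class LatestPrediction:
    """Hypothetical holder for the newest prediction and its timestamp."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = None
        self._updated_at = None

    def set(self, value):
        # Update both fields atomically with respect to readers.
        with self._lock:
            self._value = value
            self._updated_at = time.time()

    def get(self):
        # Return a consistent (value, timestamp) snapshot.
        with self._lock:
            return self._value, self._updated_at

# Demo: several threads write concurrently; the final read is consistent.
holder = LatestPrediction()
writers = [threading.Thread(target=holder.set, args=(i,)) for i in range(10)]
for t in writers:
    t.start()
for t in writers:
    t.join()

value, updated_at = holder.get()
print(value, updated_at)
```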

Then, when someone sends a request to /getinfo, you can just return the current value of the global variable.
