I have created a Flask application that opens database connections and stores some data in global variables. The data in these global variables is used by subsequent ajax requests from the page. However, I am running into serious concurrency issues. I tried running the Flask application with both uwsgi and gunicorn and got the same results with both. This was my configuration in both runs:

1) 4 workers

2) multithreaded = True in flask.

When 2 users use the application (the data returned is specific to a few options that the user inputs), I sometimes receive the data that the other user requested, and sometimes the other user receives mine.

My hypothesis is that the worker from which my application gets its data keeps changing. I am not very sure about the worker model in gunicorn and uwsgi. Can someone tell me how to ensure that each user gets only the data they requested? (Reminder: the requested data is stored in a Python global variable and, on an ajax request, this object is passed to the HTML.) Any help will be really appreciated.

I have read about request contexts but am completely lost as to how to go about this.

import json
from flask import Flask

app = Flask(__name__)
a = []  # module-level global: shared by every request this worker process handles

@app.route("/")
def redir():
    global a
    # assume this is only for POST (from an ajax call)
    a = []  # placeholder: some data built from a database based on the options from the page where the POST was made
    return json.dumps({'data': a[0:100]})

@app.route("/next100")
def next100():
    global a
    # return the next 100 records of the global variable a
    return json.dumps({'data': a[100:200]})

What is expected is that a user makes the first ajax request to the redir() function, and then a separate ajax call to next100() returns the next batch of data. This works without any issues when there is only one user.

When there are 2 users who have both called redir() and then keep calling next100(), both users randomly get data from the global "a" variable (sometimes from user1's context and sometimes from user2's).

Kakarot
  • Don't use global variables. If you post some code that shows your error (a minimal working example), it will be easier to help you. – syntonym Jun 21 '16 at 11:38
  • @syntonym I am not getting errors. The application works fine; it is just that the wrong global variable comes back to me. Let me make it simpler (I will get working on a small snippet that could reproduce this): say there are a maximum of 5 options in a select list that a user can select in a page. These selected options are sent to Python over an ajax request, and some data corresponding to the selected options is sent to the user (the object from which a subset is sent is global; I cannot make it local due to other constraints). When multiple people use it, I sometimes see another user's request instead of mine. – Kakarot Jun 21 '16 at 11:44

2 Answers

If you're going to store something as a global in your web worker, you had better make sure that the life cycle of that variable starts and ends with a request.

Otherwise, you store something from one request, then another request comes in, possibly with a different context (e.g. a different user), and your data gets all mixed up.
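The mix-up can be reproduced without a web server. The following is only a minimal sketch: the `Worker` class is a hypothetical stand-in for one server process, and its attribute `a` plays the role of the module-level global from the question.

```python
class Worker:
    """Stands in for one server process; `a` plays the role of the global."""

    def __init__(self):
        self.a = []

    def redir(self, user):
        # Each call overwrites whatever the previous request stored.
        self.a = [f"{user}-{i}" for i in range(200)]
        return self.a[0:100]

    def next100(self):
        # No notion of "who is asking": it returns whatever was stored last.
        return self.a[100:200]


worker = Worker()
worker.redir("user1")
worker.redir("user2")               # overwrites user1's data
page_for_user1 = worker.next100()   # user1 asks for the next page

other_worker = Worker()             # a second process never saw either request
page_from_other_worker = other_worker.next100()
```

Here `page_for_user1` holds user2's records, and the second worker has nothing to return at all, which matches both symptoms described in the question.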

If something needs to be shared across your web workers, it may be better to go for a centralized datastore (DB, cache, anything), because then you will be forced to think about and label each piece of data with a context. E.g. before you store any user-related data, you will think of labeling that row of data with user_id = X.

You may think you could achieve the same thing with a Flask global with the user information attached, but that breaks down when you think about how a request may go to Web Worker 1, and then another request from the same user goes to Web Worker 2 where the previous data isn't present (the global is limited to Web Worker 1). This is a case where some centralized datastore shines.
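As a rough illustration of labeling each row with its owner, here is a sketch using the standard-library `sqlite3` module. The table name, column names, and helper functions are all assumptions made for this example, not part of the original application.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    # Hypothetical schema: every record carries the user_id it belongs to.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        " user_id TEXT NOT NULL,"
        " position INTEGER NOT NULL,"
        " payload TEXT NOT NULL,"
        " PRIMARY KEY (user_id, position))"
    )
    return conn


def save_results(conn, user_id, records):
    """Replace whatever this user stored before with a fresh result set."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM results WHERE user_id = ?", (user_id,))
        conn.executemany(
            "INSERT INTO results (user_id, position, payload) VALUES (?, ?, ?)",
            [(user_id, i, json.dumps(r)) for i, r in enumerate(records)],
        )


def load_page(conn, user_id, start, count=100):
    """Return up to `count` records for this user, beginning at `start`."""
    rows = conn.execute(
        "SELECT payload FROM results"
        " WHERE user_id = ? AND position >= ? AND position < ?"
        " ORDER BY position",
        (user_id, start, start + count),
    ).fetchall()
    return [json.loads(payload) for (payload,) in rows]
```

Note that an in-memory SQLite database is private to one connection, so with several workers the path would have to point at a shared file or, more likely, a database server.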

bakkal
  • You can also store in the `flask.g` object which is realized as a cookie sent to the user. – syntonym Jun 21 '16 at 11:49
  • @bakkal I do not need to share anything at all across my web workers(but unintentionally the data is mixing up) – Kakarot Jun 21 '16 at 11:55
  • @syntonym I have posted a model code which explains my use case. – Kakarot Jun 21 '16 at 12:00
  • both of the above are checking if the incoming is a post request, sorry for missing that out in the snippet above. – Kakarot Jun 21 '16 at 12:01
  • how do i ensure that the life cycle of the variable ends with the end of a request? – Kakarot Jun 21 '16 at 12:03
  • What bakkal said still holds. Either store `a` in a database or in flask.g. – syntonym Jun 21 '16 at 12:03
  • @syntonym What exactly does holding it in flask.g do? And could you give me a small snippet showing how it is done for the above code? Sorry for troubling you, but I am very new to Python and this has had me stuck for hours. – Kakarot Jun 21 '16 at 12:09
  • I am trying to clear up any misconceptions I might be carrying around. Say I have 4 workers to serve my incoming requests and one user has finished a request. When the next request is made by a different user, will a new worker always be allotted to it, or would one of the workers be chosen at random? (I am basically looking at a case where worker1 gets reallotted to request2 and this messes up my data because the data of variable a from request1 is still left behind, like bakkal had mentioned.) – Kakarot Jun 21 '16 at 12:18
  • If you make a request, a random worker will get the request. Thus `a` can come from a totally different request than the one you wanted. Global variables should only be used for static things like configuration because they break in cases like yours. – syntonym Jun 21 '16 at 12:22

As bakkal pointed out, global variables break as soon as requests are served by multiple threads or worker processes.

Instead you can give the user a cookie with the data you want to carry over:

session["a"] = "a" # or whatever

Of course, this will not scale indefinitely, so if you have lots of data you should instead store only some (database) session identifier in the cookie and load the data from a database.
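A minimal sketch of that idea follows, assuming Flask's default cookie-based session. The names `RESULTS_BY_ID` and `build_rows` and the form field `option` are hypothetical, and the dictionary merely stands in for a database table: it is still per-process, so with several workers it would have to be replaced by a real shared datastore.

```python
import uuid

from flask import Flask, jsonify, request, session

app = Flask(__name__)
app.secret_key = "replace-with-a-real-secret"  # needed to sign the session cookie

# Stand-in for a database table keyed by id (assumption for this sketch only).
RESULTS_BY_ID = {}


def build_rows(option):
    # Hypothetical placeholder for the query driven by the posted options.
    return [f"{option}-{i}" for i in range(300)]


@app.route("/", methods=["POST"])
def redir():
    db_id = uuid.uuid4().hex
    RESULTS_BY_ID[db_id] = build_rows(request.form.get("option", ""))
    session["db_id"] = db_id  # only this small id travels in the cookie
    return jsonify(data=RESULTS_BY_ID[db_id][0:100])


@app.route("/next100", methods=["POST"])
def next100():
    # Look up this user's rows by the id stored in their own session cookie.
    rows = RESULTS_BY_ID.get(session.get("db_id"), [])
    return jsonify(data=rows[100:200])
```

Because each browser sends back its own cookie, two users calling `/next100` are routed to their own rows rather than to whichever result was stored last.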

For more information on flask.session read the documentation.

flask.g is actually meant to carry information between different functions within a single request, not between different requests in a single session.

syntonym
  • The global variable a in my case is on the order of 500~600 MB on average, so I would have to save it in a database and go with that approach :) Would you happen to have a link to a sample implementation of this by any chance? – Kakarot Jun 21 '16 at 12:27
  • Was able to clear my confusions :) will try implementing this now. Thank you! – Kakarot Jun 21 '16 at 12:29
  • Yes, cookies need to be sent each time a request is made. So if you had a 600 MB cookie (besides the fact that probably no browser would allow that), each request would require the user to upload that data, which would probably take at least an hour or so. You can do something like `session["db_id"] = 1234` and fetch the data by that id from the database. – syntonym Jun 21 '16 at 12:32
  • thanks for the tip :) One final question, how would I remove the data for a user once he has terminated the connection? – Kakarot Jun 21 '16 at 12:51
  • How exactly does a user "terminate the connection"? – syntonym Jun 21 '16 at 12:51
  • There would be no way for my application to know if a user has closed his tab, right? – Kakarot Jun 21 '16 at 14:53
  • You would need JavaScript for that, but I'm not sure if it is "reliable". [Here](http://stackoverflow.com/questions/3888902/javascript-detect-browser-close-tab-close-browser) is a question in that direction. You could also make some keep-alive "ping" in your website with JavaScript, and when you don't get the ping for some time you could guess that the tab is closed. – syntonym Jun 21 '16 at 14:55
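
The keep-alive idea from the last comment could be handled on the server roughly as follows. This is only a sketch under assumptions: the `ExpiringStore` class, its method names, and the timeout value are all hypothetical, and in a multi-worker deployment the same bookkeeping would live in the shared datastore rather than in process memory.

```python
import time


class ExpiringStore:
    """Keeps per-user data and drops entries that have not pinged recently."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._data = {}       # key -> stored value
        self._last_seen = {}  # key -> time of the last ping

    def put(self, key, value):
        self._data[key] = value
        self._last_seen[key] = self._clock()

    def ping(self, key):
        """Record a keep-alive; returns False if the key is unknown or purged."""
        if key not in self._data:
            return False
        self._last_seen[key] = self._clock()
        return True

    def get(self, key):
        return self._data.get(key)

    def purge_stale(self):
        """Remove entries whose last ping is older than the TTL; return their keys."""
        now = self._clock()
        stale = [k for k, seen in self._last_seen.items() if now - seen > self._ttl]
        for k in stale:
            del self._data[k]
            del self._last_seen[k]
        return stale
```

The page would call a ping endpoint every few seconds, and `purge_stale()` would run periodically; a missing ping is only a guess that the tab was closed, so the timeout should be generous.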