I have a fairly large Python package that interacts synchronously with a third-party API server and carries out various operations against it. I am now also starting to collect some of the data for future analysis by pickling the JSON responses. After profiling several serialisation/database methods, pickle was the fastest in my case. My basic pseudo-code is:
while True:
    do_existing_api_stuff()  # ...

    # additional data pickling
    data = {'info': []}  # there are multiple keys in the real version!
    if pickle_file_exists:
        data = unpickle_file()
    data['info'].append(new_data)
    pickle_data(data)
    if len(data['info']) >= 100:  # file size limited for read/write speed
        create_new_pickle_file()

    # intensive section...
    # move files from "wip" (Work In Progress) dir to "complete"
    if number_of_pickle_files >= 100:
        compress_pickle_files()  # with lzma
        move_compressed_files_to_another_dir()
My main issue is that compressing and moving the files takes several seconds, which slows down my main loop. What is the easiest way to call these functions in a non-blocking way without any major modifications to my existing code? I do not need a return value from them, but they will raise an error if anything fails, and I do want to know when that happens.

Another "nice to have" would be for the pickle.dump() step to also be non-blocking. Again, I am not interested in the return value beyond "did it raise an error?".

I am aware that unpickling/appending/re-pickling on every loop is not particularly efficient, but it avoids data loss when the API drops out due to connection issues, server errors, etc.
I have zero knowledge of threading, multiprocessing, asyncio, etc., and after much searching I am currently more confused than I was two days ago!
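From what I have found so far, concurrent.futures.ThreadPoolExecutor looks like it might be the sort of thing I need. Below is a rough sketch of what I imagine (using the function names from my pseudo-code above as placeholders), although I have no idea whether it is the right pattern or how an error raised in the background would ever reach me:

from concurrent.futures import ThreadPoolExecutor

# one worker, so the file jobs run one at a time in the background
executor = ThreadPoolExecutor(max_workers=1)

def archive_files():
    # these are my existing (blocking) functions from the file-handling module
    compress_pickle_files()  # with lzma
    move_compressed_files_to_another_dir()

# then, inside the main loop, instead of calling them directly:
if number_of_pickle_files >= 100:
    future = executor.submit(archive_files)  # returns immediately
    # ...but how/where would I see an exception raised by archive_files()?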
FYI, all of the file-related functions live in a separate module/class, so that could be made asynchronous if necessary.
EDIT: There may be multiple calls to the above functions, so I guess some sort of queuing will be required?
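If queuing is the way to go, is a single background worker thread pulling jobs off a queue.Queue the usual approach? A rough sketch of what I am imagining (again, the function names are placeholders from my pseudo-code), although I am not sure how errors should get back to the main loop:

import queue
import threading
import traceback

jobs = queue.Queue()

def worker():
    while True:
        func, args = jobs.get()
        try:
            func(*args)  # e.g. pickle_data(data) or compress_pickle_files()
        except Exception:
            # not sure how best to report this back to the main loop
            traceback.print_exc()
        finally:
            jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

# then in the main loop, instead of calling the functions directly:
jobs.put((pickle_data, (data,)))
jobs.put((compress_pickle_files, ()))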