I have the following list
schema:
multiple_years_data = [
{
'dates': ... (list of dates for one year),
'values': ... (list of values for one year),
'buffer': output
},
{
'dates': ... (list of dates for another year),
'values': ... (list of values for another year),
'buffer': output
},
... (and so on)
]
All the data from above i want to insert it into an excel file with two columns(date
, value
) & for each dict
from the list i want to create a new workspace.
For example, if i would have 5 dicts in the list, i would have 5 workspaces, with data depending on the dict.
Having this much data, doing all the insert synchronously
takes some time, so i thought about implementing multiprocessing
so that i can add some processing power from other cores. Now if i would have data for 5 years, i would create 5 processes. From the start i tried it only in one process & one dict, and if it would work i will add more processes.
By the way, i should mention that i use the Django
framework.
So these are the most important parts of my code:
def add_one_year_to_excel(data):
workbook = xlsxwriter.Workbook(data['output'])
worksheet = data['buffer'].add_worksheet('first_worksheet')
# insert those 2 lists of data(`dates`, `values`) in two columns
worksheet.write_column(1, 0, data['dates'])
worksheet.write_column(1, 1, data['values'])
return workbook
output = BytesIO()
workbook = xlsxwriter.Workbook(output)
pool = Pool(1)
result = pool.apply(add_one_year_to_excel, multiple_years_data)
pool.close()
# this line makes so that i can automatically download the result file if i access the assigned url
response = HttpResponse(result.read(),
content_type="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")
response['Content-Disposition'] = f'attachment; filename=example.xlsx'
response['Access-Control-Expose-Headers'] = 'Content-Disposition'
return response
I know some parts are written a little bad, but i tried to do a very simple schema of my code so you can understand my problem.
Now the problem is that, once the result from function add_one_year_to_excel(data)
is returned it gets problematic:
MaybeEncodingError at /api/v1/forecasted_profile_excel/
Error sending result: '<xlsxwriter.workbook.Workbook object at 0x7f5874306dc0>'. Reason: 'PicklingError("Can't pickle <class 'xlsxwriter.worksheet.Number'>: attribute lookup Number on xlsxwriter.worksheet failed")'
In the end i understood that only BytesIO() can be pickled and if i add a worksheet, the object can't be pickled anymore. So, all the problem spins around pickling, at returning and at sharing data between multiple processes. I've been struggling for 3 days with this problem. Looked throughout the internet and still i can't a way to make this work.
So i'm asking you if you can help me get over this problem ? Maybe a better way of doing this but the same result ? I'm kinda new in this area so maybe my code isn't well written.
Thanks in advance.