0

I have the following list schema:

multiple_years_data = [
    {
        'dates': ... (list of dates for one year),
        'values': ... (list of values for one year),
        'buffer': output
    },
    {
        'dates': ... (list of dates for another year),
        'values': ... (list of values for another year),
        'buffer': output
    },
    ... (and so on)
]

All the data from above i want to insert it into an excel file with two columns(date, value) & for each dict from the list i want to create a new workspace.

For example, if i would have 5 dicts in the list, i would have 5 workspaces, with data depending on the dict.

Having this much data, doing all the insert synchronously takes some time, so i thought about implementing multiprocessing so that i can add some processing power from other cores. Now if i would have data for 5 years, i would create 5 processes. From the start i tried it only in one process & one dict, and if it would work i will add more processes.

By the way, i should mention that i use the Django framework.

So these are the most important parts of my code:

def add_one_year_to_excel(data):
    workbook = xlsxwriter.Workbook(data['output'])
    worksheet = data['buffer'].add_worksheet('first_worksheet')

    # insert those 2 lists of data(`dates`, `values`) in two columns
    worksheet.write_column(1, 0, data['dates'])
    worksheet.write_column(1, 1, data['values'])

    return workbook


output = BytesIO()
workbook = xlsxwriter.Workbook(output)

pool = Pool(1)
result = pool.apply(add_one_year_to_excel, multiple_years_data)
pool.close()

# this line makes so that i can automatically download the result file if i access the assigned url
response = HttpResponse(result.read(),
                                content_type="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")
response['Content-Disposition'] = f'attachment; filename=example.xlsx'
response['Access-Control-Expose-Headers'] = 'Content-Disposition'

return response

I know some parts are written a little bad, but i tried to do a very simple schema of my code so you can understand my problem. Now the problem is that, once the result from function add_one_year_to_excel(data) is returned it gets problematic:

MaybeEncodingError at /api/v1/forecasted_profile_excel/
Error sending result: '<xlsxwriter.workbook.Workbook object at 0x7f5874306dc0>'. Reason: 'PicklingError("Can't pickle <class 'xlsxwriter.worksheet.Number'>: attribute lookup Number on xlsxwriter.worksheet failed")'

In the end i understood that only BytesIO() can be pickled and if i add a worksheet, the object can't be pickled anymore. So, all the problem spins around pickling, at returning and at sharing data between multiple processes. I've been struggling for 3 days with this problem. Looked throughout the internet and still i can't a way to make this work.

So i'm asking you if you can help me get over this problem ? Maybe a better way of doing this but the same result ? I'm kinda new in this area so maybe my code isn't well written.

Thanks in advance.

  • This might be relevant: https://stackoverflow.com/q/4677012/11301900. Is that the entire error message? Can you make a [mcve]? – AMC Jan 16 '20 at 15:58
  • Also, it appears that [`Pool.apply()`](https://docs.python.org/3.8/library/multiprocessing.html#multiprocessing.pool.Pool.apply) is blocking. – AMC Jan 16 '20 at 16:06
  • @AMC So here is the [error](https://repl.it/repls/JuvenileClearTechnologies) reproduction. – Darks NemeSySs Jan 16 '20 at 16:20
  • Did you get the code from [here](https://github.com/jmcnamara/XlsxWriter/issues/424)? – AMC Jan 16 '20 at 16:29
  • @AMC No. Overall i explored a lot of code and ended with a version like this. – Darks NemeSySs Jan 16 '20 at 16:32

0 Answers0