1

I am trying to write location's address to my excel sheet using xlswriter package in python. The problem I am facing is when I try to run the function using a thread the values are not written to the worksheet. See the below code: Here rows a list containing all the raw address.

import threading
import xlsxwriter
from geopy.geocoders import Nominatim

geolocator = Nominatim()
outputLocation = 'output.xlsx'
workbook = xlsxwriter.Workbook(outputLocation)
w_sheet4 = workbook.add_worksheet('openstreet')

def openstreet():
    p = 1
    for row in rows:        #rows consists of raw addresses

        try:
            rawAddress = str(" ".join(row[1].strip().split())) #here I am normalizing the addresses using openstreet api
            location = geolocator.geocode(rawAddress)
            w_sheet4.write(p, 1, rawAddress)
            w_sheet4.write(p, 2, str(location.address))
        except Exception as e:
            print "OpenStreet", e

        p += 1

t4 = threading.Thread(target=openstreet)
t4.start()

I am able to write to the worksheet if I don't use thread and run the function by calling it. See the below code:

import threading
import xlsxwriter
from geopy.geocoders import Nominatim

geolocator = Nominatim()
outputLocation = 'output.xlsx'
workbook = xlsxwriter.Workbook(outputLocation)
w_sheet4 = workbook.add_worksheet('openstreet')

def openstreet():
    p = 1
    for row in rows:        #rows consists of raw addresses

        try:
            rawAddress = str(" ".join(row[1].strip().split())) #here I am normalizing the addresses using openstreet api
            location = geolocator.geocode(rawAddress)
            w_sheet4.write(p, 1, rawAddress)  #raw address
            w_sheet4.write(p, 2, str(location.address)) #normalize address
        except Exception as e:
            print "OpenStreet", e

        p += 1

#t4 = threading.Thread(target=openstreet)
#t4.start()
openstreet()

The reason I am using thread because I am using multiple APIs (google, yahoo, openstreetmap API , bing) trying to normalize addresses and compare them. Could anyone help me why I am not able to write to the worksheet when I use thread instead of normal function calling.

python
  • 4,403
  • 13
  • 56
  • 103

2 Answers2

1

Use with caution! XlsxWriter is not thread safe. I discovered that if you use in_memory option it will use SharedStrings and eventually write to unexpected cells. I've created a ticket atm:

Github discussion

Update from XlsxWriter author:

The _get_shared_string_index() method in SharedStrings isn't thread safe. The other methods in that class should be since they are only called from Workbook (for which there should only be one unique instance per xlsxwriter object) or via the Workbook destructor (when the worksheet threads should be joined and activity stopped).

i've discovered that if i use in_memory option XlsxWriter will use SharedStrings

The SharedStrings class can be called regardless of whether you use in_memory or not. The in_memory option relates to the use of temporary files and not SharedStrings (see the docs).

If you wish to avoid using SharedStrings, and thus avoid locking, you can use the constant_memory mode. There are some caveats to that though: data needs to be written in row-column order and you can't use add_table() or merge_range(). See the Contructor docs (link above) and also Working with Memory and Performance.

And is there anything else in XlsxWriter that i need to worry about?

Possibly, but nothing that springs to mind.

UnstableFractal
  • 1,403
  • 2
  • 15
  • 29
0

I have solved it using threading.join() at the end of the function. Why we need to use join() can be understood from this link. The below code is working for me.

import threading
import xlsxwriter
from geopy.geocoders import Nominatim

geolocator = Nominatim()
outputLocation = 'output.xlsx'
workbook = xlsxwriter.Workbook(outputLocation)
w_sheet4 = workbook.add_worksheet('openstreet')

def openstreet():
    p = 1
    for row in rows:        #rows consists of raw addresses

        try:
            rawAddress = str(" ".join(row[1].strip().split())) #here I am normalizing the addresses using openstreet api
            location = geolocator.geocode(rawAddress)
            w_sheet4.write(p, 1, rawAddress)
            w_sheet4.write(p, 2, str(location.address))
        except Exception as e:
            print "OpenStreet", e

        p += 1

t4 = threading.Thread(target=openstreet)
t4.start()
t4.join() #we need to use threading.join to make sure the thread function finishes before the main program exits
Community
  • 1
  • 1
python
  • 4,403
  • 13
  • 56
  • 103
  • 1
    There's nothing in the code you've shown that explains why you need to call `t4.join()` since that should happen anyway since `t4` is a non-daemon thread. Note you're not actually calling the `join` method because there's no `()` at the end. Anyway, there must be other things going on in your real code not shown here. – martineau Aug 04 '15 at 18:17