
I make many requests to a database for shortest paths, very often the same ones (76 iterations, each with 10'000 to 20'000 requests; and I run this several times with small differences, but mostly with the same origins and destinations). So I created a cache. My cache is a dictionary, which I load from a pickle file:

    import pickle

    with open('path_to_OD_dict', 'rb') as file:
        my_unpickler = pickle.Unpickler(file)
        OD_dict = my_unpickler.load()
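A slightly more defensive version of this loading step could fall back to an empty dict when the file does not exist yet, and refuse to continue if the unpickled object is not a dict. That way a wrong type surfaces at load time rather than at the first assignment. This is a sketch; the helper name `load_od_dict` is hypothetical.

```python
import os
import pickle


def load_od_dict(path):
    """Hypothetical helper: load the OD cache from `path`, or start empty."""
    if not os.path.exists(path):
        return {}  # first run: no cache file yet
    with open(path, 'rb') as file:
        obj = pickle.load(file)
    if not isinstance(obj, dict):
        # Fail early instead of at the first OD_dict[key] = ... assignment
        raise TypeError('expected a dict in %r, got %s' % (path, type(obj).__name__))
    return obj
```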

Origins and destinations are each coded as a 5-digit number. I pad them with leading zeros if necessary and then concatenate them (we always have origin_route < destination_route):

    origin_route = str(origin_route).zfill(5)
    destination_route = str(destination_route).zfill(5)
    key = origin_route + destination_route
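Assuming both codes are non-negative integers below 100000, the same zero-padded key can be built in one step with a format specification. The function name `make_key` is hypothetical:

```python
def make_key(origin_route, destination_route):
    """Hypothetical helper: join two zero-padded 5-digit codes into one key."""
    return '%05d%05d' % (origin_route, destination_route)


print(make_key(21501, 30059))  # 2150130059
print(make_key(1, 42))         # 0000100042
```

A tuple key such as `(origin_route, destination_route)` would also work as a dictionary key and skips the string formatting entirely.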

Then I simply look up the key I just built in my dictionary. If it exists, I get my value directly:

    try:
        result = OD_dict[key]

Otherwise, I request the shortest path, add it to the dictionary, and save the dictionary:

    except:
        result = shortestPathLength(origin_route, destination_route)
        OD_dict[key] = result
        with open('path_to_OD_dict', 'wb') as file:
            my_pickler=pickle.Pickler(file)
            my_pickler.dump(OD_dict)
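Here is a sketch of the two branches above combined into one function. It catches only `KeyError`, so other exceptions are no longer silently swallowed. It also writes the pickle atomically: it dumps to a temporary file in the same directory, then swaps that file into place with `os.replace` (Python 3.3+), so a crash mid-write cannot leave a truncated cache. The names `save_od_dict` and `cached_length` are hypothetical, and the shortest-path query is passed in as `compute`.

```python
import os
import pickle
import tempfile


def save_od_dict(od_dict, path):
    """Hypothetical helper: atomically replace the pickle at `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'wb') as tmp:
            pickle.dump(od_dict, tmp)
        os.replace(tmp_path, path)  # atomic rename on POSIX (same filesystem)
    except BaseException:
        os.remove(tmp_path)  # don't leave half-written temp files behind
        raise


def cached_length(od_dict, path, origin_route, destination_route, compute):
    """Hypothetical helper: return the cached length, computing it on a miss."""
    key = str(origin_route).zfill(5) + str(destination_route).zfill(5)
    try:
        return od_dict[key]
    except KeyError:  # only a missing key means "not cached yet"
        result = compute(origin_route, destination_route)
        od_dict[key] = result
        save_od_dict(od_dict, path)
        return result
```

The atomic rename prevents a half-written file. It does not stop two processes from overwriting each other's additions (the last writer wins). Rewriting the whole file on every miss is also costly when the cache is large.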

At the line `OD_dict[key] = result` I get the error `TypeError: 'str' object does not support item assignment`.
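Python raises exactly this message whenever you assign to an index of a string, so it indicates that `OD_dict` is bound to a `str` at that point, not a `dict`. A minimal reproduction (the string value is illustrative):

```python
OD_dict = 'not a dict'  # illustrative: what you'd have if the pickle held a str
try:
    OD_dict['2150130059'] = 114.7
except TypeError as exc:
    message = str(exc)

print(type(OD_dict))  # <class 'str'>
print(message)        # 'str' object does not support item assignment
```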

I'm also using `multiprocessing`, so it is very likely that several processes are trying to access OD_dict at the same time:

    import multiprocessing

    results = []
    po = multiprocessing.Pool()
    for item in aSet:
        results.append(po.apply_async(aBiggerFunctionWithShortestPathInIt))
    po.close()
    po.join()
    for r in results:
        myResult.add(r.get())

Why do I get this particular `TypeError` message?

Also, how should I adapt my code so that several processes can read OD_dict at the same time (or at least without causing problems), while any modifications happen sequentially, so that the dump is never written by several processes simultaneously?

Antonin
  • NB there's [`functools.lru_cache`](http://docs.python.org/3/library/functools.html#functools.lru_cache) in 3.2+ which does this just fine. –  Nov 10 '12 at 00:57
  • @delnan That's an awesome addition to functools, didn't know that was there. – Gareth Latty Nov 10 '12 at 00:59
  • Have you tried printing `OD_dict` there? What do you get? – Gareth Latty Nov 10 '12 at 01:00
  • The problem happens the first time this line is called. The key is 2150130059, i.e. two 5-digit numbers concatenated, which looks fine, and the shortest-path result is 114.716549999. If I try `for key in OD_dict: print OD_dict[key]`, I get `string indices must be integers, not str`... – Antonin Nov 10 '12 at 01:10
  • Try printing `type(OD_dict)`. From the errors you describe, I assume `OD_dict` is a string, not a dictionary: you can't alter a specific character in a Python string, nor can you index it with non-integer indices. – Mahmoud Aladdin Nov 10 '12 at 01:19
  • You're right: the type is str. It's weird: it was working before as a dictionary, and suddenly, when I unpickle it, it's a str... – Antonin Nov 10 '12 at 01:25
  • I haven't dealt with pickle before, but I noticed the Unpickler has a function `load_dict()`; maybe this is what you want. – Mahmoud Aladdin Nov 10 '12 at 01:29
  • Change your `except:` to `except KeyError:` because it's probably masking some other exception that is your real problem, most likely something to do with `OD_dict` not being what you think it is. – martineau Nov 10 '12 at 02:27
  • Also, I would suggest using the `shelve` module for this rather than `pickle`, since `shelve` is designed precisely for creating persistent dictionaries. – martineau Nov 10 '12 at 02:41
  • @martineau: I didn't try to implement `shelve`, but I'm not sure it's a solution. I'm running this script both on my personal computer (Mac OS) and on a server (Linux), and I read [here](http://stackoverflow.com/questions/8704728/using-python-shelve-cross-platform) that `shelve` is not really adapted for this. – Antonin Nov 10 '12 at 09:33
  • For some reason it seems to be related to the fact that I was using [`Multiprocessing`](http://docs.python.org/2/library/multiprocessing.html). Is it possible that accessing OD_dict from different processes at the same time would generate this error? – Antonin Nov 10 '12 at 10:36
  • This is perhaps stylistic, but it's already causing you problems, so yes: `try`/`except` blocks are for catching exceptional circumstances. Using them wholesale will mask problems and confuse you. There is nothing wrong with `if key in OD_dict: return OD_dict[key] else: rest of code` – aychedee Nov 10 '12 at 11:21
  • why do you need a cache? how slow is your shortest path calculation and what are you using for that? – Karussell Nov 10 '12 at 12:28
  • I'm querying a PostgreSQL database for the shortest path. I make mostly the same requests again and again, so I thought a dictionary holding the distance for each OD pair would be more efficient; if I build it step by step, it should work. To give you an idea, I run around 100 iterations of my code, each iteration makes between 10'000 and 20'000 requests, and I rerun the code several times to test different things, but with almost the same requests. – Antonin Nov 10 '12 at 14:54
  • @Antonin: Re: `shelve` vs `pickle` -- do you really need the cache to be cross-platform? Also don't confuse your testing needs with those of the finished product. – martineau Nov 10 '12 at 16:40
  • @Antonin: Re: `Multiprocessing` -- using it absolutely could cause the `OD_dict` backing files to get corrupted. – martineau Nov 10 '12 at 16:45
  • @Antonin: The answer to the question [Elegant way to store dictionary permanently?](http://stackoverflow.com/questions/11821322/python-elegant-way-to-store-dictionary-permanently) might be a good cross-platform way to store the cache dictionary. – martineau Nov 10 '12 at 18:34
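The `functools.lru_cache` suggestion in the comments would look roughly like this (Python 3.2+). Here `shortest_path_length` is a hypothetical stand-in for the database query. The cache lives inside a single process and disappears when that process exits. So it does not replace a persistent pickle across runs, and each `multiprocessing` worker would build its own separate cache.

```python
import functools

query_count = 0  # counts how often the "database" is actually queried


@functools.lru_cache(maxsize=None)  # unbounded: entries are never evicted
def shortest_path_length(origin_route, destination_route):
    global query_count
    query_count += 1
    # Hypothetical stand-in for the real PostgreSQL shortest-path query.
    return float(abs(destination_route - origin_route))


shortest_path_length(21501, 30059)
shortest_path_length(21501, 30059)  # second call is served from the cache
print(query_count)  # 1
print(shortest_path_length.cache_info())
```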
