Python: how to convert pickled txt files into gpickles for networkx?

Question

After working with a number of different occurrences of the same graph G, I dumped them as txt files with pickle using this line:

pickling=pickle.dump(G,open('pickled_G.txt','w')) #Example for one single graph

Now, for purposes of further calculations, I want to load these graphs back into networkx by doing:

work_dir=raw_input('Working directory: ')
for i,file in enumerate(os.listdir(work_dir)):
    if file.endswith(".txt"):
       filename=os.path.abspath(file)
       F = nx.read_gpickle(filename) #Loading graph G back into Python and calling it F

EDIT: I get this error: ImportError: No module named copy_reg, raised at the line F = nx.read_gpickle(filename).

I assume the problem is that I have a bunch of txt files and I am trying to load them as if they were gpickle. If my take is correct, how could I convert my .txt files into .gpickle without altering the graph features? This would spare me re-running my simulations.

Try using full/absolute paths to the files. Confirm that `os.listdir(work_dir)` actually *does* point to the right directory. *"I assume the problem is that I have a bunch of txt files and I am trying to load them as if they were gpickle"* <-- I don't think this is the problem. — jDo, Apr 09 '16 at 11:47

jDo · Answer 1 · 2016-04-09T12:36:17.860

2

OP's 1st error (File not found)

Try using full/absolute paths to the files. Confirm that os.listdir(work_dir) actually does point to the right directory.

"I assume the problem is that I have a bunch of txt files and I am trying to load them as if they were gpickle" <-- I don't think this is the problem. The error occurs before this stage.

Run this to shed some light on what's going on:

import os

work_dir=raw_input('Working directory: ')
if os.path.isdir(work_dir):
    print "Directory exists:", work_dir
    for i,f in enumerate(os.listdir(work_dir)):
        if os.path.exists(f):
            if os.path.isfile(f):
                print "Found a file named:", f
            else:
                print "Found something else (dir) named:", f
        else:
            print "Invalid path within a valid work_dir:", f
else:
    print "Work_dir does not exist:", work_dir
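A minimal sketch of why the loop in the question can end up pointing at an unexpected folder: os.listdir returns bare file names, and os.path.abspath resolves a bare name against the current working directory, not against work_dir. The temporary directory and the file name pickled_G.txt are stand-ins for illustration only, and the snippet is written for Python 3.

```python
import os
import tempfile

# Hypothetical stand-in for the user's work_dir, holding one (empty) file.
work_dir = tempfile.mkdtemp()
open(os.path.join(work_dir, "pickled_G.txt"), "wb").close()

for name in os.listdir(work_dir):
    # abspath() resolves the bare name against the current working directory
    wrong_path = os.path.abspath(name)
    # join() resolves it against the directory that was actually listed
    right_path = os.path.join(work_dir, name)
    print("abspath:", wrong_path, "exists:", os.path.exists(wrong_path))
    print("join:   ", right_path, "exists:", os.path.exists(right_path))
```

Unless the script happens to be started from inside work_dir, only the joined path refers to the file that was listed.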

OP's 2nd error (ImportError: No module named copy_reg)

This might be caused by how the pickle files were written. Check this question and see if using wb (write binary) solves it:

import cPickle  # Python 2; in Python 3 use the pickle module instead

file = open("test.txt", 'wb')  # 'wb' = write binary
thing = {'a': 1, 'b':2}
cPickle.dump(thing, file)
file.close()

I would imagine that using rb (read binary) for reading wouldn't hurt either.
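A small round-trip sketch of the wb/rb pairing, assuming Python 3 (so pickle stands in for cPickle). The temporary path is a hypothetical location chosen so the example does not touch any real files.

```python
import os
import pickle
import tempfile

thing = {'a': 1, 'b': 2}
path = os.path.join(tempfile.mkdtemp(), "test.txt")  # hypothetical location

# 'wb' writes the pickle bytes untouched (no newline translation on Windows)
with open(path, 'wb') as handle:
    pickle.dump(thing, handle)

# 'rb' reads them back untouched
with open(path, 'rb') as handle:
    restored = pickle.load(handle)

print(restored == thing)
```

Using with blocks also closes the file handles, which the one-line open(...) calls in the question leave to the garbage collector.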

If you're loading pickle files on Linux that were written on Windows, you might have to do another trick mentioned in the other question:

dos2unix originalPickle.file outputPickle.file
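If dos2unix is not available, roughly the same line-ending conversion can be sketched in Python. This assumes the files are text-format (protocol 0) pickles whose LF line endings were turned into CRLF by Windows text mode; a commonly reported cause of the copy_reg ImportError is a stray carriage return ending up in the module name stored in the pickle. The helper name crlf_to_lf is made up for this example, and the damage is simulated in memory rather than read from a real file. It is not safe for binary-protocol pickles, which may legitimately contain those bytes.

```python
import pickle

def crlf_to_lf(data):
    """Return data with every CRLF pair replaced by a bare LF."""
    return data.replace(b"\r\n", b"\n")

# Protocol 0 is the ASCII, line-oriented format that Python 2 used by default.
original = pickle.dumps({'a': 1, 'b': 2}, protocol=0)

# Simulate what Windows text mode ('w') does on write: LF becomes CRLF.
damaged = original.replace(b"\n", b"\r\n")

# Undo the translation and unpickle the repaired bytes.
repaired = crlf_to_lf(damaged)
print(pickle.loads(repaired))
```

For a real file, the bytes would be read with open(path, 'rb'), passed through crlf_to_lf, and written to a new file with 'wb', keeping the original as a backup.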

edited Apr 09 '16 at 12:36

answered Apr 09 '16 at 12:01

jDo


I think I've found a brutal workaround. Since `filename` was pointing at a wrong folder, I wrote: `filename=work_dir+'\\'+file`. Many will not like it but now I get a different error in another section :) – FaCoffee Apr 09 '16 at 12:03

@FC84 What if you use the `join` function from `os.path` instead? E.g. `file_path = os.path.join(work_dir, file)`? You can probably just use forward slashes instead of introducing more complexity by using - and escaping - backslashes. – jDo Apr 09 '16 at 12:05

@FC84 Yeah, doing it manually often works but then just fails to handle edge cases. `os.path.join` claims to [*"Join one or more path components intelligently"*](https://docs.python.org/2/library/os.path.html) meaning it does more than simply `full_path = path + filename`. I imagine it's OS aware, checks for existing slashes before adding more, adds slashes if they're not there and so on. – jDo Apr 09 '16 at 12:12
So this means that I have to re-run my simulations because the files to be "pickled" are the result of a computation process. Your suggested workaround seems to refer to when these files are created... – FaCoffee Apr 11 '16 at 09:48
Alternatively, how could I turn my txt files (which were saved as 'w') into files saved as 'wb'? – FaCoffee Apr 11 '16 at 10:02
Try determining if `wb` vs. `w` is the issue first. Try running [this little test](http://pastebin.com/raw/fy8ts5PH). If it fails to load the file written using `w`, you know what's causing the issue. Try dos2unix first and if it still fails, re-run the simulation and save using `wb`. Run a `diff` on the files like I'm doing in the test if you want to find the **diff**erences (if there are any). – jDo Apr 11 '16 at 10:28
I figured out what the mistake was. The files were written as txt with `'w'` using `pickle.dump()`. I was trying to load them with `nx.read_gpickle()`, which calls for `.gpickle` files. What solved the issue was unpickling them using `pickle.load(open(filename,"r"))`. The unpickling was then successful. – FaCoffee Apr 11 '16 at 10:34
I also uploaded this comment as my own answer. I am not the most clever of users, but how cool is it to reply to yourself? :) Thank you for your support. – FaCoffee Apr 11 '16 at 10:44

@FC84 You're welcome :) There are lots of monologues like that on SO. It's nice when people do it so everybody can see what solved the issue; I'm sure you'll save lots of people lots of time in the future. Btw. I doubt that the file extension matters at all; the file contents are the important part. If you type `dir(networkx)` or `dir(nx)` to see everything the module exposes, you'll find networkx's own pickling functions in the list: `write_gpickle` and `read_gpickle`. I guess you need to `write_gpickle` to load using `read_gpickle` and, similarly, use `pickle.dump` to later load using `pickle.load`. – jDo Apr 11 '16 at 10:55
Yes, I made this mistake, i.e., not checking within NetworkX. However, after rectifying it, this stuff seems to work :) Thanks again! – FaCoffee Apr 11 '16 at 11:04

score 2 · Accepted Answer · answered Apr 11 '16 at 10:35

I figured out what the mistake was. The files were written as txt with 'w' using pickle.dump():

pickling=pickle.dump(G,open(original_dir2+'\\pickling_test.txt','w'))
# G is the graph from networkx, and original_dir2 is the dir where the txt files were dumped

I was trying to load them with nx.read_gpickle(), which calls for .gpickle files.

What solved the issue was unpickling the files using pickle.load(open(filename,"r")). The unpickling was then successful.
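For anyone who still wants the files converted, here is a minimal sketch, assuming each .txt file unpickles cleanly with pickle.load: load it and dump it again in binary mode under a .gpickle name. The helper convert_to_gpickle is hypothetical, a plain dict stands in for the graph so the example runs without networkx, and the code targets Python 3 (hence 'rb' rather than "r"). As far as I can tell, read_gpickle in the NetworkX 1.x/2.x releases was a thin wrapper around pickle.load, so the extension itself should not be what matters.

```python
import os
import pickle
import tempfile

def convert_to_gpickle(work_dir):
    """Re-save every .txt pickle in work_dir as a binary .gpickle file."""
    converted = []
    for name in sorted(os.listdir(work_dir)):
        if not name.endswith(".txt"):
            continue
        src = os.path.join(work_dir, name)
        dst = os.path.splitext(src)[0] + ".gpickle"
        with open(src, "rb") as handle:
            obj = pickle.load(handle)
        with open(dst, "wb") as handle:
            pickle.dump(obj, handle)
        converted.append(dst)
    return converted

# Demo with a stand-in object instead of a networkx graph.
demo_dir = tempfile.mkdtemp()
stand_in = {"nodes": [1, 2, 3], "edges": [(1, 2), (2, 3)]}
with open(os.path.join(demo_dir, "pickled_G.txt"), "wb") as handle:
    pickle.dump(stand_in, handle, protocol=0)

outputs = convert_to_gpickle(demo_dir)
with open(outputs[0], "rb") as handle:
    print(pickle.load(handle) == stand_in)
```

The original .txt files are left in place, so nothing is lost if the conversion turns out to be unnecessary.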

score 1 · Answer 3 · answered Apr 09 '16 at 11:49


The IOError suggests that the file you're referencing simply isn't there, not that it's being loaded incorrectly. Can you double check that you're running your script from the right folder, have the text files in the right place etc?

I'm also not familiar with os.path.basename, but it may be the way that you're referencing the file that's causing trouble?

answered Apr 09 '16 at 11:49

Ari Cooper-Davis


I've replaced `basename` with `os.path.abspath` but I keep having the same problem. The thing is: the files are where they are supposed to be, and the `work_dir` I assign is correct. Anyhoo, when the error pops up, it refers to a directory which I have never used in the script. How come? – FaCoffee Apr 09 '16 at 11:54
This is because `filename` points at a different folder. How come, if it is linked to `file`, which is in turn linked to `work_dir`, with the latter being doublechecked? – FaCoffee Apr 09 '16 at 11:57
