
Update

It's now throwing the error with a direct call too, and for the relation documents as well. I got rid of write_el() altogether and just do this:

...
if el["doc_type"] == "node":
    with codecs.open((fo_pre+"_nodes.json"), mode) as fo:
        fo.write(json.dumps(el, indent=2)+"\n")
...

Also, it should be noted that the XML document (OSM) has all the node elements first, followed by the way elements, then the relation elements.
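Since each doc_type appears as one contiguous run in the OSM file, one option is to keep only the current output file open and rotate handles when the type changes. This is a sketch, not the code from the question: `write_grouped` is a hypothetical helper, and `elements` stands in for the dicts produced by shape_element().

```python
import json

def write_grouped(elements, fo_pre, mode="a"):
    # OSM files list all nodes, then all ways, then all relations,
    # so each output file needs to be opened at most once, only
    # while its doc_type is being parsed.
    current_type, fo = None, None
    try:
        for el in elements:
            if el["doc_type"] != current_type:
                if fo:
                    fo.close()
                current_type = el["doc_type"]
                fo = open(f"{fo_pre}_{current_type}s.json", mode,
                          encoding="utf-8")
            fo.write(json.dumps(el, indent=2) + "\n")
    finally:
        if fo:
            fo.close()
```

This keeps a single descriptor open per run of elements, so there is no repeated open/close for the scanner to collide with.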

Original post

I'm writing to multiple JSON files from within Python xml.etree.ElementTree.iterparse, using codecs.open and json.dumps. I call a separate function to write to the file.

This works for some of the elements/documents, but not all. It writes only so many elements, then stops with `PermissionError: [Errno 13] Permission denied: <file name>`. The last call to the file's write method returns 207, but so do many of the previous calls. And the next element looks normal:

<!--Last element written to JSON file.-->
  <node id="7898832843" lat="48.7888301" lon="-122.5067978" version="1" timestamp="2020-09-11T22:37:30Z" changeset="90779671" uid="10612244" user="mapstuffs"/>

<!--Next element, not written to JSON file.-->
  <node id="7898832844" lat="48.7888177" lon="-122.5058429" version="1" timestamp="2020-09-11T22:37:30Z" changeset="90779671" uid="10612244" user="mapstuffs"/>

Plus, it throws the error at a different element each time I run it, and sometimes it doesn't throw the error at all.

Stripped down Python:

import xml.etree.ElementTree as ET
import codecs
import json

def write_el(el, file_out, mode = "a"):
    with codecs.open(file_out, mode) as fo:
        fo.write(json.dumps(el, indent=2)+"\n")
    return

def process_map(file_in, fo_pre, mode = "a"):
    
    for _, element in ET.iterparse(file_in):
        # shape_element() formats XML elements into JSON-compatible Python
        # dictionaries and lists.
        el = shape_element(element)
        if el:
            if el["doc_type"] == "node":
# Calling open/write directly works.
#                 with codecs.open(fo_pre+"_nodes.json", mode) as fo:
#                     fo.write(json.dumps(el, indent=2)+"\n")
# But, calling write_el for this doc_type throws permission error
# halfway through the document. The element following the last written looks
# just fine.
                write_el(el=el, file_out=fo_pre+"_nodes.json", mode=mode)
# Calling write_el works fine for the other doc_types, if error not thrown
# from previous block first.
            elif el["doc_type"] == "way":
                write_el(el=el, file_out=fo_pre+"_ways.json", mode=mode)
            elif el["doc_type"] == "relation":
                write_el(el=el, file_out=fo_pre+"_relations.json",
                         mode=mode)
                
def test():
    process_map(file_in=filename, fo_pre="test_bham")

    return

test()

Returns

PermissionError: [Errno 13] Permission denied: 'test_bham_nodes.json'
  • Because you don't have the permission to access that file. You should check the permissions of the file and/or directory on the OS level. – zvone Nov 20 '20 at 20:46
  • The problem is that I do have permissions, since I am able to create it and write to it up until a certain point, within the same runtime of the same script. – Kaleb Coberly Nov 20 '20 at 20:59
  • Could my permissions change? – Kaleb Coberly Nov 20 '20 at 21:00
  • Not sure, but I think Windows might be reporting "permission denied" for different kind of failures, for example, you tried to open the same file twice, but did not close it. Maybe `file_in` and `file_out` are the same? But this is just a guess, it could be anything. – zvone Nov 20 '20 at 21:02
  • I like that you're thinking about this, thanks. I'm using `with open`, so it should handle its closing at the end of the block. And, `file_in` is "bellingham_map.osm", so it's not the same as `file_out`. I think you're right about the error being used for a range of problems. I haven't found an example that fits this one, though. – Kaleb Coberly Nov 20 '20 at 21:34
  • Which operating system are you running? – Charles Duffy Nov 20 '20 at 22:17
  • ...that said, if you just hold a single file descriptor open and don't re-open the file, there aren't further permission checks. Reopening something over-and-over is really slow, anyhow. – Charles Duffy Nov 20 '20 at 22:17
  • @CharlesDuffy, Windows 10 Pro. – Kaleb Coberly Nov 20 '20 at 22:17
  • Okay -- that introduces a bunch of potential interestingness (Windows is a lot more restrictive than UNIX-y platforms about concurrent file access). – Charles Duffy Nov 20 '20 at 22:18
  • _Why_ are you reopening the file every time you want to write a single element? It would be much, _much_ faster and more reliable to just open it once before writing anything, and closing it only when completely done. – Charles Duffy Nov 20 '20 at 22:18
  • @CharlesDuffy That's what I was doing before when I wrote to a single file, but I want to write to N number of files without parsing through the XML doc N number of times. – Kaleb Coberly Nov 20 '20 at 22:20
  • Do you have a virus scanner that’s making your file in use every time you close it as it scans the new file? – Mark Tolonen Nov 20 '20 at 22:21
  • @MarkTolonen, I do have Norton, and I'm doing this in Jupyter Notebook, which I think also accesses it to update its directory. I was just thinking Jupyter might be trying to access the file at the same time. Maybe Norton is too. – Kaleb Coberly Nov 20 '20 at 22:23
  • It’s probably Norton. Jupyter wouldn’t have to open it just to update a directory – Mark Tolonen Nov 20 '20 at 22:23
  • What I'm finding is that restarting the kernel seems to do the trick, at least with a direct call. I don't know why, but I will take it. Now I'll try calling open/write from within another function after restarting the kernel. – Kaleb Coberly Nov 20 '20 at 22:27
  • @CharlesDuffy, it is a lot slower. Can I nest `with open()` blocks? – Kaleb Coberly Nov 20 '20 at 22:33
  • Nesting `with open()` has done it. Thanks, @CharlesDuffy. – Kaleb Coberly Nov 20 '20 at 22:51

1 Answer


Thanks to some commenters, I've found that nesting the with open() blocks around iterparse() does the trick.

My antivirus program was likely the culprit, accessing the file each time it was closed, which occasionally caused an access conflict on the next file open operation.
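When reopening the file per element can't be avoided, a small retry loop can paper over a scanner briefly holding the file between close and the next open. This is a sketch: `write_el_retry` is a hypothetical variant of the write_el from the question, and the retry count and delay are arbitrary.

```python
import json
import time

def write_el_retry(el, file_out, mode="a", retries=5, delay=0.1):
    # Retry a few times if another process (e.g. a virus scanner)
    # briefly holds the file open, which surfaces on Windows as
    # PermissionError even when permissions are fine.
    for attempt in range(retries):
        try:
            with open(file_out, mode, encoding="utf-8") as fo:
                fo.write(json.dumps(el, indent=2) + "\n")
            return
        except PermissionError:
            if attempt == retries - 1:
                raise
            time.sleep(delay)
```

Holding the handle open, as below, is still the better fix; the retry only masks the race.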

I didn't like opening and closing the file for each element because of the operation cost. Originally, I parsed the XML inside a single file open when I was writing to one file; but when I decided to write to multiple files, I didn't want to parse the XML over again for each JSON file I wanted to write from it.

Seems like a no-brainer now, but nesting file opens works and saves compute. It might build up the indents if you want to write a lot of files, but c'est la vie.

def process_map(file_in, fo_pre, mode = "a"):

    with codecs.open(fo_pre+"_nodes.json", mode) as nd_fo, \
         codecs.open(fo_pre+"_ways.json", mode) as wy_fo, \
         codecs.open(fo_pre+"_relations.json", mode) as rl_fo:

        for _, element in ET.iterparse(file_in):
            el = shape_element(element)
            if el:
                if el["doc_type"] == "node":
                    nd_fo.write(json.dumps(el, indent=2)+"\n")
                elif el["doc_type"] == "way":
                    wy_fo.write(json.dumps(el, indent=2)+"\n")
                elif el["doc_type"] == "relation":
                    rl_fo.write(json.dumps(el, indent=2)+"\n")

    return
  • You don't need to indent, you can open several items with one `with`. See [How can I open multiple files using `with open` in Python?](https://stackoverflow.com/questions/4617034/how-can-i-open-multiple-files-using-with-open-in-python) – Charles Duffy Nov 20 '20 at 23:28
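For a variable number of output files, contextlib.ExitStack achieves the same single-`with` effect without the indent buildup. This is an illustrative sketch, not the question's code: `process_elements` is a hypothetical name, `elements` stands in for the shape_element() output, and the file-name pattern is assumed.

```python
import json
from contextlib import ExitStack

def process_elements(elements, fo_pre, mode="a"):
    doc_types = ("node", "way", "relation")
    with ExitStack() as stack:
        # One open handle per doc_type, all registered on the stack
        # so they close together when the with block exits.
        handles = {
            dt: stack.enter_context(open(f"{fo_pre}_{dt}s.json", mode,
                                         encoding="utf-8"))
            for dt in doc_types
        }
        for el in elements:
            fo = handles.get(el.get("doc_type"))
            if fo:
                fo.write(json.dumps(el, indent=2) + "\n")
```

Each file is opened exactly once per run, which sidesteps the repeated open/close that triggered the PermissionError.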