How do I divide a very large OpenStreetMap file into smaller files in R without running out of memory?

Question

I want to split a map of Mexico into files no larger than a single municipality (at most about 3 degrees of longitude/latitude across). However, I run into memory issues whenever I try. For reference, the OSM XML file is 1.9 GB.

library(osmar)
get.map.for.municipality<-function(province,municipality){
  base.map.filename = 'OpenStreetMap/mexico-latest.osm'
  #bounds.list is a list that contains the boundaries
  bounds = bounds.list[[paste0(province,'*',municipality)]]
  my.bbox = corner_bbox(bounds[1],bounds[2],bounds[3],bounds[4])
  my.map.source = osmsource_file(base.map.filename)
  my.map = get_osm(my.bbox,my.map.source)
  return(my.map)
}

I am running this inside a loop, but it can't even get past the first iteration. When I tried running it, memory usage climbed steadily for a few minutes and then spiked so quickly that my computer froze before I could react. I was only able to take a screenshot with my phone.

What is a better way of doing this? I expect to have to run this loop about 100-150 times, so any way that is more efficient in terms of memory would help. I would prefer not to download smaller files from an API service. If necessary, I would be willing to use another programming language (preferably Python or C++), but I prefer to keep this in R.


One thing that surprises me from exploring the `osmar` package's code is that it never actually uses the bounding box at *all* in `get_osm`. Type in `get_osm` and you'll see `x` is passed as the second argument of `get_osm_data`, and doesn't use it afterwards. But look at `osmar:::get_osm_data.osmfile` and it's just `readLines(source$file)`. It completely ignores the bounding box. No wonder it runs out of memory! — David Robinson, Apr 13 '15 at 23:26
As further evidence, create a sample small OSM file called `test.osm`. Then try `get_osm(blablabla, source = osmsource_file("test.osm"))`. (That's not a pseudocode example: literally type `blablabla`). The function works fine. It never uses the bounding box so that argument is never evaluated! (It uses the box only when it's querying an API). — David Robinson, Apr 13 '15 at 23:31
In any case, you'll need to parse the XML file iteratively, throwing out nodes that don't fall within the bounding box. As far as I know R has no tools for doing that. Python does have [iterparse](http://effbot.org/zone/element-iterparse.htm), which you should have more luck with — David Robinson, Apr 13 '15 at 23:47
Thanks for the advice. I managed to use iterparse to navigate through the file properly. The main issue I had was that the `<osm>` tag at the beginning of the file contained all the other objects, but I just used `readline()` to get those. — Max Candocia, Apr 16 '15 at 17:40
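
For readers who want to try the iterparse route from these comments, here is a minimal sketch. It is not the code the asker ended up with. It assumes standard OSM XML with `lat`/`lon` attributes on each `<node>`, and the function name `nodes_in_bbox` is made up for illustration. It only streams nodes; ways and relations that reference the kept nodes would need a second pass.

```python
import xml.etree.ElementTree as ET


def nodes_in_bbox(osm_path, left, bottom, right, top):
    """Yield (id, lat, lon) for every <node> inside the bounding box.

    Hypothetical helper: the file is parsed as a stream, so the whole
    document is never held in memory at once.
    """
    context = ET.iterparse(osm_path, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event != "end":
            continue
        if elem.tag == "node":
            lat, lon = elem.get("lat"), elem.get("lon")
            if lat is not None and lon is not None:
                lat, lon = float(lat), float(lon)
                if bottom <= lat <= top and left <= lon <= right:
                    yield elem.get("id"), lat, lon
        if elem.tag in ("node", "way", "relation"):
            # Drop finished top-level elements from the root so memory
            # use stays roughly constant instead of growing with the file.
            root.clear()
```

The arguments follow the same left, bottom, right, top order as `corner_bbox` in the question, so something like `nodes_in_bbox('OpenStreetMap/mexico-latest.osm', *bounds)` should work if `bounds` holds those four numbers.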

defvol · Accepted Answer · 2015-07-21T15:31:42.650

I'd suggest not using R for that.

There are better tools for that job, and many ways to split and filter OSM data from the command line or with a DBMS.

Here are some alternatives extracted from the OSM Wiki http://wiki.openstreetmap.org:

Filter your osm files using osmfilter: "osmfilter is used to filter OpenStreetMap data files for specific tags. You can define different kinds of filters to get OSM objects (i.e. nodes, ways, relations), including their dependent objects, e.g. nodes of ways, ways of relations, relations of other relations."

Clipping based on Polygons or borders using osmconvert: http://wiki.openstreetmap.org/wiki/Osmconvert#Applying_Geographical_Borders
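
As a rough sketch of how the osmconvert route could be scripted for the 100-150 municipalities in the question: the helper names below are made up for illustration, the bounds are assumed to be stored as (left, bottom, right, top) like the arguments to `corner_bbox`, and `osmconvert` is assumed to be on the PATH. The `-b=` bounding box and `-o=` output options are the ones described on the wiki page linked above.

```python
import subprocess


def osmconvert_command(source, bounds, output):
    """Build an osmconvert call that clips `source` to a bounding box.

    Hypothetical helper; `bounds` is (left, bottom, right, top), i.e.
    min longitude, min latitude, max longitude, max latitude.
    """
    left, bottom, right, top = bounds
    return [
        "osmconvert",
        source,
        f"-b={left},{bottom},{right},{top}",
        f"-o={output}",
    ]


def clip_all(source, bounds_by_name, run=subprocess.run):
    """Write one clipped file per entry in `bounds_by_name`.

    `run` can be swapped out for testing; by default each command
    is executed and a failing call raises an exception.
    """
    for name, bounds in bounds_by_name.items():
        run(osmconvert_command(source, bounds, f"{name}.osm"), check=True)
```

Each call runs in its own process, so the R session never has to hold the 1.9 GB file. The smaller outputs can then be loaded one at a time.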

You can write bash scripts for both osmfilter and osmconvert, but I'd recommend using a DBMS. Just import the data into PostGIS using osm2pgsql, and connect your R code to it with any PostgreSQL driver. This will optimize your read/write operations.
