
I'm trying to combine every block file from the 2010 census into a single master block file for the US. I'm currently doing this in Google Colab, and even on their Pro subscription, which gives you about 25 GB of RAM, I run out of memory on the 45th file (I just have 5 more to go!). Code-wise, I'm building a list of GeoDataFrames that are then concatenated and ultimately written to disk:

import os

import geopandas as gpd
import pandas as pd

gdfs = []
census_blocks_basepath = r'/content/drive/My Drive/Census/blocks/'
census_block_filenames = [f for f in os.listdir(census_blocks_basepath) if f.endswith('.shp')]
for index, block_filename in enumerate(census_block_filenames):
  file_name = os.path.join(census_blocks_basepath, block_filename)
  # Every GeoDataFrame stays in memory until the concat below
  gdfs.append(gpd.read_file(file_name))
  print('Appended file %s, %s' % (index, block_filename))

# Concatenate all frames and reuse the CRS of the first one
gdf = gpd.GeoDataFrame(pd.concat(gdfs, ignore_index=True), crs=gdfs[0].crs)
gdf.head(3)

Instead, I think I should:

  1. load a single GeoDataFrame
  2. append it to a master file on disk (the way csv.writer appends rows to a file) rather than to one held in memory
  3. delete the GeoDataFrame loaded in step 1 (so memory doesn't accumulate)
  4. repeat steps 1-3 for every remaining file in the source directory
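To make the csv.writer analogy in step 2 concrete, here is a stdlib-only sketch of steps 1-4 applied to CSV files rather than shapefiles. The function name `merge_csvs_on_disk` is hypothetical, and it assumes every input is non-empty and shares the same header row. Only one input file is held in memory at a time, so peak memory tracks the largest single file instead of the sum of all of them.

```python
import csv
from typing import List


def merge_csvs_on_disk(input_paths: List[str], output_path: str) -> int:
    """Append each input CSV to output_path, holding one file in memory at a time.

    Assumes every input is non-empty and starts with the same header row.
    Returns the number of data rows written (the header is not counted).
    """
    rows_written = 0
    for i, path in enumerate(input_paths):
        # Step 1: load a single file into memory.
        with open(path, newline="") as src:
            rows = list(csv.reader(src))
        header, body = rows[0], rows[1:]
        # Step 2: the first file creates the output ("w"); later files append ("a").
        mode = "w" if i == 0 else "a"
        with open(output_path, mode, newline="") as dst:
            writer = csv.writer(dst)
            if i == 0:
                writer.writerow(header)  # write the header only once
            writer.writerows(body)
        rows_written += len(body)
        # Step 3: drop the loaded rows before the next iteration.
        del rows, body
    # Step 4 is the loop itself: repeat for every remaining file.
    return rows_written
```

Writing the first file with mode "w" and the rest with "a" guarantees the output exists before anything is appended to it, and rerunning the script starts from a clean file.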

I can't find any documentation on whether geopandas supports disk-based appends; GeoDataFrame.to_file only seems able to overwrite an existing file. That said, geopandas has a GeoDataFrame.to_postgis method with a chunksize argument, which makes me think it's possible to append data to a geo file on disk (or I'm wrong and that's just a feature of PostGIS).

Any ideas?

zelusp

1 Answer


From MartinFleis

Yes. Any file format that supports appending (and is supported by fiona) can be appended to. You just have to specify mode="a":

df.to_file(filename, mode="a")

You can check which modes each driver supports with:

import fiona
fiona.supported_drivers

At the time of writing, the result is as follows (r = read, a = append, w = write):

{'AeronavFAA': 'r',
 'ARCGEN': 'r',
 'BNA': 'raw',
 'DXF': 'raw',
 'CSV': 'raw',
 'OpenFileGDB': 'r',
 'ESRIJSON': 'r',
 'ESRI Shapefile': 'raw',
 'GeoJSON': 'rw',
 'GeoJSONSeq': 'rw',
 'GPKG': 'rw',
 'GML': 'raw',
 'GPX': 'raw',
 'GPSTrackMaker': 'raw',
 'Idrisi': 'r',
 'MapInfo File': 'raw',
 'DGN': 'raw',
 'PCIDSK': 'r',
 'S57': 'r',
 'SEGY': 'r',
 'SUA': 'r',
 'TopoJSON': 'r'}
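To filter that mapping programmatically, for example to list only the drivers that allow appending, check whether "a" appears in each mode string. This is a small sketch: `drivers_supporting` is a hypothetical helper, and a few entries from the table above are hard-coded so it runs without fiona installed.

```python
def drivers_supporting(mode, drivers):
    """Return the sorted names of drivers whose mode string contains `mode`.

    `drivers` is any dict shaped like fiona.supported_drivers, mapping a
    driver name to a string of mode letters such as "raw".
    """
    return sorted(name for name, modes in drivers.items() if mode in modes)


# A few entries copied from the table above, so this runs without fiona.
sample = {
    "CSV": "raw",
    "ESRI Shapefile": "raw",
    "GeoJSON": "rw",
    "GPKG": "rw",
    "OpenFileGDB": "r",
}
print(drivers_supporting("a", sample))  # ['CSV', 'ESRI Shapefile']
```

Against a live installation you would pass `fiona.supported_drivers` instead of `sample`, since the modes available depend on your fiona and GDAL versions.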