I have created a rather large CSV file (63,000 rows and around 40 columns) and I want to join it with an ESRI Shapefile. I have used ArcPy, but the whole process takes 30 minutes. If I instead join the original (small) CSV file with the Shapefile and then do my calculations in ArcPy, continuously adding new fields and calculating their values, it takes 20 minutes. I am looking for a faster solution. I found that there are other Python modules such as PySHP or DBFPy, but I have not found a way to join tables with them, which I hoped would be faster.

My goal is to get away from ArcPy as much as I can and preferably use only Python, so preferably no PostgreSQL or the like either.

Does anybody have a solution for that? Thanks a lot!

Kai
  • Export the shapefile to a NumPy array using FeatureClassToNumPyArray in arcpy. Read your table into a NumPy array (there are many options). Then import NumPy's recfunctions; the ability to join tabular data is built in there (import numpy.lib.recfunctions as rfn, then do a dir(rfn) to find the functions). – NaN Oct 05 '16 at 11:06
  • @NaN: Please make that an answer with more of the details needed. – Ethan Furman Oct 05 '16 at 15:05
  • Can you give us some details on the join and what kind of operations you are doing? – Ethan Furman Oct 05 '16 at 15:06
  • @EthanFurman: Details can be found at https://github.com/numpy/numpy/blob/master/numpy/lib/recfunctions.py. The join is 1:1, not one-to-many ('relate' in Esri parlance). – NaN Oct 06 '16 at 00:50
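
The approach suggested in the comments above could look roughly like the sketch below. It is only an illustration under assumptions: the two small arrays and the field names `ID`, `NAME` and `VALUE` are made up, and stand in for the output of `arcpy.da.FeatureClassToNumPyArray` and for the CSV table loaded by whatever means you prefer. The join itself uses `join_by` from `numpy.lib.recfunctions`, which expects the key to be unique in both arrays (a 1:1 join).

```python
import numpy as np
import numpy.lib.recfunctions as rfn

# Hypothetical stand-in for the shapefile attributes. In ArcGIS this array
# would come from arcpy.da.FeatureClassToNumPyArray(...).
shp_attrs = np.array(
    [(1, "A"), (2, "B"), (3, "C")],
    dtype=[("ID", "i4"), ("NAME", "U10")],
)

# Hypothetical stand-in for the CSV table, already loaded into a
# structured array. The row order does not have to match shp_attrs.
csv_table = np.array(
    [(3, 30.0), (1, 10.0), (2, 20.0)],
    dtype=[("ID", "i4"), ("VALUE", "f8")],
)

# 1:1 inner join on the shared key field "ID".
# usemask=False returns a plain structured array instead of a masked one.
joined = rfn.join_by("ID", shp_attrs, csv_table,
                     jointype="inner", usemask=False)

for row in joined:
    print(row["ID"], row["NAME"], row["VALUE"])
```

Because everything happens in memory, this avoids adding and calculating fields one at a time; how the joined array gets written back to a table or feature class is a separate step not covered here.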

1 Answer

Not exactly a programmatic solution to my problem, but a practical one:

My shapefile is always static; only the attributes of the features change. So I copy my original shapefile (only the basic files with the extensions .shp, .shx and .prj) to my output folder and rename it to the name I want. Then I create my CSV file with all the calculations, convert it to DBF and save it to the output folder under the name of my new shapefile. ArcGIS now loads the shapefile along with my own DBF file and I don't need to do any table join at all!
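
The copy step can be sketched with the standard library alone. This is a minimal illustration, not my exact script: the function name `copy_shapefile_geometry` and the paths in the usage note are hypothetical, and the CSV-to-DBF conversion is not shown. Note that this trick only works if the new DBF has exactly one record per feature, in the same order as the features in the .shp file.

```python
import shutil
from pathlib import Path


def copy_shapefile_geometry(src_base, dst_base,
                            extensions=(".shp", ".shx", ".prj")):
    """Copy the geometry-related files of a shapefile under a new name.

    src_base and dst_base are paths WITHOUT extension, e.g.
    "data/parcels" and "output/parcels_2016". The .dbf file is left out
    on purpose so that a separately generated one can be placed next to
    the copies.
    """
    src_base = Path(src_base)
    dst_base = Path(dst_base)
    dst_base.parent.mkdir(parents=True, exist_ok=True)

    copied = []
    for ext in extensions:
        # Append the extension to the base name (safe even if the name
        # itself contains a dot).
        src = src_base.parent / (src_base.name + ext)
        if src.exists():
            dst = dst_base.parent / (dst_base.name + ext)
            shutil.copyfile(src, dst)
            copied.append(dst)
    return copied
```

A call such as `copy_shapefile_geometry("data/parcels", "output/parcels_2016")` would then leave `parcels_2016.shp`, `.shx` and `.prj` in the output folder, ready for a `parcels_2016.dbf` to be saved alongside them.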

Now my program runs in only 50 seconds!

I am still interested in other solutions to the table join problem, since I may encounter it again in the future with a shapefile that is NOT static. I did not really understand NaN's solution, as I am still at an "advanced beginner" level in Python :)

Cheers

Kai