I have a proprietary cursor object (arcpy.da.SearchCursor) that I need to load into a pandas DataFrame.
It implements next() and reset(), so it can be iterated much like a generator object in Python.
Using another post on Stack Exchange, which is brilliant, I created a class that makes the generator act like a file-like object. This works for the default case, where chunksize is not set, but when I set chunksize so that each DataFrame holds a fixed number of rows, it crashes Python.
My guess is that the n argument of read() (which defaults to n=0) needs to be implemented so that n rows are returned, but my attempts so far have not worked.
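For comparison, here is a small sketch of how a standard file-like object treats the argument to read(). It uses io.StringIO as a stand-in rather than the cursor, and the sample text is made up. The point it illustrates is that the size argument counts characters, not rows, and that the return value is always a string, with an empty string marking the end of the data.

```python
import io

# Stand-in for a file: two CSV lines held in memory (made-up sample data).
buf = io.StringIO(u"1,a\n2,b\n")

first = buf.read(3)    # at most 3 characters, not 3 rows
rest = buf.read()      # no size given: everything that is left
at_eof = buf.read(5)   # nothing left: an empty string signals end of file
```

If that is the contract a parser relies on, a read() that returns a row tuple or a list of rows would not satisfy it.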
What is the proper way to implement my class so I can use generators to load a dataframe? I need to use chunksize because my datasets are huge.
The pseudocode would be:
customfileobject = Reader(cursor=cursor)
dfs = pd.read_csv(customfileobject, names=cursor.fields,
                  chunksize=10000)
I am using pandas version 0.16.1 and Python 2.7.10.
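One possible way to get fixed-size chunks without going through a file-like object at all is to slice the cursor directly. The sketch below is only an illustration under assumptions: iter_chunks is a hypothetical helper name, the list of tuples stands in for the cursor, and the code is written for Python 3.

```python
from itertools import islice


def iter_chunks(rows, size):
    """Yield lists of at most `size` rows from any iterator (hypothetical helper)."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, size))
        if not chunk:
            return  # iterator exhausted
        yield chunk


# Stand-in for the cursor: an iterator of row tuples (made-up data).
fake_cursor = iter([(1, "a"), (2, "b"), (3, "c")])
chunks = list(iter_chunks(fake_cursor, 2))
```

Each chunk could then presumably be turned into a DataFrame with something like pd.DataFrame.from_records(chunk, columns=cursor.fields), so only one chunk of rows is held in memory at a time.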
Class below:
class Reader(object):
    """allows a cursor object to be read like a filebuffer"""
    def __init__(self, fc=None, columns="*", cursor=None):
        if cursor or fc:
            if fc:
                self.g = arcpy.da.SearchCursor(fc, columns)
            else:
                self.g = cursor
        else:
            raise ValueError("You must provide a da.SearchCursor or table path and column names")
    def read(self, n=0):
        try:
            vals = []
            if n == 0:
                return next(self.g)
            else:
                # return multiple rows?
                for x in range(n):
                    try:
                        vals.append(self.g.next())
                    except StopIteration:
                        return ''
        except StopIteration:
            return ''
    def reset(self):
        self.g.reset()
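For illustration, here is a minimal sketch of an adapter whose read(n) follows the usual file convention: it serializes each row to a line of CSV text, buffers that text, and hands back at most n characters per call. RowReader is a hypothetical name, the rows are made-up stand-ins for the cursor, and the code is written for Python 3. It has not been tested against pandas 0.16.1, and some pandas versions may expect more of the file protocol (for example __iter__) than the single read() method shown here.

```python
import csv
import io


class RowReader(object):
    """Hypothetical adapter: serves an iterator of row tuples as CSV text."""

    def __init__(self, rows):
        self._rows = iter(rows)
        self._buffer = ""

    def _format(self, row):
        # Let the csv module handle quoting and escaping of each value.
        out = io.StringIO()
        csv.writer(out, lineterminator="\n").writerow(row)
        return out.getvalue()

    def read(self, n=-1):
        # Pull rows until the buffer holds n characters or the rows run out.
        while n < 0 or len(self._buffer) < n:
            try:
                self._buffer += self._format(next(self._rows))
            except StopIteration:
                break
        if n < 0:
            data, self._buffer = self._buffer, ""
        else:
            data, self._buffer = self._buffer[:n], self._buffer[n:]
        return data  # always a string; "" once everything has been read


# Stand-in for the cursor (made-up data).
reader = RowReader([(1, "a"), (2, "b")])
head = reader.read(3)
tail = reader.read()
done = reader.read(4)
```

The difference from the class above is that read() never returns a tuple, a list, or None: it returns text, and n limits characters rather than rows.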