I have a DataFrame that I split into column Series (col_series in the snippet below) and use apply to run tests on each value in each Series. But I would like to report which row in the Series is affected when I detect an error.
...
col_series.apply(self.testdatelimits,
                 args=(datetime.strptime('2018-01-01', '%Y-%m-%d'), key))
def testlimits(self, row_id, x, lowerlimit, col_name):
    low_error = None
    d = float(x)
    if lowerlimit != 'NA' and d < float(lowerlimit):
        low_error = 'Following record has column ' + col_name + ' lower than range check'
    if low_error is not None:
        self.set_error(col_index, row_id, low_error)
Of course the above fails because x is a str and does not have a name property. I am thinking that maybe I can pass in the row index from the Series, but I am not clear on how to do that.
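For illustration, here is a minimal sketch of one way the row label could reach the test function. It assumes the check can be written as a plain function rather than a method, and it iterates over Series.items(), which yields (index label, value) pairs, whereas apply only passes the value. The names check_lower_limit and find_low_values and the sample data are hypothetical and are not part of my real code.

```python
import pandas as pd

def check_lower_limit(row_id, x, lowerlimit, col_name):
    """Return an error message for one value, or None if it passes."""
    if lowerlimit != 'NA' and float(x) < float(lowerlimit):
        return f'Row {row_id}: column {col_name} is lower than range check'
    return None

def find_low_values(col_series, lowerlimit, col_name):
    """Collect (row label, message) pairs for every failing value."""
    errors = []
    # Series.items() yields (index label, value) pairs, so the row label
    # is available alongside each value.
    for row_id, x in col_series.items():
        message = check_lower_limit(row_id, x, lowerlimit, col_name)
        if message is not None:
            errors.append((row_id, message))
    return errors

# Hypothetical sample data with a non-default index.
sample = pd.Series(['5', '1', '7'], index=[10, 11, 12], name='amount')
errors = find_low_values(sample, 3, 'amount')
```

Here errors holds one entry per failing value, keyed by the Series index label rather than by position.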
Edit: I switched to mapping the test function over the columns with list(map(...)) rather than using pandas apply to solve this issue. It is significantly faster too.
col_series = col_series.apply(pd.to_datetime, errors='ignore')
dfwithrow = pd.DataFrame(col_series)
dfwithrow.insert(0, 'rowid', range(0, len(dfwithrow)))
dfwithrow['lowerlimit'] = lowlimit
dfwithrow['colname'] = 'fred'
list(map(self.testdatelimits, dfwithrow['rowid'], dfwithrow[colvalue[0]],
         dfwithrow['lowerlimit'], dfwithrow['colname']))
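A possible alternative to calling the test once per value, sketched under the assumption that every value in the column parses as a date: compare the whole Series against the limit at once and read the failing row labels off the resulting boolean mask. The sample data and the names dates, lower_limit and bad_rows are made up for this example.

```python
import pandas as pd

# Hypothetical sample column of date strings.
dates = pd.Series(['2018-03-01', '2017-12-31', '2019-06-15'], name='start_date')
lower_limit = pd.Timestamp('2018-01-01')

# Convert the whole column in one call.
parsed = pd.to_datetime(dates)

# Element-wise comparison gives a boolean Series with the same index.
too_low = parsed < lower_limit

# The index labels where the mask is True are the affected rows.
bad_rows = parsed.index[too_low].tolist()
```

With this approach the row labels come from the Series index itself, so no separate rowid column is needed.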