I'm relatively new to Python and functions. I'm trying to apply the following function to each row of a dataframe and store the computed result for each row in a new column:
def manhattan_distance(x,y):
    return sum(abs(a-b) for a,b in zip(x,y))
For reference, this is the dataframe I'm testing on:
import pandas as pd

entries = [
    {'age1': '2', 'age2': '2'},
    {'age1': '12', 'age2': '12'},
    {'age1': '5', 'age2': '50'}
]
df = pd.DataFrame(entries)
df['age1'] = df['age1'].astype(str).astype(int)
df['age2'] = df['age2'].astype(str).astype(int)
I've seen the answers to "How to iterate over rows in a DataFrame in Pandas?" and have got as far as this:
import itertools
for index, row in df.iterrows():
    df['distance']=df.apply(lambda row: manhattan_distance(row['age1'], row['age2']), axis=1)
Which returns the following:
--------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-42-aa6a21cd1de9> in <module>()
4 # print (manhattan_distance(row['age1'],row['age2']))
5
----> 6 df['distance']=df.apply(lambda row: manhattan_distance(row['age1'], row['age2']), axis=1)
/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
4852 f, axis,
4853 reduce=reduce,
-> 4854 ignore_failures=ignore_failures)
4855 else:
4856 return self._apply_broadcast(f, axis)
/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py in _apply_standard(self, func, axis, ignore_failures, reduce)
4948 try:
4949 for i, v in enumerate(series_gen):
-> 4950 results[i] = func(v)
4951 keys.append(v.name)
4952 except Exception as e:
<ipython-input-42-aa6a21cd1de9> in <lambda>(row)
4 # print (manhattan_distance(row['age1'],row['age2']))
5
----> 6 df['distance']=df.apply(lambda row: manhattan_distance(row['age1'], row['age2']), axis=1)
<ipython-input-36-74da75398c4c> in manhattan_distance(x, y)
1 def manhattan_distance(x,y):
2
----> 3 return sum(abs(a-b) for a,b in zip(x,y))
4 # return sum(abs(a-b) for a,b in map(lambda x: zip(a,b)))
TypeError: ('zip argument #1 must support iteration', 'occurred at index 0')
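To narrow this down, here is a minimal sketch without pandas. It assumes that each of row['age1'] and row['age2'] reaches the function as a single integer rather than a sequence, which is what the frame above suggests. The values 5 and 50 are simply the third row of the test data, and the exact wording of the error message may differ between Python versions.

```python
def manhattan_distance(x, y):
    # zip() needs two iterables (lists, tuples, Series, ...)
    return sum(abs(a - b) for a, b in zip(x, y))

# Two plain integers, standing in for one row's values (assumption)
try:
    manhattan_distance(5, 50)
    raised = False
except TypeError as exc:
    # An int cannot be iterated over, so zip() refuses it
    raised = True
    print("TypeError:", exc)

# The same function is fine when given one-element sequences
result = manhattan_distance([5], [50])
print(result)
```

If this reproduces the problem, it would suggest that the issue lies in what is passed to the function rather than in apply() itself.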
Based on other responses to the question I referred to above, I have attempted to amend the zip statement in my function to use map(), and then re-ran the same loop:
import itertools
for index, row in df.iterrows():
    df['distance']=df.apply(lambda row: manhattan_distance(row['age1'], row['age2']), axis=1)
The above returns this:
--------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-44-aa6a21cd1de9> in <module>()
4 # print (manhattan_distance(row['age1'],row['age2']))
5
----> 6 df['distance']=df.apply(lambda row: manhattan_distance(row['age1'], row['age2']), axis=1)
/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
4852 f, axis,
4853 reduce=reduce,
-> 4854 ignore_failures=ignore_failures)
4855 else:
4856 return self._apply_broadcast(f, axis)
/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py in _apply_standard(self, func, axis, ignore_failures, reduce)
4948 try:
4949 for i, v in enumerate(series_gen):
-> 4950 results[i] = func(v)
4951 keys.append(v.name)
4952 except Exception as e:
<ipython-input-44-aa6a21cd1de9> in <lambda>(row)
4 # print (manhattan_distance(row['age1'],row['age2']))
5
----> 6 df['distance']=df.apply(lambda row: manhattan_distance(row['age1'], row['age2']), axis=1)
<ipython-input-43-5daf167baf5f> in manhattan_distance(x, y)
2
3 # return sum(abs(a-b) for a,b in zip(x,y))
----> 4 return sum(abs(a-b) for a,b in map(lambda x: zip(a,b)))
TypeError: ('map() must have at least two arguments.', 'occurred at index 0')
If this is the right approach to take, I'm unclear what my map() arguments need to be for the function to work.
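For reference, this is a rough sketch of the result I'm after, not necessarily the right way to get it. It assumes the two columns hold one integer per row, so each value is wrapped in a one-element list before being handed to the unchanged function. The column name distance_vectorised is only an illustrative name for a comparison column computed directly with pandas.

```python
import pandas as pd

def manhattan_distance(x, y):
    # Expects two iterables of equal length
    return sum(abs(a - b) for a, b in zip(x, y))

# Same test values as above, already converted to integers
df = pd.DataFrame({'age1': [2, 12, 5], 'age2': [2, 12, 50]})

# apply() with axis=1 already visits every row, so no outer loop is used here.
# Each scalar is wrapped in a list so zip() has something to iterate over.
df['distance'] = df.apply(
    lambda row: manhattan_distance([row['age1']], [row['age2']]), axis=1
)

# For a single pair of columns, the same numbers via column arithmetic
df['distance_vectorised'] = (df['age1'] - df['age2']).abs()

print(df)
```

With only one pair of columns the distance reduces to the absolute difference, so the two new columns should agree. The list-wrapping form would presumably only matter once more than one coordinate per row is involved.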