I have a pandas DataFrame containing several batches; the batches have different sizes (i.e. different numbers of rows). For each batch, I create a new feature using that batch's data.
I want to automate this process: first create a new column, then iterate over the batch id column and, for each group of rows sharing the same batch id, compute the new feature values and write them into the new column, then continue with the next batch.
Here is the code for the manual method for a single batch:
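import numpy as np
from sklearn.neighbors import BallTree

# Select one batch; .copy() avoids SettingWithCopyWarning when adding a column
batch = samples.loc[samples['batch id'] == 'XX'].copy()
tree = BallTree(red_points[['col1','col2']], leaf_size=15, metric='minkowski')
# distance has shape (batch_size, 2): nearest and second-nearest red point
distance, index = tree.query(batch[['col1','col2']], k=2)
batch_size = batch.shape[0]
# col3 (assumed to hold 0 or 1) selects which of the two distances to keep per row
batch['new feature'] = distance[np.arange(batch_size), batch['col3'].to_numpy()]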
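
For reference, the automated version I have in mind might look roughly like the sketch below. It is only a sketch under several assumptions: `add_new_feature` is a hypothetical helper name, `red_points` is taken to be the same for every batch (so the tree is built once), `col3` holds only 0 or 1, the index of `samples` is unique, and the small demo frames are made-up data used purely to exercise the function.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree


def add_new_feature(samples, red_points, batch_col="batch id"):
    """Hypothetical helper: fill a 'new feature' column one batch at a time."""
    out = samples.copy()
    out["new feature"] = np.nan  # create the new column up front

    # Assumption: the reference points are shared by all batches,
    # so the tree only needs to be built once.
    tree = BallTree(red_points[["col1", "col2"]].to_numpy(),
                    leaf_size=15, metric="minkowski")

    # groupby hands back the rows of each batch together, whatever its size.
    for _, batch in out.groupby(batch_col, sort=False):
        distance, _ = tree.query(batch[["col1", "col2"]].to_numpy(), k=2)
        # col3 (assumed 0 or 1) picks the nearest or second-nearest distance.
        picked = distance[np.arange(len(batch)), batch["col3"].to_numpy()]
        # Assumption: the index is unique, so batch.index addresses these rows.
        out.loc[batch.index, "new feature"] = picked
    return out


# Made-up demo data: two reference points and two batches of different sizes.
red_points = pd.DataFrame({"col1": [0.0, 3.0], "col2": [0.0, 4.0]})
samples = pd.DataFrame({
    "batch id": ["A", "A", "B"],
    "col1": [0.0, 0.0, 3.0],
    "col2": [0.0, 0.0, 4.0],
    "col3": [0, 1, 0],
})

result = add_new_feature(samples, red_points)
print(result)
```

If `red_points` really is identical for every batch, querying the tree once on all rows of `samples` would give the same values without any loop; the per-batch loop only matters when something (for example the reference points) changes from batch to batch.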