I'm working on a project where my original dataframe is:
   A   B  C label
0  1   2  2   NaN
1  2   4  5     7
2  3   6  5   NaN
3  4   8  7   NaN
4  5  10  3     8
5  6  12  4     8
However, I have an array with new labels for certain points in the original dataframe (to identify those points I only used columns A and B). Something like this:
X_labeled = [[2, 4], [3,6]]
y_labeled = [5,9]
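To make the question reproducible, here is a minimal sketch that builds the example data above. The column dtypes are an assumption (integers for A, B and C, floats for label so that it can hold NaN):

```python
import numpy as np
import pandas as pd

# Original dataframe from the example; missing labels are stored as NaN
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6],
    "B": [2, 4, 6, 8, 10, 12],
    "C": [2, 5, 5, 7, 3, 4],
    "label": [np.nan, 7, np.nan, np.nan, 8, 8],
})

# New labels for some (A, B) points; X_labeled[i] belongs to y_labeled[i]
X_labeled = [[2, 4], [3, 6]]
y_labeled = [5, 9]
```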
My goal is to add the new labels to the original dataframe. I know that each combination of A and B is unique. What is the fastest way to assign each new label to the correct row?
This is my try:
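import numpy as np

y_labeled = np.array(y_labeled).astype('float64')
current_position = 0
for point in X_labeled:
    # Find the row whose (A, B) pair matches this point
    row = df.loc[(df['A'] == point[0]) & (df['B'] == point[1])]
    # .loc accepts an index object; .at is meant for a single scalar label
    df.loc[row.index, 'label'] = y_labeled[current_position]
    current_position += 1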
Wanted output (rows with index 1 and 2 are changed):
   A   B  C label
0  1   2  2   NaN
1  2   4  5     5
2  3   6  5     9
3  4   8  7   NaN
4  5  10  3     8
5  6  12  4     8
For small datasets this may be okay, but I'm currently using it for datasets with more than 25000 labels. Is there a faster way?
Also, in some cases I used all columns except the column 'label'. That dataframe consists of 64 columns, so writing out one comparison per column as in my method is not practical there. Does anyone have an idea how to improve this?
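One possible direction, given only as an untested-at-scale sketch, is to replace the Python loop with a single left merge on the key columns. The names key_cols, new_labels and new_label are made up for this illustration. The sketch assumes the key combinations are unique and that the values in X_labeled compare exactly equal to the dataframe values (which may be fragile for float columns):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6],
    "B": [2, 4, 6, 8, 10, 12],
    "C": [2, 5, 5, 7, 3, 4],
    "label": [np.nan, 7, np.nan, np.nan, 8, 8],
})
X_labeled = [[2, 4], [3, 6]]
y_labeled = [5, 9]

# Columns that identify a row. For the 64-column case this could be
# every column except 'label', e.g. [c for c in df.columns if c != "label"]
key_cols = ["A", "B"]

# Put the new labels in their own dataframe, keyed by the same columns
new_labels = pd.DataFrame(X_labeled, columns=key_cols)
new_labels["new_label"] = np.asarray(y_labeled, dtype="float64")

# A left merge keeps every row of df in its original order; rows without
# a matching key get NaN in 'new_label'
merged = df.merge(new_labels, on=key_cols, how="left")

# Take the new label where one exists, otherwise keep the old one.
# to_numpy() assigns by position, so the index of df does not matter.
df["label"] = merged["new_label"].fillna(merged["label"]).to_numpy()

print(df)
```

Because the key list is just a list of column names, the same code covers the two-column and the 64-column case without writing a comparison per column.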
Thanks in advance