I have the following dataframe:
import pandas as pd
array = {'id': [1, 1, 1, 2, 2, 2, 3, 3],
         'A': [False, False, True, False, False, False, True, True],
         'B': [False, True, True, False, True, False, False, False]}
df = pd.DataFrame(array)
df
I want to collapse each id into a single row. For each column, the result should be False if all values for that id are False, and True if at least one of them is True. I have started with:
df.groupby(['id']).sum()
After that, I convert every value above 0 to 1. This works fine, but my original dataframe has 2,000,000 rows and 14,000 columns, so it takes days.
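For reference, here is a minimal sketch of that two-step approach on the sample data above. The name `result` is only for illustration, and I compare against 0 to get booleans back rather than 0/1 integers:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2, 3, 3],
    'A': [False, False, True, False, False, False, True, True],
    'B': [False, True, True, False, True, False, False, False],
})

# Step 1: summing booleans counts the True values per id.
counts = df.groupby('id').sum()

# Step 2: a count above 0 means at least one True for that id.
result = counts > 0
print(result)
```

On the sample data this gives True/True for id 1, False/True for id 2, and True/False for id 3 (columns A/B).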
Is there a faster way to do this?