I have a DataFrame (df) that looks like this:
  Main  Col_1  Col_2  Col_3
0   v1      1      0      0
1   v2      0      1      1
2   v1      1      1      0
3   v2      1      0      1
4   v5      1      0      0
5   v2      1      0      0
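For reproducibility, here is a minimal sketch that builds the sample frame shown above. The values are copied from the table. The default integer index is an assumption based on the printed row labels.

```python
import pandas as pd

# Sample data copied from the table above; the default RangeIndex
# (0..5) is assumed from the printed row labels.
df = pd.DataFrame({
    "Main":  ["v1", "v2", "v1", "v2", "v5", "v2"],
    "Col_1": [1, 0, 1, 1, 1, 1],
    "Col_2": [0, 1, 1, 0, 0, 0],
    "Col_3": [0, 1, 0, 1, 0, 0],
})
print(df)
```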
I want to create a new DataFrame that keeps only the first row for each unique value in the Main column. That is, going through the rows in order, whenever a value appears in Main for the first time, that row is added to the new DataFrame.
The new DataFrame (new_df) should look like this:
  Main  Col_1  Col_2  Col_3
0   v1      1      0      0
1   v2      0      1      1
2   v5      1      0      0
My current approach is to iterate through every row:
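import pandas as pd

unique_message_list = []  # values of 'Main' seen so far
new_df_list = []          # first row found for each value

for index, row in df.iterrows():
    # Keep the row only the first time its 'Main' value appears
    if row['Main'] not in unique_message_list:
        unique_message_list.append(row['Main'])
        new_df_list.append(row.tolist())

new_df = pd.DataFrame(new_df_list, columns=['Main', 'Col_1', 'Col_2', 'Col_3'])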
However, df has 1 million rows, so iterating over it row by row is slow. How can I do this more efficiently?
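One vectorized direction that may fit, sketched here on the sample data only: pandas provides `DataFrame.drop_duplicates`, which with `subset="Main"` and `keep="first"` retains the first row seen for each value in Main. The `reset_index(drop=True)` call is an assumption, added so the index is renumbered as in the desired output. No timing on the full 1 million rows is shown here.

```python
import pandas as pd

# Sample frame rebuilt from the example data in the question
df = pd.DataFrame({
    "Main":  ["v1", "v2", "v1", "v2", "v5", "v2"],
    "Col_1": [1, 0, 1, 1, 1, 1],
    "Col_2": [0, 1, 1, 0, 0, 0],
    "Col_3": [0, 1, 0, 1, 0, 0],
})

# Keep the first row for each distinct value in "Main" (original order is
# preserved), then renumber the index from 0.
new_df = df.drop_duplicates(subset="Main", keep="first").reset_index(drop=True)
print(new_df)
```

This avoids the Python-level loop and the repeated list membership checks, which is where the row-by-row version likely spends most of its time.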