I have a piece of code that merges columns with duplicated names in a pandas DataFrame. Basically, I am trying to do the same thing as in this post:
Python Pandas merge samed name columns in a dataframe
However, the DataFrame I am trying to process is loaded from a CSV of around 1 GB, with around 2600 columns and 27000+ rows.
The code runs, but it takes ~2 hr 20 min to finish.
Out of the 2600 columns, only ~30 need to be merged into 4 columns, say the 13th through the 42nd.
Is there a way to optimize the code from the linked post? Perhaps there is a way to tell pandas to group only the 13th through 42nd columns by name, and to join only the fields in that range.
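To make the idea concrete, here is a rough sketch of what I have in mind. It is not the code from the linked post. The function name `merge_duplicate_columns`, the comma separator, and the rule "join the non-null values of each row" are my own assumptions about what the merge should do. It slices out only the affected block by position, merges same-named columns inside that block, and then glues the untouched columns back on.

```python
import pandas as pd


def merge_duplicate_columns(df, start, stop, sep=","):
    """Merge same-named columns inside df.iloc[:, start:stop] only.

    Hypothetical helper: for each row, the non-null values of all columns
    sharing a name are joined with `sep`. Columns outside the slice are
    returned unchanged.
    """
    block = df.iloc[:, start:stop]
    merged = {}
    # Index.unique() keeps names in order of first appearance.
    for name in block.columns.unique():
        # Boolean mask selects every column in the block carrying this name.
        values = block.loc[:, block.columns == name].to_numpy(dtype=object)
        merged[name] = [
            sep.join(str(v) for v in row if pd.notna(v)) for row in values
        ]
    merged_df = pd.DataFrame(merged, index=df.index)
    # Reassemble: columns before the block, merged block, columns after it.
    return pd.concat([df.iloc[:, :start], merged_df, df.iloc[:, stop:]], axis=1)
```

For my data, the 13th through 42nd columns would correspond to `merge_duplicate_columns(df, 12, 42)`, since `iloc` is zero-based and excludes the stop position. Note that the merged columns come back as strings in this sketch.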
Any help would be greatly appreciated.