I have a large DataFrame (6M rows) that looks like this:
Column A | Column B |
---|---|
000001 | AB1234 |
000002 | CD1234 |
Column A is unique, but Column B is not.
I have lists of index values (values from Column A) that I use to query this large DataFrame, and for every index I want the corresponding Column B value as the result.
A typical index list is shown below. I have 4K such lists, and each one is long.
query_list = ['000002', '000003', '000014', '000101']
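A minimal reproducible version of the setup might look like the sketch below. The sample IDs and codes are made up, and the column names are assumed to be literally `Column A` and `Column B`. The per-ID boolean mask at the end only illustrates the desired result; it compares every row for each ID, so it would not scale to 6M rows.

```python
import pandas as pd

# Small stand-in for the 6M-row DataFrame; IDs and codes are made up.
# Column A holds strings so the leading zeros are preserved.
df = pd.DataFrame(
    {
        "Column A": ["000001", "000002", "000003", "000014", "000101"],
        "Column B": ["AB1234", "CD1234", "AB1234", "EF5678", "CD1234"],
    }
)

query_list = ["000002", "000003", "000014", "000101"]

# Desired result: the Column B value for every ID in query_list, in order.
# Each `df["Column A"] == key` compares against every row, which is why
# this naive form gets slow on a large DataFrame.
result = [df.loc[df["Column A"] == key, "Column B"].iloc[0] for key in query_list]
```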
Running on Python 3.x, Jupyter Notebook, pandas 1.3.x.
I have tried `df.query()` and `df[df["column name"].str.contains(...)]`, but both take a long time:
- `df.query()` takes 57x s
- `df[df["column name"].str.contains(...)]` takes 7xx s
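For reference, the two attempts might look roughly like this on sample data. The exact expressions are not shown above, so both are reconstructions (assumptions), not the code that produced the timings. In this form, each one builds a boolean mask over the whole of `Column A`, so 4K query lists mean 4K full passes over 6M rows.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Column A": ["000001", "000002", "000003", "000014", "000101"],
        "Column B": ["AB1234", "CD1234", "AB1234", "EF5678", "CD1234"],
    }
)
query_list = ["000002", "000003", "000014", "000101"]

# Assumed form of the df.query() attempt: backticks quote a column name
# containing a space, and @ refers to a local Python variable.
via_query = df.query("`Column A` in @query_list")["Column B"].tolist()

# Assumed form of the str.contains() attempt: one regex alternation of all IDs.
# With fixed-width 6-digit IDs this matches exactly; with variable-width IDs it
# could also match values that merely contain one of the IDs as a substring.
pattern = "|".join(query_list)
via_contains = df[df["Column A"].str.contains(pattern)]["Column B"].tolist()
```

Both return rows in DataFrame order rather than `query_list` order, and silently drop IDs that are not present.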
I also tried running this code in parallel with `Pool.map()`, but it didn't work.
Is there a faster way to do this?
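Since Column A is unique, one direction worth benchmarking is to set it as the index once and look up each list against that index instead of filtering the whole DataFrame per list. This is a sketch under that assumption, not something timed at 6M rows; `query_lists` and `lookup` are hypothetical names standing in for the 4K lists and the prepared lookup.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Column A": ["000001", "000002", "000003", "000014", "000101"],
        "Column B": ["AB1234", "CD1234", "AB1234", "EF5678", "CD1234"],
    }
)

# Build the lookup once: Column A is unique, so it can serve as the index.
lookup = df.set_index("Column A")["Column B"]

# Hypothetical stand-in for the 4K query lists.
query_lists = [
    ["000002", "000003", "000014", "000101"],
    ["000001", "000999"],  # "000999" does not exist in df
]

# reindex looks each ID up in the index rather than scanning every row,
# keeps the order of each query list, and yields NaN for absent IDs.
results = [lookup.reindex(q).tolist() for q in query_lists]
```

The index builds its hash table on the first lookup and reuses it for every later `reindex` call, which is where the savings over one boolean mask per list would come from.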