
I have a large DataFrame (6M rows) that looks like this:

Column A   Column B
000001     AB1234
000002     CD1234

Column A is unique, but Column B is not. I have lists of index values (Column A values) to query this large df with, and for every index in a list I want the corresponding Column B value as my result. I have 4K such lists, and each list is long. One list looks like this:

query_list = ['000002', '000003', '000014', '000101']
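To make the goal concrete, here is a minimal sketch of the lookup I am after, on toy data. The rows beyond the first two are made up for illustration, and using set_index() plus reindex() is only one possible way to express the lookup, not something I have timed on the full 6M rows.

```python
import pandas as pd

# Toy stand-in for the 6M-row frame. Only the first two rows come from the
# real data; the remaining rows are invented for illustration.
df = pd.DataFrame(
    {
        "Column A": ["000001", "000002", "000003", "000014", "000101"],
        "Column B": ["AB1234", "CD1234", "AB1234", "EF5678", "CD1234"],
    }
)

query_list = ["000002", "000003", "000014", "000101"]

# Column A is unique, so it can serve as the index of a lookup Series.
# This is built once and reused for every query list.
lookup = df.set_index("Column A")["Column B"]

# Label-based lookup for one list. reindex() yields NaN for keys that are
# not present instead of raising KeyError.
result = lookup.reindex(query_list)
print(result.tolist())  # ['CD1234', 'AB1234', 'EF5678', 'CD1234']
```

The desired result for each of the 4K lists is the Column B values in the same order as the list, as printed above.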

Environment: Python 3.x, Jupyter Notebook, pandas 1.3.x

I have tried df.query() and df[df["column name"].str.contains()], but both of them take a long time:

  • df.query() took 57x s
  • df[df["column name"].str.contains()] took 7xx s

I also tried running this code with Pool.map(), but it didn't work.

Is there a faster way to do this?
