Fastest way to locate rows of a dataframe from two lists and concatenate them?

Question

Apologies if this has already been asked, I haven't found anything specific enough although this does seem like a general question. Anyways, I have two lists of values which correspond to values in a dataframe, and I need to pull those rows which contain those values and make them into another dataframe. The code I have works, but it seems quite slow (14 seconds per 250 items). Is there a smart way to speed it up?

row_list = []
for i, x in enumerate(datetime_list):
    row_list.append(df.loc[(df["datetimes"] == x) & (df.loc["b"] == b_list[i])])

data = pd.concat(row_list)

Edit: Sorry for the vagueness @anky, here's an example dataframe

import pandas as pd
from datetime import datetime

df = pd.DataFrame({'datetimes' : [datetime(2020, 6, 14, 2), datetime(2020, 6, 14, 3), datetime(2020, 6, 14, 4)],
                   'b' : [0, 1, 2],
                   'c' : [500, 600, 700]})

How does this compare to setting datetimes and b as a MultiIndex and then using reindex or loc with a list of tuples? — Scott Boston, Jul 14 '22 at 17:35
please provide us a workable example: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — anky, Jul 14 '22 at 18:02

score 0 · Answer 1 · answered Jul 14 '22 at 17:43

IIUC, try this

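# Index on the two lookup columns (the example frame names the column 'datetimes').
dfi = df.set_index(['datetimes', 'b'])
# Pair the two lists element-wise into (datetime, b) keys and select them in one call.
data = dfi.loc[list(zip(datetime_list, b_list)), :].reset_index()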

Without test data in the question, it is hard to verify whether this is correct.
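For reference, here is a minimal runnable sketch of the MultiIndex lookup idea, using the example frame from the question. The contents of `datetime_list` and `b_list` are made up for illustration (the question does not show them), and the `merge` variant at the end is an alternative technique, not something taken from this answer. Treat both as sketches under those assumptions rather than a verified fix for the original data.

```python
import pandas as pd
from datetime import datetime

# Example frame from the question.
df = pd.DataFrame({
    "datetimes": [datetime(2020, 6, 14, 2),
                  datetime(2020, 6, 14, 3),
                  datetime(2020, 6, 14, 4)],
    "b": [0, 1, 2],
    "c": [500, 600, 700],
})

# Hypothetical lookup lists (assumed; not given in the question).
# They are paired by position: datetime_list[i] goes with b_list[i].
datetime_list = [datetime(2020, 6, 14, 4), datetime(2020, 6, 14, 2)]
b_list = [2, 0]

# Approach 1: build a MultiIndex once, then select all pairs in one .loc call
# instead of filtering the whole frame once per pair.
dfi = df.set_index(["datetimes", "b"])
keys = list(zip(datetime_list, b_list))
data = dfi.loc[keys].reset_index()

# Approach 2 (alternative, not from the answer): put the pairs in their own
# frame and inner-merge on both columns.
pairs = pd.DataFrame({"datetimes": datetime_list, "b": b_list})
data_merge = pairs.merge(df, on=["datetimes", "b"], how="inner")

print(data)
print(data_merge)
```

Both variants replace the per-item boolean mask, which scans the full frame on every iteration, with a single keyed lookup. Which one is faster will depend on the size of the frame and of the lists, so it is worth timing both on the real data.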

Scott Boston

`b` should already be an index value, since `df.loc["b"] == ...` works? – anky Jul 14 '22 at 18:00
Thank you very much, that sped it up by 5 seconds! Testing on larger dataset now. And yeah I don't need to enumerate the datetime_list using this, instead I passed in a tuple of (datetime, b) – Rondalf Jul 14 '22 at 18:02
