
Apologies if this has already been asked; I haven't found anything specific enough, although this does seem like a general question. Anyway, I have two lists of values that correspond to values in two columns of a dataframe, and I need to pull the rows that contain those value pairs into another dataframe. The code I have works, but it is quite slow (14 seconds per 250 items). Is there a smarter way to speed it up?

import pandas as pd

row_list = []
for x, b in zip(datetime_list, b_list):
    # compare against column "b"; df.loc["b"] would look up a row labelled "b"
    row_list.append(df.loc[(df["datetimes"] == x) & (df["b"] == b)])

data = pd.concat(row_list)

Edit: Sorry for the vagueness, @anky. Here's an example dataframe:

import pandas as pd
from datetime import datetime

df = pd.DataFrame({'datetimes' : [datetime(2020, 6, 14, 2), datetime(2020, 6, 14, 3), datetime(2020, 6, 14, 4)],
                   'b' : [0, 1, 2],
                   'c' : [500, 600, 700]})
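To make the example reproducible, here is a minimal sketch that runs the loop above against this dataframe. The key lists `datetime_list` and `b_list` are hypothetical (the question does not show the real ones), and one pair is deliberately absent from `df` so you can see that the loop simply contributes an empty frame for it.

```python
import pandas as pd
from datetime import datetime

df = pd.DataFrame({'datetimes': [datetime(2020, 6, 14, 2), datetime(2020, 6, 14, 3), datetime(2020, 6, 14, 4)],
                   'b': [0, 1, 2],
                   'c': [500, 600, 700]})

# Hypothetical key lists: position i pairs datetime_list[i] with b_list[i]
datetime_list = [datetime(2020, 6, 14, 4), datetime(2020, 6, 14, 2), datetime(2020, 6, 14, 3)]
b_list = [2, 0, 5]  # the pair (2020-06-14 03:00, 5) matches no row

# The question's loop: one full boolean mask over df per key pair
row_list = []
for x, b in zip(datetime_list, b_list):
    row_list.append(df.loc[(df["datetimes"] == x) & (df["b"] == b)])
data = pd.concat(row_list)
print(data)
```

Each iteration scans the whole dataframe twice, so the cost grows with (number of keys) × (number of rows), which is why this gets slow on real data.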
Rondalf


IIUC, try this

dfi = df.set_index(['datetimes', 'b'])
# pass (datetime, b) tuples; enumerate() would give (position, datetime) pairs
data = dfi.loc[list(zip(datetime_list, b_list)), :].reset_index()

Without test data in the question, it is hard to verify whether this is correct.
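A self-contained check of the MultiIndex approach, sketched against the example dataframe from the question's edit. The key lists are hypothetical, and every pair in them exists in `df`: in recent pandas versions, `.loc` with a list of keys raises a `KeyError` when a key is missing.

```python
import pandas as pd
from datetime import datetime

df = pd.DataFrame({'datetimes': [datetime(2020, 6, 14, 2), datetime(2020, 6, 14, 3), datetime(2020, 6, 14, 4)],
                   'b': [0, 1, 2],
                   'c': [500, 600, 700]})

# Hypothetical keys; all pairs must be present in df for .loc to succeed
datetime_list = [datetime(2020, 6, 14, 4), datetime(2020, 6, 14, 2)]
b_list = [2, 0]

# Build the two-level index once, then fetch every (datetime, b) pair in one call
dfi = df.set_index(['datetimes', 'b'])
keys = list(zip(datetime_list, b_list))
data = dfi.loc[keys, :].reset_index()
print(data)
```

The rows come back in the order of `keys`, and `reset_index()` turns the two index levels back into ordinary columns.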

Scott Boston
`b` should already be an index value, since `df.loc["b"] == ...` works? – anky Jul 14 '22 at 18:00
Thank you very much, that sped it up by 5 seconds! Testing on a larger dataset now. And yeah, I don't need to enumerate datetime_list with this; instead I passed in (datetime, b) tuples. – Rondalf Jul 14 '22 at 18:02
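If some (datetime, b) pairs might not exist in the dataframe, a swapped-in alternative (not the method from the answer) is an inner merge against a small dataframe of the key pairs. This is a minimal sketch with hypothetical key lists; unmatched pairs are dropped instead of raising, similar to the original loop.

```python
import pandas as pd
from datetime import datetime

df = pd.DataFrame({'datetimes': [datetime(2020, 6, 14, 2), datetime(2020, 6, 14, 3), datetime(2020, 6, 14, 4)],
                   'b': [0, 1, 2],
                   'c': [500, 600, 700]})

# Hypothetical keys; the last pair has no matching row in df
datetime_list = [datetime(2020, 6, 14, 4), datetime(2020, 6, 14, 2), datetime(2020, 6, 14, 3)]
b_list = [2, 0, 5]

# One row per requested (datetime, b) pair
keys = pd.DataFrame({"datetimes": datetime_list, "b": b_list})

# Inner merge on both columns: pairs absent from df are silently dropped
data = keys.merge(df, on=["datetimes", "b"], how="inner")
print(data)
```

The pandas documentation states that an inner merge preserves the order of the left keys. Unlike the loop, the result gets a fresh 0..n-1 index rather than the original row labels.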