Here is a comparison with time estimates, using a list comprehension and a pandas DataFrame. Once the data is already loaded as a DataFrame, the DataFrame method is faster. But if you also count the time needed to load the data, the list comprehension clearly emerges as the winner.
I made a list of 1,000,000 dictionaries, each with a structure similar to yours.
Note:

- The test was conducted on a Google Colab notebook with two CPUs (OS: Ubuntu Linux).
- I modified the code to check the output with `val.get('id')` instead of what you need, `val.get('disabled')`.
Make Dummy Data
```python
import numpy as np
import pandas as pd

def make_dict(id=0, disabled=True):
    dis = 'disabled' if disabled else ''
    d = {'id': id, 'label': 'UK 9½', 'price': '0', 'oldPrice': '0',
         'products': ['105515', True, '0'], 'disabled': dis}
    return d

def make_list(size=10, seed=0):
    np.random.seed(seed=seed)
    status = (np.random.rand(size) > 0.5)
    ids = np.arange(size) + 1
    vals = [make_dict(id=id, disabled=disabled) for id, disabled in zip(ids, status)]
    return vals

vals = make_list(size=1000000, seed=0)
df = pd.DataFrame(vals)
```
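As a quick sanity check (my addition, not part of the original benchmark), you can confirm that the seeded random split above divides the entries roughly 50/50 between disabled and enabled:

```python
import numpy as np

# Reproduce the same seeded split used in make_list (seed=0, size=1,000,000)
np.random.seed(0)
status = np.random.rand(1000000) > 0.5
n_disabled = int(status.sum())

# With a fixed seed the split is deterministic and close to half the entries
print(n_disabled, 1000000 - n_disabled)
```

This also means the timings above filter about half a million rows, not a pathological all-or-nothing case.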
1. Test List Comprehension (Best Option: fastest overall)
```python
%%time
ids = [val.get('id') for val in vals if val.get('disabled') == '']
```

Output:

```
CPU times: user 178 ms, sys: 0 ns, total: 178 ms
Wall time: 184 ms
```
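A small variant (my suggestion, not timed in the original benchmark): since every dictionary here is known to contain both keys, you can index with `val['disabled']` directly instead of `.get()`, which avoids a method call per element and is typically slightly faster. Note that direct indexing raises `KeyError` if a key is missing, so it is only safe when the structure is guaranteed:

```python
# Tiny illustrative list with the same two keys used in the benchmark
vals = [
    {'id': 1, 'disabled': ''},
    {'id': 2, 'disabled': 'disabled'},
    {'id': 3, 'disabled': ''},
]

# Direct key access: no per-element .get() call, but assumes keys exist
ids = [val['id'] for val in vals if val['disabled'] == '']
print(ids)  # [1, 3]
```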
2. Test Pandas DataFrame (without data-loading)
Here we do not count the time needed to load the data into a DataFrame.

```python
%%time
ids = df.loc[df['disabled'] == '', 'id'].tolist()
```

Output:

```
CPU times: user 68.4 ms, sys: 6.03 ms, total: 74.4 ms
Wall time: 75.6 ms
```
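If you want to squeeze a bit more out of the DataFrame path, one option (an assumption on my part, not timed above) is to build the mask on the underlying NumPy array and slice the `id` column as an array before converting back to a list, which skips some pandas indexing overhead:

```python
import pandas as pd

# Small illustrative frame with the same two columns used in the benchmark
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'disabled': ['', 'disabled', '', 'disabled'],
})

# Compare on the NumPy array, then slice the 'id' values with the mask
mask = df['disabled'].to_numpy() == ''
ids = df['id'].to_numpy()[mask].tolist()
print(ids)  # [1, 3]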
3. Test Pandas DataFrame (with data-loading)
Here we DO include the time needed to load the data into a DataFrame.

```python
%%time
df = pd.DataFrame(vals)
ids = df.loc[df['disabled'] == '', 'id'].tolist()
```

Output:

```
CPU times: user 1.2 s, sys: 49.5 ms, total: 1.25 s
Wall time: 1.26 s
```
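One caveat: the `%%time` cell magic only works inside IPython/Jupyter. To reproduce this comparison in a plain Python script, you can use the standard `timeit` module (a sketch; the list size and repeat count here are illustrative, smaller than the benchmark above):

```python
import timeit

# Smaller dummy list: alternating disabled/enabled entries
vals = [{'id': i, 'disabled': '' if i % 2 else 'disabled'} for i in range(100000)]

# Time the list-comprehension filter, averaged over 10 runs
t = timeit.timeit(
    "[val.get('id') for val in vals if val.get('disabled') == '']",
    globals={'vals': vals},
    number=10,
)
print(f"list comprehension: {t / 10:.4f} s per run")
```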