df.apply(lambda x: x.list_values[x.start:x.stop], axis=1)
Output:
0    [5, 7]
1    [3, 5]
2    [1, 3]
3    [1, 3]
4    [3, 5]
5    [5, 7]
6    [1, 3]
dtype: object
I'm not sure why, but the fastest variation appears to be:
df['sliced'] = [lst[start:stop] for lst, start, stop in zip(df.list_values.tolist(), df.start.tolist(), df.stop.tolist())]
My testing:
df = pd.DataFrame({'list_values': {0: [5, 7, 6, 8], 1: [1, 3, 5, 7, 2, 4, 6, 8], 2: [1, 3, 5, 7, 2, 4, 6, 8], 3: [1, 3, 5, 7, 2, 4, 6, 8], 4: [1, 3, 5, 7, 2, 4, 6, 8], 5: [1, 3, 5, 7, 2, 4, 6, 8], 6: [1, 3, 5, 7, 2, 4, 6, 8]}, 'start': {0: 0, 1: 1, 2: 0, 3: 0, 4: 1, 5: 2, 6: 0}, 'stop': {0: 2, 1: 3, 2: 2, 3: 2, 4: 3, 5: 4, 6: 2}})
df = pd.concat([df]*100000)
# Shape is now (700000, 3)
def v1(df):
    temp = df.copy()
    temp['sliced'] = [lst[start:stop] for lst, start, stop in temp.values.tolist()]

def v2(df):
    temp = df.copy()
    temp['sliced'] = [lst[start:stop] for lst, start, stop in zip(temp.list_values, temp.start, temp.stop)]

def v3(df):
    temp = df.copy()
    temp['sliced'] = [lst[start:stop] for lst, start, stop in temp.values]

def v4(df):
    temp = df.copy()
    temp['sliced'] = [lst[start:stop] for lst, start, stop in zip(df.list_values.tolist(), df.start.tolist(), df.stop.tolist())]

def v5(df):
    temp = df.copy()
    temp['sliced'] = temp.apply(lambda x: x.list_values[x.start:x.stop], axis=1)
%timeit -n 10 v1(df)
%timeit -n 10 v2(df)
%timeit -n 10 v3(df)
%timeit -n 10 v4(df)
%timeit v5(df)
Output:
# v1: temp.values.tolist()
235 ms ± 21.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# v2: zip(temp.list_values, temp.start, temp.stop)
249 ms ± 9.17 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# v3: temp.values
578 ms ± 6.98 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# v4: zip(df.list_values.tolist(), df.start.tolist(), df.stop.tolist())
149 ms ± 8.83 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# v5: apply
12.1 s ± 165 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
But yes, the list comprehension method, whichever variation, is significantly faster than using apply.
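As a quick sanity check (on a tiny made-up frame, not the benchmark data above), the fastest comprehension and the apply version produce identical results:

```python
import pandas as pd

# Small illustrative frame; values chosen arbitrarily
df = pd.DataFrame({'list_values': [[5, 7, 6, 8], [1, 3, 5, 7, 2, 4]],
                   'start': [0, 1],
                   'stop': [2, 3]})

# Fastest variation: pull plain Python lists out first, then zip
via_comp = [lst[start:stop]
            for lst, start, stop in zip(df.list_values.tolist(),
                                        df.start.tolist(),
                                        df.stop.tolist())]

# Row-wise apply, for comparison
via_apply = df.apply(lambda x: x.list_values[x.start:x.stop], axis=1).tolist()

print(via_comp)               # [[5, 7], [3, 5]]
print(via_comp == via_apply)  # True
```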
Third update:
I figured out how to sort of vectorize this problem, using groupby and transform. Still not quite as good as the best list comprehension in my testing, but pretty darn good.
def v6(df):
    temp = df.copy()
    temp['sliced'] = temp.groupby(['start', 'stop'])['list_values'].transform(lambda x: x.str[x.name[0]:x.name[1]])
# v6: groupby
256 ms ± 5.53 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
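The trick v6 leans on is that the `.str` accessor's indexing and slicing apply element-wise to lists, not just strings. A minimal sketch:

```python
import pandas as pd

# .str[...] slices each element of the Series, even when the elements are lists
s = pd.Series([[1, 3, 5, 7], [2, 4, 6, 8]])
print(s.str[1:3].tolist())  # [[3, 5], [4, 6]]
```

Inside the transform, `x.name` is the group key tuple `(start, stop)`, so each group is sliced with its own bounds in one vectorized-ish call.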