The solutions using melt
are slower than OP's original method, which they shared in the answer here, especially after the speedup from my comment on that answer.
I created a larger dataframe to test on:
df = pd.DataFrame({'name_array': np.random.rand(1000, 3).tolist()})
And timing the two solutions using melt
on this dataframe yield:
In [16]: %timeit pd.melt(df.name_array.apply(pd.Series).reset_index(), id_vars=['index'],value_name='name_array').drop('variable', axis=1).sort_values('index')
173 ms ± 5.68 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [17]: %timeit df['name_array'].apply(lambda x: pd.Series([i for i in x])).melt().drop('variable', axis=1)['value']
175 ms ± 4.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
The OP's method with the speedup I suggested in the comments:
In [18]: %timeit pd.Series(np.concatenate(df['name_array']))
18 ms ± 887 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
And finally, the fastest solution as provided here but modified to provide a series instead of dataframe output:
In [14]: from itertools import chain
In [15]: %timeit pd.Series(list(chain.from_iterable(df['name_array'])))
402 µs ± 4.68 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
This last method is faster than melt()
by 3 orders of magnitude and faster than np.concatenate()
by 2 orders of magnitude.