I have a list of lists that I want to turn into a DataFrame, keeping each inner list's index in the outer list as well.
x = [["a", "b", "c"], ["A", "B"], ["AA", "BB", "CC"]]
I can do this with a for loop like this:
import pandas as pd

result = []
for i, row in enumerate(x):
    # one small frame per inner list, tagged with that list's position in x
    d = pd.DataFrame({"id": [i] * len(row), "attr": row})
    result.append(d)
result = pd.concat(result, ignore_index=True)
Or the equivalent generator expression:
pd.concat((pd.DataFrame({"id": [i] * len(row), "attr": row})
           for i, row in enumerate(x)), ignore_index=True)
Both work fine and produce a DataFrame like this:
   id attr
0   0    a
1   0    b
2   0    c
3   1    A
4   1    B
5   2   AA
6   2   BB
7   2   CC
But it feels like there should be a more 'pandas-esque' way of doing this than the list-loop-append pattern or its generator equivalent.
Can I create the DataFrame above with pandas calls alone, i.e. without a for loop or a Python comprehension?
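One loop-free candidate, sketched here under the assumption that pandas 0.25 or newer is available, is `Series.explode`: wrapping the outer list in a Series gives each inner list a default index equal to its position, and `explode()` emits one row per inner element while repeating that index.

```python
import pandas as pd

x = [["a", "b", "c"], ["A", "B"], ["AA", "BB", "CC"]]

# The Series' default RangeIndex is each inner list's position in x.
s = pd.Series(x)

# explode() (pandas >= 0.25) yields one row per inner element and repeats
# the outer index; naming the index "id" and resetting it turns it into a column.
df = (
    s.explode()
    .rename("attr")
    .rename_axis("id")
    .reset_index()
)
print(df)
```

Note that `explode()` turns an empty inner list into a single row holding NaN rather than dropping it, so a `dropna(subset=["attr"])` may be needed if empty lists can occur.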
(Preferably also a faster solution: on the 'genres' column of the MovieLens data set at https://grouplens.org/datasets/movielens/, flattening the list of genres per movie this way takes more than 4 seconds, even though there are only 20k entries in total.)
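A plausible (unverified) explanation for the slowness is the fixed overhead of constructing one small DataFrame per movie and then concatenating thousands of them. The sketch below avoids that by building the two columns as flat arrays and calling the DataFrame constructor once; it assumes the genres are already in memory as a list of lists shaped like `x`, and uses `numpy.repeat` to expand each position by its list length.

```python
import itertools

import numpy as np
import pandas as pd

x = [["a", "b", "c"], ["A", "B"], ["AA", "BB", "CC"]]

# Length of each inner list, computed in one pass over the outer list.
lengths = np.fromiter(map(len, x), dtype=np.int64, count=len(x))

# Each outer position repeated once per element of its inner list.
ids = np.repeat(np.arange(len(x)), lengths)

# All inner lists flattened, in order, into a single list.
attrs = list(itertools.chain.from_iterable(x))

# One constructor call instead of one DataFrame per inner list.
df = pd.DataFrame({"id": ids, "attr": attrs})
print(df)
```

Unlike `explode()`, this version simply contributes no rows for an empty inner list, because its length is 0.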