Transform list to dataframe efficiently

Question

I have a list of images, and I want a DataFrame with all the pixels of every image in one column and the number of the image each pixel belongs to in another column. I am currently doing it with

import numpy as np
from pandas import DataFrame, concat

plotDF = DataFrame()
plotData = [np.array([[1,2,1],[1,1,2],[4,2,1]]), np.array([[1,2,2,1],[1,3,1,3]]), np.array([[1,1,2,3],[4,1,1,1],[1,1,1,4]])]
plotData = [image.flatten() for image in plotData]
for n, pD in enumerate(plotData):
    for pixel in pD:
        # DataFrame.append was removed in pandas 2.0; concat does the same job here
        plotDF = concat([plotDF, DataFrame.from_records([{'n': n, 'pixel': pixel}])])
plotDF = plotDF.reset_index(drop=True)

but this seems really inefficient.
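Much of the cost comes from growing the DataFrame one row at a time: every `append`/`concat` call builds a brand-new frame that copies all existing rows, so the loop does quadratic work in the number of pixels. A minimal sketch of the usual loop-based fix, assuming only pandas and NumPy: collect plain records in a Python list (cheap appends) and construct the DataFrame once at the end. The name `records` is illustrative, and image numbers start at 0 as in the code above.

```python
import numpy as np
import pandas as pd

plotData = [np.array([[1, 2, 1], [1, 1, 2], [4, 2, 1]]),
            np.array([[1, 2, 2, 1], [1, 3, 1, 3]]),
            np.array([[1, 1, 2, 3], [4, 1, 1, 1], [1, 1, 1, 4]])]

# Accumulate one dict per pixel in a plain Python list ...
records = []
for n, image in enumerate(plotData):
    for pixel in image.ravel():
        records.append({'n': n, 'pixel': pixel})

# ... and build the DataFrame a single time at the end.
plotDF = pd.DataFrame(records)
print(plotDF.head())
```

This is still a Python-level loop over every pixel, so it is linear rather than quadratic but not vectorized.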

How can I do this more efficiently, possibly with https://github.com/kieferk/dfply?

Are you after this: http://stackoverflow.com/questions/19112398/getting-list-of-lists-into-pandas-dataframe? — EdChum, Mar 20 '17 at 10:09
@EdChum: `plotData` does not come from files. It is the result of image processing. — Make42, Mar 20 '17 at 10:40

jezrael · Accepted Answer · 2017-03-20T10:26:58.950

I think you can use numpy.repeat to repeat each image number by the length of its pixel array (computed with str.len), and flatten the nested arrays into one sequence with itertools.chain.

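from itertools import chain

import numpy as np
import pandas as pd

# plotData is the list of flattened arrays from the question
s = pd.Series(plotData)
df2 = pd.DataFrame({
        # image number (1-based), repeated once per pixel of that image
        "n": np.repeat(s.index + 1, s.str.len()),
        # all pixels of all images, flattened into one sequence
        "pixel": list(chain.from_iterable(s))})

print(df2)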
    n  pixel
0   1      1
1   1      2
2   1      1
3   1      1
4   1      1
5   1      2
6   1      4
7   1      2
8   1      1
9   2      1
10  2      2
11  2      2
12  2      1
13  2      1
14  2      3
15  2      1
16  2      3
17  3      1
18  3      1
19  3      2
20  3      3
21  3      4
22  3      1
23  3      1
24  3      1
25  3      1
26  3      1
27  3      1
28  3      4
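The same repeat-and-flatten idea can also be written without the intermediate Series. A minimal sketch, assuming `plotData` is the list of flattened arrays from the question: `np.concatenate` flattens the arrays into one, and `np.repeat` uses each array's `size` as the repeat count. Like the answer, it numbers images from 1; drop the `+ 1` in the `np.arange` bounds (use `np.arange(len(plotData))`) to match the question's 0-based numbering. The name `sizes` is illustrative.

```python
import numpy as np
import pandas as pd

# The question's images, flattened to 1-D arrays.
plotData = [np.array([[1, 2, 1], [1, 1, 2], [4, 2, 1]]),
            np.array([[1, 2, 2, 1], [1, 3, 1, 3]]),
            np.array([[1, 1, 2, 3], [4, 1, 1, 1], [1, 1, 1, 4]])]
plotData = [image.flatten() for image in plotData]

# Number of pixels in each image.
sizes = [image.size for image in plotData]

df2 = pd.DataFrame({
    # Repeat each 1-based image number once per pixel of that image.
    "n": np.repeat(np.arange(1, len(plotData) + 1), sizes),
    # Join all pixel arrays into one flat array.
    "pixel": np.concatenate(plotData),
})
print(df2)
```

Both columns are built by single vectorized NumPy calls, so no Python-level loop runs over individual pixels.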
