
I have a list of images, and I want to put all the pixels of every image into one DataFrame column and the number of the image each pixel belongs to into another column. I am currently doing it with

import numpy as np
from pandas import DataFrame

plotDF = DataFrame()
plotData = [np.array([[1,2,1],[1,1,2],[4,2,1]]), np.array([[1,2,2,1],[1,3,1,3]]), np.array([[1,1,2,3],[4,1,1,1],[1,1,1,4]])]
# Flatten each 2-D image into a 1-D array of pixels
plotData = [image.flatten() for image in plotData]
for n, pD in enumerate(plotData):
    for pixel in pD:
        # One append per pixel; note that DataFrame.append was removed in pandas 2.0
        plotDF = plotDF.append(DataFrame.from_records([{'n': n, 'pixel': pixel}]))
plotDF = plotDF.reset_index(drop=True)

but this seems really inefficient, because it appends one row per pixel.

How can I do this more efficiently, possibly with https://github.com/kieferk/dfply?

Guillaume Jacquenot
Make42
  • Are you after this: http://stackoverflow.com/questions/19112398/getting-list-of-lists-into-pandas-dataframe? – EdChum Mar 20 '17 at 10:09
  • @EdChum: `plotData` does not come from files. It is the result of image processing. – Make42 Mar 20 '17 at 10:40

1 Answer

I think you can use numpy.repeat to repeat each image number as many times as the image has pixels (the lengths come from str.len), and flatten the nested arrays into one sequence with chain.from_iterable.

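from itertools import chain

import numpy as np
import pandas as pd

# plotData is the list of flattened arrays from the question
s = pd.Series(plotData)
df2 = pd.DataFrame({
        # s.index + 1 numbers the images from 1; use s.index to start at 0 as in the question
        "n": np.repeat(s.index + 1, s.str.len()),
        "pixel": list(chain.from_iterable(s))})
print(df2)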
    n  pixel
0   1      1
1   1      2
2   1      1
3   1      1
4   1      1
5   1      2
6   1      4
7   1      2
8   1      1
9   2      1
10  2      2
11  2      2
12  2      1
13  2      1
14  2      3
15  2      1
16  2      3
17  3      1
18  3      1
19  3      2
20  3      3
21  3      4
22  3      1
23  3      1
24  3      1
25  3      1
26  3      1
27  3      1
28  3      4
jezrael
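
As a follow-up sketch (not part of the answer above), the same repeat-and-flatten idea can be written with NumPy alone, skipping the intermediate Series and the str.len call. The names sizes and df3 are made up for this illustration, and the image numbers start at 0 to match the loop in the question:

```python
import numpy as np
import pandas as pd

# Same sample data as in the question (still 2-D, not yet flattened).
plotData = [
    np.array([[1, 2, 1], [1, 1, 2], [4, 2, 1]]),
    np.array([[1, 2, 2, 1], [1, 3, 1, 3]]),
    np.array([[1, 1, 2, 3], [4, 1, 1, 1], [1, 1, 1, 4]]),
]

# Pixel count per image; ndarray.size counts every element, whatever the shape.
sizes = [image.size for image in plotData]

df3 = pd.DataFrame({
    # Zero-based image number, repeated once per pixel of that image.
    "n": np.repeat(np.arange(len(plotData)), sizes),
    # ravel() flattens each image in row-major order; concatenate joins them.
    "pixel": np.concatenate([image.ravel() for image in plotData]),
})
```

Because both columns are built as whole arrays, the DataFrame is constructed once rather than grown row by row, which is the part of the original loop that was slow.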