I'm trying to write an algorithm that saves each filename, together with the 3-channel np.array stored in that file, to a CSV (or similar file type), and then reads the CSV back in and reproduces the color image.
The format of my csv should look like this:
```
   Filename  RGB
0  foo.png   np.array  # the shape is 100*100*3
1  bar.png   np.array
2  ...       ...
```
As it stands, I iterate through each file in a directory and append to lists that later get stored in pandas DataFrames:
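```python
import os

import cv2
import pandas

df1 = pandas.DataFrame()
df2 = pandas.DataFrame()
directory = r'C:/my Directory'
fileList = os.listdir(directory)
filenameList = []
RGBList = []
for eachFile in fileList:
    filenameList.append(eachFile)
    # os.path.join adds the separator that directory + eachFile leaves out.
    # Flag 1 loads 3 channels (in BGR order); tobytes() returns the same bytes
    # as the deprecated tostring().
    RGBList.append(cv2.imread(os.path.join(directory, eachFile), 1).tobytes())
df1["Filenames"] = filenameList
df2["RGB"] = RGBList
df1.to_csv('df1.csv')
df2.to_csv('df2.csv')
```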
df1 works as desired, and I think df2 does too: a print statement shows the expected len of 30,000 (100*100*3) for each row. However, when I read the file back in with pandas.read_csv('df2.csv') and print the len of the first row, I get 110541. I intend to use np.fromstring() and np.reshape() to turn the flattened array generated by .tostring() back into an image, but I get the error:
ValueError: string size must be a multiple of element size
...because the number of elements is mismatched.
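A minimal sketch that reproduces the mismatch in memory and shows one way to undo it. It assumes, as the question does, 100*100*3 uint8 images (a random array stands in for a real file), and it relies on pandas writing a bytes value with str(), so the CSV holds the b'...' literal text rather than the raw bytes. ast.literal_eval is only one possible way to turn that text back into bytes; np.frombuffer is used because the binary mode of np.fromstring is deprecated.

```python
import ast
import io

import numpy as np
import pandas as pd

SHAPE = (100, 100, 3)  # assumption from the question: 100x100 pixels, 3 channels

# Random uint8 stand-in for an image loaded with cv2.imread.
img = np.random.default_rng(0).integers(0, 256, size=SHAPE, dtype=np.uint8)
raw = img.tobytes()  # 100*100*3 = 30,000 bytes

# Round-trip the bytes through an in-memory CSV, as df2.to_csv/read_csv would.
buf = io.StringIO()
pd.DataFrame({"RGB": [raw]}).to_csv(buf)
buf.seek(0)
text = pd.read_csv(buf, index_col=0)["RGB"].iloc[0]

# read_csv hands back a str holding the b'...' literal, not the bytes themselves.
print(type(text), len(raw), len(text))

# Turn the literal back into bytes, then decode with an explicit dtype:
# np.frombuffer (like np.fromstring) defaults to float64, i.e. 8-byte elements.
recovered = np.frombuffer(ast.literal_eval(text), dtype=np.uint8).reshape(SHAPE)
print(np.array_equal(recovered, img))
```

In that literal text, printable ASCII bytes stay a single character, a few (such as `\n` or `\\`) take two, and every other byte takes four (`\xNN`), so a 30,000-byte image becomes roughly 30,000 to 120,000 characters; 110541 falls within that range.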
My questions are:

- Why is the len so much larger when I read the CSV back in?
- Is there a more efficient way to write 3-channel color image pixel data to a CSV so that it can easily be read back in?
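One text-safe alternative, sketched under the question's assumption that every image is 100*100*3 uint8: base64-encode the raw bytes before writing the CSV and decode them after reading it back. Base64 costs exactly 4 characters per 3 bytes (40,000 characters for a 30,000-byte image), and its alphabet contains no commas or quotes. The helper names `encode_pixels`/`decode_pixels` and the random images are hypothetical, for illustration only; filenames and pixel data share one DataFrame, matching the layout described at the top.

```python
import base64
import io

import numpy as np
import pandas as pd

SHAPE = (100, 100, 3)  # assumption: all images are 100x100 with 3 uint8 channels


def encode_pixels(img: np.ndarray) -> str:
    """Hypothetical helper: raw pixel bytes -> ASCII-safe base64 text."""
    return base64.b64encode(img.tobytes()).decode("ascii")


def decode_pixels(text: str, shape=SHAPE) -> np.ndarray:
    """Hypothetical helper: base64 text -> uint8 array of the given shape."""
    return np.frombuffer(base64.b64decode(text), dtype=np.uint8).reshape(shape)


# Random stand-ins for the images that would normally come from cv2.imread.
rng = np.random.default_rng(1)
images = {name: rng.integers(0, 256, size=SHAPE, dtype=np.uint8)
          for name in ["foo.png", "bar.png"]}

df = pd.DataFrame({"Filename": list(images),
                   "RGB": [encode_pixels(a) for a in images.values()]})

buf = io.StringIO()  # stands in for a real file such as 'images.csv'
df.to_csv(buf, index=False)
buf.seek(0)

back = pd.read_csv(buf)
restored = {name: decode_pixels(s) for name, s in zip(back["Filename"], back["RGB"])}

print(len(back["RGB"].iloc[0]))  # 40000 characters per 30,000-byte image
print(all(np.array_equal(restored[n], images[n]) for n in images))
```

Because cv2.imread returns BGR data, the decoded array is also in BGR order, which is what cv2.imwrite expects when writing the image back out.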