
I am working on a CSV file (~240 MB) in which one column contains images, each stored as a vectorized string of pixel values.

I am trying to convert the Image string to a list of integers, reshape it into a matrix, flip it, reshape it back into a list, and finally convert it back into a long string. But the result is not what I expected. Below is my code:

```python
import pandas as pd
import numpy as np
df = pd.read_csv('training.csv')
img = df['Image'][0] # take the first row as example
img_int = np.fromstring(img, sep=' ')  # img_int.shape --> (9216,), good.
img_matrix = img_int.reshape(96,96)
img_matrix_flipped = np.fliplr(img_matrix) # img_matrix_flipped.shape --> (96,96), good
img_matrix_flipped_vector = img_matrix_flipped.reshape(1, 9216) # img_matrix_flipped_vector.shape --> (1, 9216), good
img_matrix_flipped_vector_str = str(img_matrix_flipped_vector) # len(img_matrix_flipped_vector_str) --> 44, NOT GOOD!!!
```

I am confused about why the len(img_matrix_flipped_vector_str) is 44. Shouldn't the string contain all the 9216 integers in it? Please kindly help!

user3768495
  • As far as I can tell, there is nothing wrong with your code. It may be a better idea to use the `tostring()` method on your array so that you don't get all the array brackets and line breaks. – Dschoni Dec 02 '16 at 14:35

2 Answers


Based on @Dschoni's answer, I realized that I shouldn't have used str(). Then I found another thread that helped me find the solution:

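```python
# Flatten to a 1-D array of shape (9216,) instead of (1, 9216)
img_matrix_flipped_vector = img_matrix_flipped.reshape(9216)
# np.fromstring returned floats; cast to int so values print as "5", not "5.0"
pixel_list = img_matrix_flipped_vector.astype(int).tolist()  # avoid shadowing built-in `list`
str_I_want = ' '.join([str(i) for i in pixel_list])
```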
user3768495
  • Just adding to that: instead of iterating over a list, you can iterate directly over the flattened array to save memory. Also, the string on which you call the `join` method is used as the separator. – Dschoni Dec 06 '16 at 09:41
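A minimal sketch of the whole round trip, assuming each `Image` cell holds 9216 space-separated integer pixel values for a 96×96 image, as in the question. The helper name `flip_image_string` and its `side`/`sep` parameters are hypothetical. It parses with `str.split` instead of `np.fromstring` so the values stay integers, iterates directly over `ravel()` as the comment suggests, and uses whatever string `join` is called on (`sep`) as the separator.

```python
import numpy as np
import pandas as pd


def flip_image_string(img_str, side=96, sep=' '):
    """Hypothetical helper: parse, mirror left-right, and re-serialize one image string."""
    # Parse the space-separated pixel values as Python ints (no '.0' in the output)
    pixels = np.array([int(v) for v in img_str.split()])
    # Reshape to a square matrix and flip it horizontally
    flipped = np.fliplr(pixels.reshape(side, side))
    # Iterate directly over the flattened view; `sep` becomes the separator
    return sep.join(str(v) for v in flipped.ravel())


# Tiny 2x2 example instead of the real 96x96 data
print(flip_image_string('1 2 3 4', side=2))  # 2 1 4 3

# Applying it to a whole column (a small stand-in DataFrame, not training.csv)
df = pd.DataFrame({'Image': ['1 2 3 4', '5 6 7 8']})
df['Image_flipped'] = df['Image'].apply(flip_image_string, side=2)
print(df['Image_flipped'].tolist())  # ['2 1 4 3', '6 5 8 7']
```

On the real data, `df['Image'].apply(flip_image_string)` with the default `side=96` would process every row the same way.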

I just found out: calling str() on an array returns its printable string representation. If you print this string, you will see numbers, shortened with '...' in the middle for large arrays (by default NumPy summarizes arrays with more than 1000 elements). To get the array's raw data as a byte string, use the tostring() or tobytes() method on the array. You might also want to reshape into a 1-D array instead of a 2-D array with one axis of size 1 (array.reshape(9216) instead of array.reshape(1, 9216)), depending on what you are aiming for.
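A small sketch illustrating the points above, using a synthetic array in place of the CSV data (the variable names are made up for illustration). `str()` follows NumPy's print options and summarizes arrays above the `threshold` (1000 elements by default), while `tobytes()` returns the raw binary buffer rather than readable text. Passing a larger `threshold` to `np.array2string` is one way to keep every value in the printed form.

```python
import sys
import numpy as np

# Synthetic stand-in for one flattened 96x96 image (float64, as np.fromstring returns)
vec = np.arange(9216, dtype=float).reshape(1, 9216)

# str() summarizes arrays larger than the print threshold with '...',
# which is why the resulting string is so short
summary = str(vec)
print('...' in summary)     # True

# tobytes() returns the raw memory buffer: 9216 float64 values * 8 bytes each
raw = vec.tobytes()
print(type(raw), len(raw))  # <class 'bytes'> 73728

# Raise the threshold for this call only to get the full bracketed text
full = np.array2string(vec, threshold=sys.maxsize)
print('...' in full)        # False
```

Note that even the full `array2string` output keeps the brackets and NumPy's float formatting, so a `join` over the values is still the simpler route to a plain space-separated string.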

Dschoni
  • Hi @Dschoni, you are right, `str()` is the culprit! But `tostring()` and `tobytes()` don't give me what I want either... – user3768495 Dec 02 '16 at 17:48
  • I got a bunch of `\x000\x0000` when using `tostring()` or `tobytes()`. – user3768495 Dec 02 '16 at 17:55
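The `\x00` bytes in the last comment are the array's binary memory layout, not text. A minimal sketch, assuming a small `int64` array in place of the real image: those bytes are only meaningful when decoded again with the same dtype via `np.frombuffer`.

```python
import numpy as np

# Small stand-in for a flattened image; int64 makes each value 8 bytes wide
pixels = np.array([0, 255, 128], dtype=np.int64)

# tobytes() dumps the raw buffer; small integers are mostly zero bytes
raw = pixels.tobytes()
print(len(raw))   # 24 (3 values * 8 bytes)
print(raw[:8])    # b'\x00\x00\x00\x00\x00\x00\x00\x00' (the value 0)

# Decoding with the same dtype recovers the original values
restored = np.frombuffer(raw, dtype=np.int64)
print(np.array_equal(restored, pixels))  # True
```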