I'm trying to adjust the values of images stored in a pandas dataframe. Each row of the dataframe (images) holds an image of shape (7,7,3): 7x7 pixels and 3 colours. When I try to adjust the top-left pixel of the first image as shown below, all other images (rows) are affected as well:

print(images.loc[0,'image'][0][0], images.loc[1,'image'][0][0])
images.loc[0,'image'][0][0]=[1,2,3]
print(images.loc[0,'image'][0][0], images.loc[1,'image'][0][0])

[0,0,0] [0,0,0]    
[1,2,3] [1,2,3]

This only happens when I adjust a single pixel. If I edit the image in its entirety, the other images/rows are not affected.

images[0,'image']=[image]

works properly.

Added MCVE:

import numpy as np
import pandas as pd

images = pd.DataFrame(columns=['image'])
image = np.zeros([2, 2, 2])
images.loc[0, 'image'] = image
images = pd.concat([images] * 2)
images = images.reset_index(drop=True)
print(images.loc[0, 'image'][0][0], '\n')
images.loc[0, 'image'][0][0] = [1, 1]
print(images.loc[0, 'image'][0][0], images.loc[1, 'image'][0][0])
AMC

1 Answer

The problem is in the lines

image=np.zeros([2,2,2])

and

images=pd.concat([images]*2)

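You create a single numpy object, and pd.concat([images]*2) does not copy it. The object is therefore referenced twice in the final dataframe, so an in-place change made through one row is visible through the other. To illustrate, if you explicitly make a copy of the object, the problem disappears: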

import copy
import numpy as np
import pandas as pd

images = pd.DataFrame(columns=['image'])
image = np.zeros([2, 2, 2])
images.loc[0, 'image'] = image
# explicitly duplicate the object to avoid references to the same object
images = pd.concat([copy.deepcopy(images), copy.deepcopy(images)])
images = images.reset_index(drop=True)
print(images.loc[0, 'image'][0][0], '\n')
images.loc[0, 'image'][0][0] = [1, 1]
print(images.loc[0, 'image'][0][0], images.loc[1, 'image'][0][0])
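If the deepcopy variant still prints identical values on your setup, one possible explanation is that pandas documents DataFrame.copy(deep=True) as not recursively copying Python objects stored in the frame. A minimal sketch that sidesteps this by copying each array explicitly before it goes into the dataframe is shown here; n_rows and the variable names shared and arrays are made up for illustration, and the frame is built from a pd.Series rather than with pd.concat:

```python
import numpy as np
import pandas as pd

image = np.zeros([2, 2, 2])

# List multiplication repeats the reference, it does not copy the array
shared = [image] * 2
print(shared[0] is shared[1])  # both entries are the same object

# Assumption for illustration: n_rows is a hypothetical row count
n_rows = 3
# image.copy() gives every row its own independent array
arrays = [image.copy() for _ in range(n_rows)]
images = pd.Series(arrays, index=range(n_rows)).to_frame('image')

# Edit one pixel of the first image in place
images.loc[0, 'image'][0][0] = [1, 1]

# Row 0 is changed, row 1 should still hold zeros
print(images.loc[0, 'image'][0][0], images.loc[1, 'image'][0][0])
```

The `is` check is a quick way to find out whether two cells hold the same array object before editing either of them.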

Edit: to address your comment on how to create many copies, you could try:

import numpy as np
import pandas as pd

# create a list containing independent instances of numpy arrays
images = [np.zeros([2, 2, 2]) for _ in range(10000)]
images = pd.Series(images, index=range(10000))
images = images.to_frame('image')
images  # should now be a dataframe containing independent numpy arrays in its 'image' column
Arco Bast
  • This still doesn't solve it for me, I still get [1,1] [1,1] as output – stacking upon stackings Feb 09 '20 at 19:56
  • Interesting, I do not. Maybe deepcopy works differently for different python versions. Have a look at the last solution where the dataframe is constructed out of numpy arrays that are independently generated in the first place. – Arco Bast Feb 09 '20 at 20:11
  • The edit method does work for me, thank you very much. – stacking upon stackings Feb 09 '20 at 20:28
  • Happy to hear that. Your MCVE made it possible to help. Generally, you might want to reconsider your data structure. Having numpy arrays as objects in a pandas dataframe is unusual and can cause problems, as pandas often tries to align the array with the dimensions of the dataframe itself. Why not use a 4-dimensional numpy array? – Arco Bast Feb 09 '20 at 20:35
  • I need to transform some data into images so that I can use a convolutional network for classification. – stacking upon stackings Feb 09 '20 at 21:15
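
The 4-dimensional array suggested in the comments could look roughly like the sketch below. The (7, 7, 3) image shape comes from the question; n_images is a hypothetical count, and whether this layout suits the downstream network is an assumption:

```python
import numpy as np

# Assumption for illustration: n_images is a hypothetical number of images
n_images = 4

# One contiguous array: axis 0 indexes the image, then row, column, colour
images = np.zeros((n_images, 7, 7, 3))

# Set the top-left pixel of the first image only
images[0, 0, 0] = [1, 2, 3]

# The first image is changed, the second is untouched
print(images[0, 0, 0], images[1, 0, 0])
```

Because every image occupies its own slice of a single array, no two images can end up referencing the same underlying object.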