I am working with the sklearn digits dataset.
Each datapoint is an 8x8 image of a digit, stored as one flat row of 64 pixel values:
[[0,1,2,3, .... 62,63], # This row is one image
[0,1,2,3, .... 62,63], # 0-7 make up the first row of the image
... 1794 more times
[0,1,2,3, .... 62,63]]
I set up my dataframe as follows:
import pandas as pd
from sklearn import datasets
digits = datasets.load_digits()
df = pd.DataFrame(data = digits.data)
df['target'] = digits.target
I am trying to iterate over each image and calculate averages over subsets of rows and columns.
To select all 64 pixels of every image I just do the following: df[[i for i in range(64)]]
Or if I want a random subset of 8 pixels I do the following (after import random): df[random.sample(range(0, 64), 8)]
Those I can wrap my head around. What I am struggling with is iterating over subsets of each image. How would I iterate over every row of each image individually?
I can select the first row of the first image like this: df.iloc[:1,0:8]
While this will select the first column of the first image (every eighth pixel, starting at pixel 0): df.iloc[:1, 0:64:8]
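To make the row and column layout concrete, here is a minimal sketch that reshapes one image's 64 pixels into an 8x8 grid and then loops over the rows of every image. It assumes the pixels are stored row-major (pixels 0-7 are the first image row), and it uses a small stand-in dataframe instead of the real digits data so it runs on its own; the names image, first_row, first_col and row_sums are only for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for pd.DataFrame(digits.data): 3 fake "images" whose pixel
# values are simply 0..63, so each value equals its own pixel index.
df = pd.DataFrame(np.tile(np.arange(64.0), (3, 1)))

# View image 0 as an 8x8 grid (assumes row-major pixel order).
image = df.iloc[0, :64].to_numpy().reshape(8, 8)

first_row = image[0, :]  # pixels 0, 1, ..., 7
first_col = image[:, 0]  # pixels 0, 8, ..., 56

# Iterate over every row of every image and collect one sum per image row.
row_sums = []
for pixels in df.iloc[:, :64].to_numpy():
    grid = pixels.reshape(8, 8)
    row_sums.append([row.sum() for row in grid])
```

Slicing with :64 keeps the target column out of the pixel data if it has already been added to the dataframe.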
Ideally, I would like to output this structure:
[[image_1_col_1_avg..... col8_avg, row1_avg ..... row8_avg],
[image_2_col_1_avg..... col8_avg, row1_avg ..... row8_avg],
....
[image_1797_col_1_avg..... col8_avg, row1_avg ..... row8_avg]]
Where I shrink the 8*8 grid from 0-63 into the averages for each row and column. So instead of having 64 data points for each image, I would only have 16.
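A possible way to get that 16-value structure without an explicit loop is to reshape all images at once and average along each axis. This is only a sketch under the same row-major assumption; row_col_averages is a hypothetical helper name, and the dataframe below is a stand-in for the real digits dataframe.

```python
import numpy as np
import pandas as pd

def row_col_averages(df, side=8):
    """Return an (n_images, 2*side) array: column averages, then row averages."""
    # Take only the pixel columns, ignoring a trailing 'target' column.
    pixels = df.iloc[:, :side * side].to_numpy(dtype=float)
    grids = pixels.reshape(-1, side, side)  # axes: (image, row, column)
    col_avg = grids.mean(axis=1)  # average over rows: one value per column
    row_avg = grids.mean(axis=2)  # average over columns: one value per row
    return np.hstack([col_avg, row_avg])

# Stand-in data: 3 fake images whose pixel values equal their pixel index.
df = pd.DataFrame(np.tile(np.arange(64.0), (3, 1)))
df["target"] = [0, 1, 2]

features = row_col_averages(df)
print(features.shape)  # (3, 16)
```

With the real dataframe the result would have one row per image and 16 columns, in the order col1_avg ... col8_avg, row1_avg ... row8_avg.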
I have searched for a while but I can't find much documentation or many guides on how to iterate through subsets of a dataframe, and what I have found I can't really understand. Any insight, guidance, or explanation of how to iterate over subsets of a dataframe would be much appreciated.