Extract selected columns & row from numpy 3D matrix before saving

Question

As I am new to the whole "python" thing, I face the following problems:

1. I have a .npy data file shaped (77,77,20). I want to extract 25 rows & 25 columns from this file and get a new matrix shaped (25,25,20). The rows and columns are neither the first 25 nor the last 25. I created two variables, "col_idx" & "row_idx", containing the numbers of the 25 rows & columns, but I can't extract them from my data. Any suggestions on how to proceed?
2. I want to save this (25,25,20) matrix to CSV using numpy.savetxt so that it is readable. I may have found something tricky for this part on Stack Overflow, but as I have just begun with Python I don't really understand it.

I would gladly take any advice on how to code this. Thanks !

jakevdp · Accepted Answer · 2017-05-06T14:38:30.577


For number 1, you can pass arrays of indices:

import numpy as np

# generate data, and arrays of row and column indices
data = np.random.rand(77, 77, 20)
col_idx = np.random.randint(0, 77, 25)
row_idx = np.random.randint(0, 77, 25)

# extract the subset; the row indices become a (25, 1) column so that
# subset[i, j] == data[row_idx[i], col_idx[j]]
subset = data[row_idx[:, np.newaxis], col_idx]
print(subset.shape)
# (25, 25, 20)

The only tricky part here is np.newaxis. It relies on NumPy's broadcasting, which is a set of rules for combining arrays of different shapes. Here a shape (25,) array of indices combines with a shape (25, 1) array of indices to produce a (25, 25) grid of index pairs, which extracts a (25, 25, 20) subset of the original array.
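To make the broadcasting step concrete, here is a minimal sketch. The index values are made up for illustration, and it deliberately picks 3 rows and 4 columns so the orientation of the result is visible. It also shows np.ix_, a NumPy helper that builds the same index grid for you.

```python
import numpy as np

# Deterministic stand-in for the (77, 77, 20) data file
data = np.arange(77 * 77 * 20).reshape(77, 77, 20)

# Hypothetical row/column numbers, chosen only for this example
row_idx = np.array([3, 10, 42])
col_idx = np.array([0, 5, 76, 20])

# Manual broadcasting: a (3, 1) column of row indices against a (4,) array
# of column indices gives a (3, 4) grid of (row, col) pairs
subset = data[row_idx[:, np.newaxis], col_idx]

# np.ix_ constructs the same open grid from the two 1-D index arrays
subset_ix = data[np.ix_(row_idx, col_idx)]

print(subset.shape)     # (3, 4, 20)
print(subset_ix.shape)  # (3, 4, 20)
```

With either form, subset[i, j] is data[row_idx[i], col_idx[j]], and the third axis is carried along untouched.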

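As for saving to CSV, I find the tools provided by the pandas library to be the most useful for this kind of thing. For 3D data, you can convert to a dataframe via a 3D panel, and save to CSV directly. Note that pd.Panel was deprecated in pandas 0.20 and removed in later releases, so this snippet needs an older pandas:

import pandas as pd
# pd.Panel is only available in older pandas releases
panel = pd.Panel(subset)
frame = panel.to_frame()
frame.to_csv('output.csv')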

This results in a CSV with the row/column index as the first and second entry in each row. If you want your csv output in a different form, you can use the standard pandas index transformations (stack, unstack, reindex, etc.) before saving.
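On pandas versions where pd.Panel is not available, a comparable CSV can be sketched by flattening the first two axes and labelling each line with a MultiIndex. This is only one possible layout, not necessarily the exact output the panel route produced. The level names "row" and "col" are my own choice, and the random subset stands in for the real extracted array.

```python
import numpy as np
import pandas as pd

# Stand-in for the extracted (25, 25, 20) subset
subset = np.random.rand(25, 25, 20)
n_rows, n_cols, n_depth = subset.shape

# One CSV line per (row, col) pair; the 20 values along the third
# axis become the data columns of that line
index = pd.MultiIndex.from_product(
    [range(n_rows), range(n_cols)], names=["row", "col"]
)
frame = pd.DataFrame(subset.reshape(n_rows * n_cols, n_depth), index=index)

frame.to_csv("output.csv")
print(frame.shape)  # (625, 20)
```

Each line of output.csv then starts with the row and column position, followed by the 20 values. The file can be loaded again with pd.read_csv("output.csv", index_col=["row", "col"]).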

edited May 06 '17 at 14:38

answered May 06 '17 at 13:58

jakevdp


Thanks for your help ! I don't quite get the "np.random.randint" part ? When do I specify the place of the columns that I want to extract ? I don't want random columns from the file ? – Jrdnalvs May 06 '17 at 14:21
You didn't provide your data, so I generated random data & row/col indices similar to what you mentioned in the question. Use your list of row/col indices instead. – jakevdp May 06 '17 at 14:37
It returns the error : "TypeError: tuple indices must be integers, not tuple" – Jrdnalvs May 06 '17 at 15:09
It sounds like your row/column indices are tuples rather than arrays. First do ``row_idx = np.array(row_idx); col_idx = np.array(col_idx)`` – jakevdp May 06 '17 at 15:25
Okay so IT WORKS. Thank you so much for your help ! – Jrdnalvs May 06 '17 at 15:37
