I have a list A of the form:

A = ['P', 'Q', 'R', 'S', 'T', 'U']

and an array B of the form:

B = [[ 1  2  3  4  5  6]
     [ 7  8  9 10 11 12]
     [13 14 15 16 17 18]
     [19 20 21 22 23 24]]

Now I would like to create a structured array C of the form:

C = [[ P  Q  R  S  T  U]
     [ 1  2  3  4  5  6]
     [ 7  8  9 10 11 12]
     [13 14 15 16 17 18]
     [19 20 21 22 23 24]]

so that I can extract columns by their names P, Q, R, etc. I tried the following code, but it does not create a structured array and raises the error shown after it.

Code

import numpy as np
A = (['P', 'Q', 'R', 'S', 'T', 'U'])
B = np.array([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18], [19, 20, 21, 22, 23, 24]])
C = np.vstack((A, B))
print (C)
D = C['P']

Error

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

How can I create a structured array in Python in this case?

Update

Both A and B are variables whose shapes change at runtime, but the list and the array will always have the same number of columns.

nxcr

3 Answers

2

If you want to do it in pure numpy you can do

A = np.array(['P', 'Q', 'R', 'S', 'T', 'U'])
B = np.array([[ 1,  2,  3,  4,  5,  6],
              [ 7,  8,  9, 10, 11, 12],
              [13, 14, 15, 16, 17, 18],
              [19, 20, 21, 22, 23, 24]])

# define the structured array with the names from A
C = np.zeros(B.shape[0],dtype={'names':A,'formats':['f8','f8','f8','f8','f8','f8']})

# copy the data from B into C
for i,n in enumerate(A):
    C[n] = B[:,i]

C['Q']
array([  2.,   8.,  14.,  20.])

Edit: you can generate the format list automatically by using this instead:

C = np.zeros(B.shape[0],dtype={'names':A,'formats':['f8' for x in range(A.shape[0])]})

Furthermore, the names are not stored in C as data; they are part of its dtype. To get the names from C you can use

C.dtype.names
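As a possible alternative to the explicit loop, here is a minimal sketch that builds the same kind of array with `np.rec.fromarrays`. It assumes the names are passed as a plain Python list (use `list(A)` if A is a NumPy array) and it keeps the integer dtype of B instead of converting to 'f8'. Because the field list is derived from the columns of B, it should adapt when the number of columns changes at runtime.

```python
import numpy as np

A = ['P', 'Q', 'R', 'S', 'T', 'U']
B = np.arange(1, 25).reshape(4, 6)

# Iterating over B.T yields the columns of B, so each column
# becomes one named field of the resulting record array.
C = np.rec.fromarrays(B.T, names=A)

print(C.dtype.names)  # ('P', 'Q', 'R', 'S', 'T', 'U')
print(C['Q'])         # [ 2  8 14 20]
```

The result is still one-dimensional (one record per row of B), with the names held in the dtype rather than in the data.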
plonser
  • Do I have to add formats for each column? Is there a workaround for a large number of columns? – nxcr Apr 04 '15 at 14:14
  • @nxcr : I think you need the `formats` part but you can generate it automatically -> see my edit – plonser Apr 04 '15 at 14:28
  • The headers are missing in C. How to keep the headers (i.e. array A) in array C? – nxcr Apr 04 '15 at 15:02
  • @nxcr : Because the names are not part of the data. For a list of the names type `C.dtype.names`. You can also see them in `dtype` when you just type `C`. When you want to have something like an Excel 'view' you have to use `pandas` instead – plonser Apr 04 '15 at 15:06
  • But when I generate arrays using np.genfromtxt() I get such headers. – nxcr Apr 04 '15 at 15:08
  • If the first line in a structured array consists of `string` elements, I would expect all the data to be stored as `strings` too. Are you sure the numbers below the names appear as `floats`? However, as far as I know, `names` in structured `numpy.arrays` do not appear in the data but in `dtype`. If you don't like that, you have to use `pandas` as shown in the other answer. – plonser Apr 04 '15 at 15:15
  • Is it possible to make C a 2D array? – nxcr Apr 07 '15 at 15:18
1

This is what the pandas library is for:

>>> A = ['P', 'Q', 'R', 'S', 'T', 'U']
>>> B = np.arange(1, 25).reshape(4, 6)
>>> B
array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12],
       [13, 14, 15, 16, 17, 18],
       [19, 20, 21, 22, 23, 24]])
>>> import pandas as pd
>>> pd.DataFrame(B, columns=A)
    P   Q   R   S   T   U
0   1   2   3   4   5   6
1   7   8   9  10  11  12
2  13  14  15  16  17  18
3  19  20  21  22  23  24
>>> df = pd.DataFrame(B, columns=A)
>>> df['P']
0     1
1     7
2    13
3    19
Name: P, dtype: int64
>>> df['T']
0     5
1    11
2    17
3    23
Name: T, dtype: int64
>>>
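If a NumPy structured array is still needed after going through pandas, one option is `DataFrame.to_records`. The sketch below assumes the same A and B as above and that the default integer row index should not become a field.

```python
import numpy as np
import pandas as pd

A = ['P', 'Q', 'R', 'S', 'T', 'U']
B = np.arange(1, 25).reshape(4, 6)
df = pd.DataFrame(B, columns=A)

# index=False leaves the row labels out of the record array,
# so only the six named columns become fields.
C = df.to_records(index=False)

print(C.dtype.names)  # ('P', 'Q', 'R', 'S', 'T', 'U')
print(C['T'])         # [ 5 11 17 23]
```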
YXD
0

Your error occurs on:

D = C['P']

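Here is a simple approach that uses a regular Python list lookup on the title row. Note that the slice 0:len(C) covers every row of C, so the result includes the header entry itself.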

import numpy as np
A = (['P', 'Q', 'R', 'S', 'T', 'U'])
B = np.array([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], 
    [13, 14, 15, 16, 17, 18], [19, 20, 21, 22, 23, 24]])
C = np.vstack((A, B))
print (C)
D = C[0:len(C), list(C[0]).index('P')]
print (D)
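A variant of the same idea, sketched under the assumption that the headers can stay in a separate list instead of being stacked onto the data: a dictionary maps each name to its column position, so B keeps its numeric dtype. The name `col_index` is hypothetical and only used for illustration.

```python
import numpy as np

A = ['P', 'Q', 'R', 'S', 'T', 'U']
B = np.array([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12],
              [13, 14, 15, 16, 17, 18], [19, 20, 21, 22, 23, 24]])

# Map each header to its column position (hypothetical helper name).
col_index = {name: i for i, name in enumerate(A)}

# Select the whole column by name; the values stay integers.
D = B[:, col_index['P']]
print(D)  # [ 1  7 13 19]
```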
xxyzzy