Dynamically increase number of fields in numpy array

Question

With the help of "Complex matlab-like data structure in python (numpy/scipy)" I came up with:

import numpy as np

s=(5,3)
a=np.zeros(s, dtype=[('Int1', int),
                     ('Int2', int),
                     ('Str1', '|S5')])

a[0,0]=(1,2,'abcde')
a[0,1]=((5,2,'fghij'),(7,9,'klmno'))

The problem is that in some elements of my array a, such as a[0,1], I want to store one or more extra records, as in my code example. I don't know in advance how many extra records I will have to write into which part of my matrix, but each one will always be a tuple matching the dtype (int, int, string).

Of course, I get an error when I try to write into a[0,1] this way.

I would like to keep my matrix a 2-dimensional, but I would like to write several instances of my (int, int, str) record into one element, similar to what I tried with a[0,1].

I hope I have explained my problem clearly.

Do you need to have this extra info "nested" on the same row of the matrix? Couldn't you do something like `a[0,0]=(1,2,'abcde')`; `a[0,1]=(5,2,'fghij')`; `a[0,2]=(7,9,'klmno')` and keep track of which rows are linked (using a *dict*, or maybe a 4th element in your tuple as an identifier)? Alternatively, you could probably pad with 0/null values, like this: `a[1]=[(5,2,'fghij'),(7,9,'klmno'), (0, 0, None)]` — mgc, Jan 06 '16 at 19:17
Hi, thank you for the question and suggestion! I could also use an identifier or another variable as a reference. If I used another cross-reference, I wouldn't need a 2-dimensional matrix anymore. I will have to write the fields of a[x,y] into an Excel sheet, which is why I would prefer a 2-dimensional solution. — Chris0304, Jan 06 '16 at 19:43
*"I will have to write the fields of a[x,y] into an Excel sheet, which is why I would prefer a 2-dimensional solution"* - could you explain what you mean by this? I don't see how a 2D array would help in this case. Your data is still fundamentally "3D" in the sense that each individual "element" in the 2D array contains multiple values. Since a single cell can't contain two integers and a string, there's still no obvious way to represent `a` in a single 2D spreadsheet. — ali_m, Jan 06 '16 at 19:59
Hi ali_m, I agree with you, there is no obvious way to represent a in a single 2D spreadsheet, since it has some 3D characteristics... But I have to fit it into an Excel sheet in this non-intuitive way: all the entries of, for example, a[0,1] will be in one Excel cell, each written on a new line inside that cell... That's how they want it! — Chris0304, Jan 07 '16 at 09:21
Have you looked at `pandas.DataFrame`? I think it is more suited to this sort of structure, especially if you want to convert it to an excel spreadsheet — TheBlackCat, Jan 07 '16 at 12:20
@TheBlackCat, I haven't looked at it yet, but I will. Thanks for the tip. — Chris0304, Jan 07 '16 at 14:23

hpaulj · Answer 1 · 2016-01-06T20:57:23.107

A numpy array is probably the wrong data structure for this kind of flexibility. Once created, your array a takes up a fixed amount of memory. It has 15 (5*3) records, and each record contains two ints and one 5-character string. You can modify values, but you can't add new records, or change one record into a composite of two records.
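A minimal sketch of that fixed layout, using the dtype from the question. The variable names (`record_dtype`, `n_records` and so on) are only illustrative, and the exact byte counts depend on the platform's default int size, so none are asserted here.

```python
import numpy as np

# Same three fields as in the question.
record_dtype = np.dtype([('Int1', int), ('Int2', int), ('Str1', 'S5')])
a = np.zeros((5, 3), dtype=record_dtype)

n_records = a.size                    # 5 * 3 records
bytes_per_record = a.dtype.itemsize   # two ints plus a 5-byte string
total_bytes = a.nbytes                # fixed when the array is created

print(n_records, bytes_per_record, total_bytes)

# Values can be changed in place; the size of the buffer does not change.
a[0, 0] = (1, 2, 'abcde')
print(a.nbytes == total_bytes)
```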

Lists give you the flexibility to add elements and to change their nature. A list contains pointers to objects located elsewhere in memory.

An array of dtype=object behaves much like a list. Its data buffer holds the same sort of pointers. a=np.zeros((3,5), dtype=object) is a 2d array, where each element can be a tuple, list, number, None, tuple of tuples, etc. But with that kind of array you lose a lot of the 2d numeric calculation abilities.
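As a rough illustration of that trade-off, the sketch below stores different kinds of Python objects in an object array and then tries ordinary arithmetic on it. The names `obj`, `kinds` and `arithmetic_ok` are made up for this example.

```python
import numpy as np

obj = np.zeros((3, 5), dtype=object)   # every element starts out as 0

# Each element is just a reference, so it can point at anything.
obj[0, 0] = (1, 2, 'abcde')
obj[0, 1] = ((5, 2, 'fghij'), (7, 9, 'klmno'))
obj[1, 0] = None

kinds = [type(obj[0, 0]).__name__, type(obj[0, 1]).__name__]
print(kinds, len(obj[0, 1]))

# Arithmetic is applied element by element to the stored objects,
# so it fails as soon as it reaches a tuple.
try:
    obj + 1
    arithmetic_ok = True
except TypeError:
    arithmetic_ok = False
print(arithmetic_ok)
```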

With your structured array, the only way to increase its size or add fields is to make a new array and copy data over. There are functions that assist in adding fields (for example in numpy.lib.recfunctions), but they do, in one way or another, what I just described.

With your definition, there are 3 fields, ['Int1','Int2','Str1']

a=np.zeros(s, dtype=[('Int1', int),
                     ('Int2', int),
                     ('Str1', '|S5')])

Increasing the number of fields (by that concept of fields) would be something like

a1=np.zeros(s, dtype=[('Int1', int),
                      ('Int2', int),
                      ('Str1', '|S5'),
                      ('Str2', '|S5')])

That is adding a field named 'Str2'. You could fill it with

for name in a.dtype.names:
    a1[name] = a[name]

Now all records in a1 have the same data as in a, but they also have a blank Str2 field. You could set that field for each element individually, or as a group with:

a1['Str2'] = ...
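The make-a-new-array-and-copy steps can be wrapped in a small helper. This is only a sketch: `add_field` is a hypothetical name, not a NumPy function, and it assumes a simple structured dtype like the one in the question.

```python
import numpy as np

def add_field(arr, name, field_dtype):
    """Return a new structured array with one extra, zero-filled field.

    Hypothetical helper: it allocates a new array and copies each
    existing field across; the input array is left untouched.
    """
    new_dtype = np.dtype(
        [(n, arr.dtype[n]) for n in arr.dtype.names] + [(name, field_dtype)]
    )
    out = np.zeros(arr.shape, dtype=new_dtype)
    for n in arr.dtype.names:
        out[n] = arr[n]
    return out

a = np.zeros((5, 3), dtype=[('Int1', int), ('Int2', int), ('Str1', 'S5')])
a[0, 0] = (1, 2, 'abcde')

a1 = add_field(a, 'Str2', 'S5')
a1['Str2'] = b'new'        # set the new field for all records at once
print(a1.dtype.names)
print(a1[0, 0])
```

Every call copies the whole array, so growing the set of fields repeatedly this way gets expensive for large arrays.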

But your attempt to change a[0,1] into a tuple of tuples is quite different. It's like trying to replace an element of a regular numeric array with two numbers:

x = np.arange(10)
x[3] = [3,5]

That works for lists, x=list(range(10)) (in Python 3, range by itself is not a list), but not for arrays.
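A short side-by-side sketch of that difference. The names `lst` and `array_assignment_failed` are illustrative, and since the exact exception type is not stated above, the code catches both ValueError and TypeError.

```python
import numpy as np

# A list slot is a reference, so it can be pointed at a new two-element list.
lst = list(range(10))
lst[3] = [3, 5]

# An integer array slot holds exactly one integer.
x = np.arange(10)
try:
    x[3] = [3, 5]
    array_assignment_failed = False
except (ValueError, TypeError):
    array_assignment_failed = True

print(lst[3], array_assignment_failed, x[3])
```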

score 0 · Answer 2 · answered Jan 08 '16 at 12:08

My code would look like this now:

import numpy as np

s=(5,3)
a=np.zeros(s, dtype=object)
a[0,0]=(1,2,'abcde')
a[0,1]=((5,2,'fghij'),(7,9,'klmno'))

I can see/access the entries with:

print(a[0,1])
print(a[0,1][0])
print(a[0,1][1])
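Since the goal mentioned in the comments is one Excel cell per a[x,y], with each record on its own line inside the cell, one possible variation is to keep a list of records in every element and turn each list into a multi-line string. This is only a sketch under that assumption: `cells`, `cell_to_text` and `sheet_text` are made-up names, and actually writing the strings to a spreadsheet is left out.

```python
import numpy as np

# One independent, initially empty list per element.
cells = np.empty((5, 3), dtype=object)
for idx in np.ndindex(cells.shape):
    cells[idx] = []

# Any element can now hold zero, one or several (int, int, str) records.
cells[0, 0].append((1, 2, 'abcde'))
cells[0, 1].append((5, 2, 'fghij'))
cells[0, 1].append((7, 9, 'klmno'))

def cell_to_text(records):
    """Join the records of one element into a single multi-line string."""
    return "\n".join(f"{i1}, {i2}, {s}" for i1, i2, s in records)

# A plain 2-dimensional table of strings, one per spreadsheet cell.
sheet_text = [[cell_to_text(cells[r, c]) for c in range(cells.shape[1])]
              for r in range(cells.shape[0])]

print(sheet_text[0][1])
```

Using a list in every element, including those with a single record, avoids having to distinguish between a tuple and a tuple of tuples when reading the data back.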

Chris0304
