
I have the following data in geo.dat:

id  lon  lat inhab  name
 1   9.  45.   100  Ciriè
 2  10.  45.    60  Acquanegra

and I load it into an ndarray:

import numpy as np
# encoding='utf-8' makes the non-ASCII name 'Ciriè' come back as str, not bytes
data = np.genfromtxt('geo.dat', dtype=None, names=True, encoding='utf-8')

So far, so good: I have a data structure that I can address by column name:

print(data['name'][1]) #>>> Acquanegra
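To reproduce this without a file on disk, here is a minimal sketch that feeds the same table to `np.genfromtxt` through `io.StringIO`. The in-memory string `GEO_DAT` is an assumption standing in for geo.dat, and `encoding='utf-8'` is added so the `name` column is decoded to `str`.

```python
import io

import numpy as np

# Same contents as geo.dat, held in memory so the example is self-contained.
GEO_DAT = """id  lon  lat inhab  name
 1   9.  45.   100  Ciriè
 2  10.  45.    60  Acquanegra
"""

# dtype=None lets genfromtxt guess a type per column; names=True takes the
# field names from the header line; encoding='utf-8' decodes 'Ciriè' to str.
data = np.genfromtxt(io.StringIO(GEO_DAT), dtype=None, names=True,
                     encoding='utf-8')

print(data.dtype.names)  # ('id', 'lon', 'lat', 'inhab', 'name')
print(data['name'][1])   # Acquanegra
```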

Next step, and my question: I have a function, convert, that takes as input two vectors of geographical coordinates (data['lon'] and data['lat'], of course) and returns two arrays, x and y, of projected positions on a map (this works OK).

I can live with separate vectors x and y, but I'd like to augment data with two new columns, data['x'] and data['y']. My naive attempt,

data['x'], data['y'] = convert(data['lon'], data['lat'])

raised ValueError: no field of name x, teaching me that data has some traits of a dictionary but is not a dictionary.
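A small sketch of that behaviour, using a toy two-field array as an assumption in place of the real data: the set of fields is fixed by the dtype when the array is created, so existing fields can be overwritten like dictionary values, but indexing with an unknown name never creates a new one.

```python
import numpy as np

# A structured array's fields are fixed by its dtype at creation time.
data = np.zeros(2, dtype=[('lon', 'f8'), ('lat', 'f8')])
print(data.dtype.names)  # ('lon', 'lat')

# Existing fields can be overwritten, much like dictionary values...
data['lon'] = [9.0, 10.0]

# ...but assigning to an unknown name does not add a field.
try:
    data['x'] = [1.0, 2.0]
    added = True
except ValueError as err:
    added = False
    print('ValueError:', err)  # no field of name x
```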

Is it possible to do what I want? Thanks in advance.

Please note that .hstack() doesn't work with structured arrays (aka record arrays); most previous answers work only for homogeneous arrays (the exception is the one mentioned in Warren's comment below).
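A short sketch of why hstack cannot add columns here, on a toy array assumed for illustration: the fields live inside each element and are not an extra axis, so stacking two structured arrays appends records (rows) rather than fields.

```python
import numpy as np

data = np.array([(9.0, 45.0), (10.0, 45.0)],
                dtype=[('lon', 'f8'), ('lat', 'f8')])

# Two fields, but still a 1-D array: fields are not a dimension.
print(data.shape, data.ndim)  # (2,) 1

# hstack works along real axes only, so it appends records, not fields.
stacked = np.hstack([data, data])
print(stacked.shape)          # (4,)
print(stacked.dtype.names)    # ('lon', 'lat')
```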


PS: I'd prefer not to use pandas.

gboffi
  • See https://stackoverflow.com/questions/25427197/numpy-add-column-to-existing-array/25429497#25429497 – Warren Weckesser Jul 04 '17 at 13:19
  • @WarrenWeckesser AH! I suspected that the key was to manipulate the `.dtype` of the array, but I would never have devised all the steps involved... I've upvoted your answer, of course. May I ask whether my title, "structured array", is terminologically correct? If so, I'd like to answer my own question by summarizing your answer and linking to it, because I feel that the title of the question you answered is a bit generic. – gboffi Jul 04 '17 at 13:28
  • But then you would be creating a duplicate question, and stackoverflow frowns on that. It would be better to edit the title of the other question. In fact, I'll do that right now... – Warren Weckesser Jul 04 '17 at 13:31
  • The key point about `hstack` and the other `concatenate` functions is that dtype fields are not an axis (even though there are some similarities in data layout). `reshape` also doesn't work across that axis/field boundary. – hpaulj Jul 04 '17 at 16:45
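The idea discussed in these comments is to build a new array with a wider dtype and copy the old fields over by name. Here is a minimal sketch of that idea (not the code from the linked answer). `add_float_fields` is a hypothetical helper, the sample array mirrors geo.dat with assumed field types, and `convert` is a hypothetical stand-in for the real projection that simply converts degrees to radians.

```python
import numpy as np


def add_float_fields(arr, new_names):
    """Return a copy of structured array `arr` with extra float64 fields.

    Hypothetical helper: build the wider dtype, allocate a zeroed array of
    the same shape, then copy each existing field over by name.
    """
    new_dtype = np.dtype(arr.dtype.descr + [(n, 'f8') for n in new_names])
    out = np.zeros(arr.shape, dtype=new_dtype)  # new fields start at 0.0
    for field in arr.dtype.names:
        out[field] = arr[field]
    return out


def convert(lon, lat):
    """Hypothetical stand-in for the real map projection."""
    return np.radians(lon), np.radians(lat)


data = np.array([(1, 9.0, 45.0, 100, 'Ciriè'), (2, 10.0, 45.0, 60, 'Acquanegra')],
                dtype=[('id', 'i8'), ('lon', 'f8'), ('lat', 'f8'),
                       ('inhab', 'i8'), ('name', 'U10')])

data = add_float_fields(data, ['x', 'y'])
# Now that the fields exist, the original "naive" assignment works.
data['x'], data['y'] = convert(data['lon'], data['lat'])
print(data.dtype.names)  # ('id', 'lon', 'lat', 'inhab', 'name', 'x', 'y')
```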

1 Answer


You can use np.lib.recfunctions:

import numpy.lib.recfunctions as rfn

x, y = convert(data['lon'], data['lat'])  # projected positions from the question
data = rfn.append_fields(data, ['x', 'y'], [x, y])
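A self-contained sketch of the `append_fields` call, assuming the field types of the sample array and a hypothetical degrees-to-radians `convert` in place of the real projection. With the default `usemask=True`, the result is a `numpy.ma.MaskedArray`.

```python
import numpy as np
import numpy.lib.recfunctions as rfn


def convert(lon, lat):
    """Hypothetical stand-in for the real map projection."""
    return np.radians(lon), np.radians(lat)


data = np.array([(1, 9.0, 45.0, 100, 'Ciriè'), (2, 10.0, 45.0, 60, 'Acquanegra')],
                dtype=[('id', 'i8'), ('lon', 'f8'), ('lat', 'f8'),
                       ('inhab', 'i8'), ('name', 'U10')])

x, y = convert(data['lon'], data['lat'])

# Default usemask=True: append_fields returns a masked array.
data = rfn.append_fields(data, ['x', 'y'], [x, y])
print(type(data).__name__)  # MaskedArray
print(data.dtype.names)     # ('id', 'lon', 'lat', 'inhab', 'name', 'x', 'y')
```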
Eric
  • Works. Sort of. The array returned by `rfn.append_fields()` is a collection of _masked_ arrays, while previously both the columns of `data` and the arrays returned by `transform()` were non-masked arrays. Could you mention in your answer the origin and the (possible) implications of this unexpected behaviour? Further, you may want to add your answer to the question mentioned by Warren in a comment on my question, because I feel that my question is going to be closed... – gboffi Jul 04 '17 at 13:55
  • To comment on my comment, the optional argument `usemask=False` could take care of my issue... – gboffi Jul 04 '17 at 14:34
  • `rfn.append_fields` performs the same kind of action as `@Warren's` link - make a new array of desired dtype and size and copy fields by name. It is more general in that it allows for missing data, can make `masked_arrays` and can make `recarrays`. – hpaulj Jul 04 '17 at 16:31
  • `recfunctions` are a bit buggy, e.g. https://stackoverflow.com/questions/42364725/numpy-recarray-append-fields-cant-append-numpy-array-of-datetimes; https://stackoverflow.com/questions/44769632/numpy-recfunctions-join-by-typeerror. They aren't heavily used, and interact with other array subclasses. – hpaulj Jul 04 '17 at 16:35
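Following up on the `usemask=False` suggestion in the comments, here is a sketch of the same call returning a plain `numpy.ndarray`. As before, the sample array's field types and the degrees-to-radians `convert` are assumptions for illustration only.

```python
import numpy as np
import numpy.lib.recfunctions as rfn


def convert(lon, lat):
    """Hypothetical stand-in for the real map projection."""
    return np.radians(lon), np.radians(lat)


data = np.array([(1, 9.0, 45.0, 100, 'Ciriè'), (2, 10.0, 45.0, 60, 'Acquanegra')],
                dtype=[('id', 'i8'), ('lon', 'f8'), ('lat', 'f8'),
                       ('inhab', 'i8'), ('name', 'U10')])

x, y = convert(data['lon'], data['lat'])

# usemask=False asks append_fields for an ordinary (non-masked) ndarray.
data = rfn.append_fields(data, ['x', 'y'], [x, y], usemask=False)
print(type(data).__name__)  # ndarray
```

If a masked result has already been produced with the default settings, calling its `.filled()` method is another way to get back a plain array.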