42

I have a list of Num_tuples tuples that all have the same length Dim_tuple

xlist = [tuple_1, tuple_2, ..., tuple_Num_tuples]

For definiteness, let's say Num_tuples=3 and Dim_tuple=2

xlist = [(1, 1.1), (2, 1.2), (3, 1.3)]

I want to convert xlist into a structured numpy array xarr using a user-provided list of column names user_names and a user-provided list of variable types user_types

user_names = [name_1, name_2, ..., name_Dim_tuple]
user_types = [type_1, type_2, ..., type_Dim_tuple]

So when creating the numpy array, the dtype would be:

dtype = [(name_1,type_1), (name_2,type_2), ..., (name_Dim_tuple, type_Dim_tuple)]

In the case of my toy example, the desired end product would look something like:

xarr['name1']=np.array([1,2,3])
xarr['name2']=np.array([1.1,1.2,1.3])

How can I slice xlist to create xarr without any loops?

nucsit026
aph
  • `without any loops` is this possible without loops? What about list comprehension? Also, have you tried anything? – Aleksander Lidtke Jan 27 '15 at 18:03
  • Yes, though the only thing I've gotten to work are hard-coding solutions that first involve xlist --> np.array(xlist). – aph Jan 27 '15 at 18:11
  • For example, xtemp = np.array(xlist), and x1=np.array(xtemp[:,1]), this creates a numpy array of one-element tuples, which is not what I want. I can't seem to get the slicing right, that's the entire problem. Should be simple, I realize. – aph Jan 27 '15 at 18:16

2 Answers

47

A list of tuples is the correct way of providing data to a structured array:

In [273]: xlist = [(1, 1.1), (2, 1.2), (3, 1.3)]

In [274]: dt=np.dtype('int,float')

In [275]: np.array(xlist,dtype=dt)
Out[275]: 
array([(1, 1.1), (2, 1.2), (3, 1.3)], 
      dtype=[('f0', '<i4'), ('f1', '<f8')])

In [276]: xarr = np.array(xlist,dtype=dt)

In [277]: xarr['f0']
Out[277]: array([1, 2, 3])

In [278]: xarr['f1']
Out[278]: array([ 1.1,  1.2,  1.3])

or if the names are important:

In [280]: xarr.dtype.names=['name1','name2']

In [281]: xarr
Out[281]: 
array([(1, 1.1), (2, 1.2), (3, 1.3)], 
      dtype=[('name1', '<i4'), ('name2', '<f8')])

http://docs.scipy.org/doc/numpy/user/basics.rec.html#filling-structured-arrays
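The question asks for the dtype to come from user-provided lists rather than a hard-coded string. A minimal sketch of that, assuming `user_names` and `user_types` hold the example values shown here (they are illustrative, not taken from the question): `zip` pairs each name with its type, and the resulting list of pairs is passed to `np.dtype`.

```python
import numpy as np

xlist = [(1, 1.1), (2, 1.2), (3, 1.3)]

# Assumed example inputs; in practice these come from the user.
user_names = ['name1', 'name2']
user_types = [int, float]

# Pair each column name with its type to build the structured dtype.
dt = np.dtype(list(zip(user_names, user_types)))

# The list of tuples is consumed directly, one tuple per record.
xarr = np.array(xlist, dtype=dt)

print(xarr['name1'])
print(xarr['name2'])
```

This sets the field names when the array is created, so there is no need to rename `f0` and `f1` afterwards.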

chrisaycock
hpaulj
-1

hpaulj's answer is interesting but horrifying :)

The modern Pythonic way to have named columns is to use pandas, a highly popular package built on top of numpy:

import numpy as np
import pandas as pd

xlist = [(1, 1.1), (2, 1.2), (3, 1.3)]

# Cast name1 to int explicitly (pandas already infers an integer dtype for an all-integer column)
df = pd.DataFrame(xlist, columns=['name1', 'name2']).astype({'name1':int})
print(df)

This gives you a DataFrame, df, which is the structure you want:

   name1  name2
0      1    1.1
1      2    1.2
2      3    1.3

You can do all kinds of wonderful things with this, like slicing and various operations.

For example, to create the xarr dictionary requested in the original question:

>>> xarr = {k:np.array(v) for k,v in df.to_dict(orient='list').items()}
>>> xarr
{'name1': array([1, 2, 3]), 'name2': array([1.1, 1.2, 1.3])}
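If a structured numpy array is wanted rather than a dictionary, pandas can produce one as well. A rough sketch, using `DataFrame.to_records` with `index=False` so that the row index is not added as an extra field; the column names are the same example names as above.

```python
import pandas as pd

xlist = [(1, 1.1), (2, 1.2), (3, 1.3)]
df = pd.DataFrame(xlist, columns=['name1', 'name2'])

# to_records returns a numpy record array whose fields are the columns.
rec = df.to_records(index=False)

print(rec.dtype.names)
print(rec['name1'])
print(rec['name2'])
```

The result supports the same `rec['name1']` field access as a structured array built with `np.array(xlist, dtype=...)`.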
Michael Currie
  • This answer does not answer the original question, and suggests using a much larger and bloated package that can be avoided using the accepted answer. – fwyzard Aug 08 '23 at 12:32
  • OK fwyzard, good point; I have added a one-liner to convert to the exact format requested in the original question. Also, pandas' "bloat" could be considered a feature, not a bug: unless you have a very good reason, it is probably better to rely on a well-supported package rather than to try to "roll your own" solution to a basic data transformation task. – Michael Currie Aug 12 '23 at 10:33
  • Hi Michael, I do agree with reusing well-supported software. However `import pandas` takes more than 0.5 seconds on a dual AMD EPYC 7763 server... for a _simple_ transformation, that's likely more than the time taken by the operation itself :-( I'd be much happier with pandas if it could be used in a more modular way. – fwyzard Aug 30 '23 at 13:27