I have a complicated data set that I need to run distance calculations on. Each record contains several different data types, so a record array or structured array appears to be the way to go. The problem is that the scipy spatial distance functions take ordinary arrays, while the elements of a record array are numpy voids. How do I make a record array of numpy arrays instead of numpy voids? Below is a very simple example of what I'm talking about.
import numpy
import scipy.spatial.distance as scidist
input_data = [
    ('340.9', '7548.2', '1192.4', 'set001.txt'),
    ('546.7', '9039.9', '5546.1', 'set002.txt'),
    ('456.3', '2234.8', '2198.8', 'set003.txt'),
    ('332.1', '1144.2', '2344.5', 'set004.txt'),
]
record_array = numpy.array(input_data,
    dtype=[('d1', 'float64'), ('d2', 'float64'), ('d3', 'float64'), ('file', '|S20')])
The following code fails...
this_fails_and_makes_me_cry = record_array[['d1', 'd2', 'd3']]
scidist.pdist(this_fails_and_makes_me_cry)
I get this error....
Traceback (most recent call last):
  File "/home/someguy/working_datasets/trial003/scrap.py", line 16, in <module>
    scidist.pdist(record_array[['d1', 'd2', 'd3']])
  File "/usr/lib/python2.7/dist-packages/scipy/spatial/distance.py", line 1093, in pdist
    raise ValueError('A 2-dimensional array must be passed.');
ValueError: A 2-dimensional array must be passed.
The error occurs because this_fails_and_makes_me_cry is a one-dimensional array of numpy.void records rather than a two-dimensional array of floats. To get it to work, I have to convert it each time like this:
this_works = numpy.array(list(map(list, record_array[['d1', 'd2', 'd3']])))
scidist.pdist(this_works)
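To make the cause of the error concrete, here is a minimal sketch that inspects the multi-field selection. It is a trimmed version of the example above; the variable name subset is only illustrative, and the values are given as floats rather than strings to keep the sketch simple.

```python
import numpy

# Two rows of the example data, with the numeric values given as floats.
input_data = [
    (340.9, 7548.2, 1192.4, 'set001.txt'),
    (546.7, 9039.9, 5546.1, 'set002.txt'),
]
record_array = numpy.array(
    input_data,
    dtype=[('d1', 'float64'), ('d2', 'float64'), ('d3', 'float64'), ('file', 'S20')])

# Selecting several fields still gives a structured array: one axis of records.
subset = record_array[['d1', 'd2', 'd3']]
print(subset.ndim)       # number of dimensions of the selection
print(subset.shape)      # one entry per record, no column axis
print(type(subset[0]))   # each element is a numpy.void record
```

Because the selection has only one axis, pdist has no second dimension to treat as coordinates, which matches the ValueError in the traceback.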
Is it possible to create a record array of numpy arrays to begin with? Or is a numpy record/structured array restricted to numpy voids? It would be handy for the record array to contain the data in a format compatible with scipy's spatial distance functions so that I don't have to convert each time. Is this possible?
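One layout that might avoid the repeated conversion is sketched below, assuming the three coordinates can live together in a single subarray field instead of three separate fields. The field name 'pos' and the variable name positions are hypothetical, and this is only a sketch of the idea, not a verified solution for the real data set.

```python
import numpy

# Each record holds a 3-element coordinate tuple plus the file name.
input_data = [
    ((340.9, 7548.2, 1192.4), 'set001.txt'),
    ((546.7, 9039.9, 5546.1), 'set002.txt'),
    ((456.3, 2234.8, 2198.8), 'set003.txt'),
    ((332.1, 1144.2, 2344.5), 'set004.txt'),
]

# 'pos' is a hypothetical field name; (3,) declares a subarray of three float64 values.
record_array = numpy.array(
    input_data,
    dtype=[('pos', 'float64', (3,)), ('file', 'S20')])

# Accessing the subarray field yields a plain 2-D float array, one row per record.
positions = record_array['pos']
print(positions.shape)
print(positions.dtype)
```

With this layout, record_array['pos'] is already a two-dimensional float64 array, which is the form the scipy spatial distance functions expect, while the file name stays attached to each record.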