
Can someone find out what is wrong with the code below?

import numpy as np
data = np.recfromcsv("data.txt", delimiter=" ", names=['name', 'types', 'value'])
indices = np.where((data.name == 'david') * data.types.startswith('height'))
mean_value = np.mean(data.value[indices])

I want to calculate the mean weight and the mean height for david and mark, as follows:

david>> mean(weight_2005 and weight_2012), mean (height_2005 and height_2012)
mark>> mean(weight_2005 and weight_2012), mean (height_2005 and height_2012)

The data comes from this text file (data.txt):

david weight_2005 50
david weight_2012 60
david height_2005 150
david height_2012 160
mark weight_2005 90
mark weight_2012 85
mark height_2005 160
mark height_2012 170

I am using Python 3.2 and numpy 1.8.

The code above raises the following TypeError:

TypeError: startswith first arg must be bytes or a tuple of bytes, not numpy.str_
Charles
2964502
  • The code at the top works for me. `mean_value` is `155.0`, with python 2, numpy 1.7 – askewchan Nov 13 '13 at 03:05
  • @askewchan which version of python and numpy are you using? – 2964502 Nov 13 '13 at 03:06
  • I can reproduce the error message in python 3.3 and `numpy` 1.9.0.dev-8a2728c. Does `data.types.astype(str).startswith("height")` work? (If so, we should probably figure out what the appropriate idiom to decode is.) – DSM Nov 13 '13 at 03:07
  • @DSM nope, I get `RuntimeWarning: invalid value encountered in double_scalars` and the result is `nan` – 2964502 Nov 13 '13 at 03:11
  • Well, that makes it clear what the problem is. But that's not the best solution, because we should explicitly decode the bytes into strings and use free functions instead. Maybe there's an option to pass to `recfromcsv` to do the decoding at that point. Otherwise we should probably call `decode` manually. In any case, we should probably be using `np.char.startswith`. – DSM Nov 13 '13 at 03:20

1 Answer


With Python 3.2 and numpy 1.7, this line works:

indices = np.where((data.name == b'david') * data.types.startswith(b'height'))

data displays as:

rec.array([(b'david', b'weight_2005', 50),...], 
      dtype=[('name', 'S5'), ('types', 'S11'), ('value', '<i4')])

type(data.name[0]) is <class 'bytes'>, so the values compared against it have to be bytes as well.

The b'height' literal works in Python 2.7 as well.
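As a minimal sketch of the bytes-based fix, the snippet below rebuilds the sample rows in memory (an assumption made so it runs without data.txt) with the same dtype shown above. It uses `np.char.startswith`, as suggested in the comments, and `&` instead of `*` to combine the two boolean masks. It works on a plain structured array, so fields are accessed as `data["name"]` rather than `data.name`.

```python
import numpy as np

# Sample rows from the question, stored as bytes like the loaded file.
rows = [
    (b"david", b"weight_2005", 50),
    (b"david", b"weight_2012", 60),
    (b"david", b"height_2005", 150),
    (b"david", b"height_2012", 160),
    (b"mark", b"weight_2005", 90),
    (b"mark", b"weight_2012", 85),
    (b"mark", b"height_2005", 160),
    (b"mark", b"height_2012", 170),
]
dtype = [("name", "S5"), ("types", "S11"), ("value", "<i4")]
data = np.array(rows, dtype=dtype)

# Both operands are bytes, so the comparison and the prefix test agree
# with the 'S' fields of the array.
mask = (data["name"] == b"david") & np.char.startswith(data["types"], b"height")
mean_value = data["value"][mask].mean()
print(mean_value)
```

A boolean mask can index the array directly, so `np.where` is not needed for this step.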


Another option is to convert all the data to unicode (Python 3 strings):

dtype=[('name','U5'), ('types', 'U11'), ('value', '<i4')]
dataU=data.astype(dtype=dtype)
indices = np.where((dataU.name == 'david') * dataU.types.startswith('height'))

or load the file with recfromtxt, passing the dtype directly:

data = np.recfromtxt('data.txt', delimiter=" ", 
    names=['name', 'types', 'value'], dtype=dtype)

It looks like recfromcsv does not take a dtype, but recfromtxt does.
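Once the string fields are unicode, the four means asked for in the question can be computed with ordinary Python strings. The following is a rough sketch only: the rows are parsed from an in-memory copy of the sample data rather than from data.txt, and `group_mean` is a hypothetical helper name, not a numpy function.

```python
import numpy as np

# In-memory copy of the sample file from the question (assumption:
# the real data.txt has the same space-separated layout).
text = """david weight_2005 50
david weight_2012 60
david height_2005 150
david height_2012 160
mark weight_2005 90
mark weight_2012 85
mark height_2005 160
mark height_2012 170"""

dtype = [("name", "U5"), ("types", "U11"), ("value", "<i4")]
rows = []
for line in text.splitlines():
    name, kind, value = line.split()
    rows.append((name, kind, int(value)))
data = np.array(rows, dtype=dtype)


def group_mean(data, name, prefix):
    """Mean of `value` for one person over rows whose type starts with prefix."""
    mask = (data["name"] == name) & np.char.startswith(data["types"], prefix)
    return float(data["value"][mask].mean())


# One mean per (person, measurement) pair.
means = {
    (name, prefix): group_mean(data, name, prefix)
    for name in ("david", "mark")
    for prefix in ("weight", "height")
}
for key, value in means.items():
    print(key, value)
```

Keeping the mask logic in one small function avoids repeating the two-condition filter for every person and measurement.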

hpaulj
  • There is a bug report and patch for the fact that `recfromcsv` does not take `dtype`: https://github.com/numpy/numpy/issues/311 – hpaulj Nov 24 '13 at 23:29