I'm trying to sum some values in a list, so I loaded the .dat file that contains them, but it seems the only way Python will sum the data is if the values are separated with ','. This is what I get:

    altura = np.loadtxt("bio.dat",delimiter=',',usecols=(5,),dtype='float')
  File "/usr/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 846, in loadtxt
    vals = [vals[i] for i in usecols]
IndexError: list index out of range

This is my code

import numpy as np

altura = np.loadtxt("bio.dat",delimiter=',',usecols=(5,),dtype='str')
print altura

And this is the file 'bio.dat'

1 Katherine Oquendo M 18    1.50    50  
2 Pablo Restrepo    H 20    1.83    79  
3 Ana Agudelo   M 18    1.58    45  
4 Kevin Vargas  H 20    1.74    80  
5 Pablo madrid  H 20    1.70    55  

What I intend to do is

x=sum(altura)

What should I do about the separator?

  • Your file *isn't* comma-separated, what are you expecting `numpy` to do?! – jonrsharpe Sep 13 '14 at 23:00
  • Exactly! It isn't comma-separated, I want it to be comma-separated – Vanesa Piedrahita Sep 13 '14 at 23:21
  • I do not understand what you are asking. Do you want to change the file? You have to tell `loadtxt` what delimiter is actually being used, once the data is imported from the file it will be in an array, which you can certainly `sum`. Could you please provide a [minimal example](http://stackoverflow.com/help/mcve) that recreates the errors you see. – jonrsharpe Sep 14 '14 at 07:11
  • This is what I get using `altura = np.loadtxt("bio.dat",delimiter=',',usecols=(5,),dtype='str')`: ([ 1.5 1.83 1.58 1.74 1.7 ]). But I need this: ([ 1.5 , 1.83, 1.58, 1.74, 1.7 ]) (comma-separated) – Vanesa Piedrahita Sep 14 '14 at 11:26
  • No, you don't; that is just how the array is shown. The real question is: why on earth are you using `dtype=str` when you want numbers?! – jonrsharpe Sep 14 '14 at 11:30
  • Sorry, I copied that part wrong; it is 'float'. Without the commas the sum is 0. I simulated the list in IPython with and without commas, and it gave me the right answer with the commas – Vanesa Piedrahita Sep 14 '14 at 11:36
  • http://stackoverflow.com/q/16423774/3001761 – jonrsharpe Sep 14 '14 at 11:37

3 Answers

In my case, some lines included a # character.
By default numpy treats # as the start of a comment and ignores the rest of the line.
So try again with the comments parameter set to None, which disables comment handling:

altura = np.loadtxt("bio.dat",delimiter=',',usecols=(5,),dtype='str',comments=None)

I also recommend against np.loadtxt if you must process a large (>1M lines) file, because it is incredibly slow.
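
To make the effect of the comments parameter concrete, here is a minimal sketch. It assumes Python 3 and a NumPy version in which `comments=None` disables comment handling. The two-line sample text is made up for illustration and is not taken from bio.dat.

```python
import io
import numpy as np

# Hypothetical sample: the second column contains a '#' character.
text = "1 A#1 1.50\n2 B#2 1.83\n"

# Default behaviour: '#' starts a comment, so each line is cut off there.
truncated = np.loadtxt(io.StringIO(text), dtype=str, usecols=(0, 1))

# comments=None: '#' is treated as ordinary data.
full = np.loadtxt(io.StringIO(text), dtype=str, usecols=(0, 1), comments=None)

print(truncated[:, 1])  # second column as read with the default settings
print(full[:, 1])       # second column with comment handling disabled
```

With the default settings, each line here only has two fields left after truncation, so asking for a third column would fail. That is one possible way to end up with an out-of-range column error.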

plhn

The file doesn't need to be comma-separated. Here's my sample run, using StringIO to simulate a file. I assume you want to sum the numbers that look like a person's height (in meters).

In [17]: from StringIO import StringIO
In [18]: s="""\
1 Katherine Oquendo M 18    1.50    50  
2 Pablo Restrepo    H 20    1.83    79  
3 Ana Agudelo   M 18    1.58    45  
4 Kevin Vargas  H 20    1.74    80  
5 Pablo madrid  H 20    1.70    55  
"""
In [19]: S=StringIO(s)
In [20]: data=np.loadtxt(S,dtype=float,usecols=(5,))
In [21]: data
Out[21]: array([ 1.5 ,  1.83,  1.58,  1.74,  1.7 ])
In [22]: np.sum(data)
Out[22]: 8.3499999999999996

As a script (with the data in a .txt file):

import numpy as np
fname = 'stack25828405.txt'
data=np.loadtxt(fname,dtype=float,usecols=(5,))
print data
print np.sum(data)

2119:~/mypy$ python2.7 stack25828405.py
[ 1.5   1.83  1.58  1.74  1.7 ]
8.35
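
For readers on Python 3, the same idea can be sketched as follows. This is an illustrative variant rather than the run shown above: it assumes Python 3 with `io.StringIO` standing in for the file, and it also prints `str` and `repr` of the array to show that the presence or absence of commas is only a display difference.

```python
import io
import numpy as np

sample = """\
1 Katherine Oquendo M 18    1.50    50
2 Pablo Restrepo    H 20    1.83    79
3 Ana Agudelo   M 18    1.58    45
4 Kevin Vargas  H 20    1.74    80
5 Pablo madrid  H 20    1.70    55
"""

# No delimiter argument: loadtxt splits on any run of whitespace.
altura = np.loadtxt(io.StringIO(sample), dtype=float, usecols=(5,))

print(altura.shape)   # an empty array here would explain a sum of 0
print(str(altura))    # what print shows: no commas
print(repr(altura))   # what an interactive prompt shows: commas

total = altura.sum()
print(total)
```

Checking `altura.shape` first is a cheap way to confirm that the values were actually loaded before summing them.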
hpaulj
  • Hi, thanks for your answer. The reason I'm trying to separate each value with a comma is this output. Values for 'altura': [ 1.5 1.83 1.58 1.74 1.7 1.62 1.74 1.7 1.81 1.82 1.7 1.78 1.7 1.78 1.74 1.66 1.64 1.76 1.7 1.67 1.68 1.8 1.74 1.92 1.66]. Sum of the values: 0. As you can see, the terms are only separated with blanks; if I put in the commas, the sum gives the result I need – Vanesa Piedrahita Sep 14 '14 at 01:20
  • But I get `array([ 1.5 , 1.83, 1.58, 1.74, 1.7 ])` from your sample. Are you using `np.sum`? What is `altura`? Is it a numpy array of floats? – hpaulj Sep 14 '14 at 01:32
  • Yes, I am. But I run it with 'python datosbiometricos.py', and that is when I get the error. Are you using IPython to run it? And yes, altura is an array of floats – Vanesa Piedrahita Sep 14 '14 at 01:35
  • I added a script run. The numbers are the same. The display is different, but that's not significant. I wonder if your sum is 0 because `altura` is empty. What is `altura.shape`? – hpaulj Sep 15 '14 at 20:41

Alternatively, you can convert the tab-delimited file to CSV first.

The csv module supports tab-delimited files. Supply the delimiter argument to reader:

import csv

txt_file = r"mytxt.txt"
csv_file = r"mycsv.csv"

# use 'with' if the program isn't going to immediately terminate
# so you don't leave files open
# the 'b' is necessary on Windows
# it prevents \x1a, Ctrl-z, from ending the stream prematurely
# and also stops Python converting to / from different line terminators
# On other platforms, it has no effect
in_txt = csv.reader(open(txt_file, "rb"), delimiter = '\t')
out_csv = csv.writer(open(csv_file, 'wb'))

out_csv.writerows(in_txt)

This answer is not my work; it is the work of agf found at https://stackoverflow.com/a/10220428/3767980.
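
The snippet above is written for Python 2, where csv files are opened in binary mode. As a hedged sketch of the same conversion under Python 3, the version below splits each line on any run of whitespace, since the sample bio.dat appears to mix spaces and tabs. The helper name `whitespace_to_csv` is hypothetical, and in-memory `io.StringIO` objects stand in for real files.

```python
import csv
import io

def whitespace_to_csv(src, dst):
    """Copy whitespace-separated lines from src to dst as CSV rows."""
    writer = csv.writer(dst)
    for line in src:
        fields = line.split()  # split on any run of spaces or tabs
        if fields:             # skip blank lines
            writer.writerow(fields)

# Stand-ins for open("bio.dat") and open("bio.csv", "w", newline="")
src = io.StringIO(
    "1 Katherine Oquendo M 18    1.50    50  \n"
    "2 Pablo Restrepo    H 20    1.83    79  \n"
)
dst = io.StringIO()
whitespace_to_csv(src, dst)
print(dst.getvalue())
```

With real files under Python 3, the output file would typically be opened in text mode with `newline=""`, as the csv module documentation recommends.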

Community
PhysicalChemist