
I am parsing a TSV file and want to convert its contents into an array.

Here is the file format -

jobname1 queue maphours reducehours
jobname2 queue maphours reducehours

Code:

import numpy as np

with open('file.tsv') as tsv:
    # Split every line of the file into its tab-separated fields
    line = [elem.strip().split('\t') for elem in tsv]
    vals = np.asarray(line)
    print(vals[0])
    print(vals[4])

Printing vals currently gives the following output -

['job1', 'queue', '1.0', '0.0\n']
['job2', 'queue', '1.0', '0.0\n']

I want each field of every row in the file to be a separate array element, for example -

vals[0] = job1
vals[1] = queue
vals[2] = 1.0
vals[3] = 0.0

How do I achieve this?

xyzzz

2 Answers


From what I understand, you would like to create a 2D NumPy array in which each row of the file corresponds to a row of the array and each column of the file corresponds to a column of the array. If so, you could do this as follows:

For example, if your data file is:

jobname1    queue   1   3
jobname2    queue   2   4
jobname41   queue   1   1
jobname32   queue   2   2
jobname21   queue   3   4
jobname12   queue   1   6

The following code:

import numpy as np

with open('file.tsv') as tsv:
    # Split every line of the file into its tab-separated fields
    line = [elem.strip().split('\t') for elem in tsv]

vals = np.asarray(line)

will result in the following vals array:

[['jobname1' 'queue' '1' '3']
 ['jobname2' 'queue' '2' '4']
 ['jobname41' 'queue' '1' '1']
 ['jobname32' 'queue' '2' '2']
 ['jobname21' 'queue' '3' '4']
 ['jobname12' 'queue' '1' '6']]
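As a quick check of the result, here is a minimal sketch that prints nothing but inspects the array's shape and addresses single elements. It assumes the same six rows, read from an in-memory string via io.StringIO rather than from a file on disk, and it assumes the fields are separated by tabs. The names data, shape, first_job and hours are made up for this illustration.

```python
import io
import numpy as np

# Stand-in for the TSV file above (assumption: fields are tab-separated).
data = (
    "jobname1\tqueue\t1\t3\n"
    "jobname2\tqueue\t2\t4\n"
    "jobname41\tqueue\t1\t1\n"
    "jobname32\tqueue\t2\t2\n"
    "jobname21\tqueue\t3\t4\n"
    "jobname12\tqueue\t1\t6\n"
)

with io.StringIO(data) as tsv:
    rows = [line.strip().split('\t') for line in tsv]

vals = np.asarray(rows)

shape = vals.shape                 # (number of rows, number of columns)
first_job = vals[0, 0]             # same element as vals[0][0]
hours = vals[:, 2:].astype(float)  # the two numeric columns, converted from strings
```

If every row has the same number of fields, vals is two-dimensional and both vals[0][0] and vals[0, 0] work. All values are stored as strings, so the numeric columns need an explicit conversion such as astype(float) before doing arithmetic on them.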

To get the job names you can do:

print(vals[:,0])
# gives ['jobname1' 'jobname2' 'jobname41' 'jobname32' 'jobname21' 'jobname12']

Or if you want rows containing some job, you can do:

print(vals[np.apply_along_axis(lambda row: row[0] == 'jobname1', 1, vals)])
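The same row selection can also be written with a plain boolean mask, which is usually shorter than apply_along_axis. This is a small sketch on a three-row array built inline for illustration; mask and matching are names chosen here, not part of the code above.

```python
import numpy as np

# Small inline sample (assumption: same layout as the file above).
vals = np.asarray([
    ['jobname1', 'queue', '1', '3'],
    ['jobname2', 'queue', '2', '4'],
    ['jobname41', 'queue', '1', '1'],
])

# Compare the first column against the job name; the result is one
# boolean per row.
mask = vals[:, 0] == 'jobname1'

# Indexing with the mask keeps only the rows where it is True.
matching = vals[mask]
```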
Marcin
  • I changed the code and use this `print vals[0][0] print [0][1]` it gives the following error `IndexError: index 1 is out of bounds for axis 0 with size 1` – xyzzz Jun 09 '14 at 01:17
  • I am looking for vals[0][0].. etc etc values because i am doing a backend mysql insert based on these values. the print (vals[:,0]) does not work and returns an error "too may indices" - @Marcin – xyzzz Jun 09 '14 at 02:23
  • @rond your `vals` is `numpy.ndarray` or regular python list? – Marcin Jun 09 '14 at 02:28
  • I got it...I just printed the 2d array elements and it is parsing it fine! print vals [0][0] print vals [0][1] - works fine. Thanks @Marcin – xyzzz Jun 09 '14 at 02:32

Are you sure you need an array? @Marcin's answer is more complete if you want a Numpy array.

Python doesn't have a built-in array data structure (there's a list of Python data structures here); its general-purpose sequence type is the list. The standard library's array module does provide a "thin wrapper around the C array". To use that wrapper, you have to specify the type that the array will hold (here you'll find a list of typecodes, at the top, and examples at the bottom).
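To illustrate what specifying a type means, here is a minimal sketch using the standard library's array module with the 'd' typecode (C double). The sample values and the names map_hours, total and accepted are made up for this example.

```python
from array import array

# 'd' is the typecode for C doubles; every element must be a number.
map_hours = array('d', [1.0, 2.0, 1.0])
map_hours.append(3.0)

total = sum(map_hours)

# Strings such as job names cannot be stored in a 'd' array.
try:
    array('d', ['job1'])
    accepted = True
except TypeError:
    accepted = False
```

Because each array holds a single numeric type, it cannot store a row that mixes job names and hours, which is one reason a list or a NumPy array is the more natural fit for this file.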

If you want to use a numpy array, this should work:

import numpy as np
myarray = np.asarray(yourList)

Adapted from here.

rofls