I have a big .txt data file in a format like this (all numbers):

0 1.2 2 3.1
20 21.2 22 23
30 31 32 33.01

I want to import the 2nd column, for example, of this matrix into a variable. Here is the code that I have written. First, I import the whole matrix into a variable as a string and convert it into an array, named "data" here. Then I try to access data[1][1] or data[:][1], but it gives the following error:

IndexError: too many indices for array

Any idea what the error is, or what an efficient way of doing this would be? Many thanks.

import csv
data=[]
with open('test.txt', 'r') as f:
    reader = csv.reader(f, delimiter=' ', skipinitialspace=True)
    for line in f:
        if line.endswith('\n'):
            line=line[:-1]

        data=np.asarray(line)
        print(data)
hpaulj
Yousef Vh

2 Answers

The classic way would be:

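with open('test.txt') as txt:
    # parse every whitespace-separated value on every line into a float
    array2d = [[float(value) for value in line.split()] for line in txt]

# note: array2d[:] is only a shallow copy of the list, so this prints the first row, not a column
print(array2d[:][0])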

Reference: List Comprehensions
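To pick a column rather than a row out of a plain list of lists, you have to take the same position from every row. A minimal sketch, assuming the sample values from the question (they are inlined here so the snippet runs without test.txt):

```python
# Same sample values as in the question, inlined so the sketch runs on its own.
lines = ["0 1.2 2 3.1", "20 21.2 22 23", "30 31 32 33.01"]
array2d = [[float(value) for value in line.split()] for line in lines]

# Index 1 is the second column; take that element from every row.
second_column = [row[1] for row in array2d]
print(second_column)
```

With a real file you would iterate over the open file object instead of the `lines` list.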

Performance-wise, I think numpy should be faster.

Update:

For numpy you could use the loadtxt function.

import numpy as np
textfile = np.loadtxt("test.txt")
print(textfile[0][0])

Reference: Reading and Writing Data Files

Update2: IndexError: too many indices for array

import csv
import numpy as np  # missing in the original code
data = []  # create an empty list
with open('test.txt', 'r') as f:  # open the text file in read mode and bind it to f
    reader = csv.reader(f, delimiter=' ', skipinitialspace=True)  # creates a reader instance that is never used
    for line in f:  # loop over each line in the file
        if line.endswith('\n'):  # if the line ends with a newline character
            line = line[:-1]  # strip that trailing newline

        data = np.asarray(line)  # the bigger mistake: this overwrites data with just the current line
        print(data)  # print out this one line

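So data ends up holding only the last line. Because np.asarray is given a single string, the result is a 0-dimensional string array, which is why data[1][1] raises "IndexError: too many indices for array". print(data) runs inside the loop, so it prints every line once, which can look as if the whole file had been stored.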
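You can check this without the file. The sketch below assumes only the last line of the sample data and shows what np.asarray makes of a single string:

```python
import numpy as np

# The last line of the sample file, after the trailing newline was stripped.
line = "30 31 32 33.01"
data = np.asarray(line)

# A single string becomes a 0-dimensional array: one element, no axes.
print(data.ndim, data.shape, data.dtype)

# With no axes, even data[0] is one index too many.
try:
    data[0]
    index_failed = False
except IndexError as exc:
    index_failed = True
    print("IndexError:", exc)
```

The exact wording of the message depends on the NumPy version, but it starts with "too many indices for array".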

Note: There is no way to know the length of a line in advance, so you have to read through the file; you can't jump straight to a specific line. There are some ways to improve performance, so please let us know how big your file is or what speed you expect.
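If memory is the concern, you can still read every line but keep only the value you need from each one. A rough sketch; read_column is a hypothetical helper name, and io.StringIO stands in for open('test.txt') so the snippet is self-contained:

```python
import io


def read_column(lines, index):
    """Return one whitespace-separated column as a list of floats.

    `lines` can be any iterable of strings, for example an open file.
    """
    column = []
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        column.append(float(fields[index]))
    return column


# Stand-in for the real file; with a file use: with open('test.txt') as f: ...
sample = io.StringIO("0 1.2 2 3.1\n20 21.2 22 23\n30 31 32 33.01\n")
second_column = read_column(sample, 1)
print(second_column)
```

Each line is still read and split, but only one float per line is kept, so the other columns never accumulate in memory.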

Update3: Get the Column

import numpy as np
textfile = np.loadtxt("test.txt")
print(textfile[:,0])

Reference: Numpy Indexing
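If you only ever need the one column, loadtxt can also skip the others while parsing, through its usecols parameter. A small sketch, with io.StringIO standing in for "test.txt" so it runs on its own:

```python
import io

import numpy as np

# Stand-in for the real file; np.loadtxt also accepts a file name.
sample = io.StringIO("0 1.2 2 3.1\n20 21.2 22 23\n30 31 32 33.01\n")

# usecols=1 keeps only the second column (columns are counted from 0).
second_column = np.loadtxt(sample, usecols=1)
print(second_column.shape)
print(second_column)
```

The file is still read from start to end, but only the selected column ends up in the array.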

Fabian
  • Thanks, this works. But I wonder what the problem is with the way I did it? Secondly, since I need only the 2nd column, for example, I want to save only that column, which I think would be the efficient way. However, both this approach and my code at the top import all of the .txt data and save it in a variable, while the other columns that get saved are useless. – Yousef Vh Dec 25 '18 at 23:10
  • @YousefVh I think the second update is more specific to your code? – Fabian Dec 25 '18 at 23:28
  • Yes, the second update works, too. However, this solution also does not give the values of a column, but the values of a row! In my code, the line data=np.asarray(line) seems to be correct, because print(data) shows all the contents of the .txt file, not one line. – Yousef Vh Dec 25 '18 at 23:40

Reading the file line by line and saving each in a list is equivalent to making a list of strings:

In [98]: txt='''0 1.2 2 3.1
    ...: 20 21.2 22 23
    ...: 30 31 32 33.01'''.splitlines()
In [99]: txt
Out[99]: ['0 1.2 2 3.1', '20 21.2 22 23', '30 31 32 33.01']

Making an array from that just produces a 1d array of strings. That can't be indexed as a 2d numeric array:

In [100]: np.array(txt)
Out[100]: array(['0 1.2 2 3.1', '20 21.2 22 23', '30 31 32 33.01'], dtype='<U14')

If you first split the lines into substrings:

In [101]: [line.split() for line in txt]
Out[101]: 
[['0', '1.2', '2', '3.1'],
 ['20', '21.2', '22', '23'],
 ['30', '31', '32', '33.01']]
In [102]: np.array([line.split() for line in txt], dtype=float)
Out[102]: 
array([[ 0.  ,  1.2 ,  2.  ,  3.1 ],
       [20.  , 21.2 , 22.  , 23.  ],
       [30.  , 31.  , 32.  , 33.01]])

This is a 2d array.

We can select a column from that array. But note that the result is a 1d array:

In [104]: np.array([line.split() for line in txt], dtype=float)[:,1]
Out[104]: array([ 1.2, 21.2, 31. ])

Don't worry about this being a 'row' or 'column'. We could change the shape to (1,3) or (3,1), but for most numpy purposes a 1d shape, (3,), is just as good.

numpy has a good csv loader (two, actually):

In [105]: np.genfromtxt(txt)
Out[105]: 
array([[ 0.  ,  1.2 ,  2.  ,  3.1 ],
       [20.  , 21.2 , 22.  , 23.  ],
       [30.  , 31.  , 32.  , 33.01]])

Usually genfromtxt is given a file name, but it works fine with anything that feeds it lines, such as the txt list.

And if you are interested in just one column, you can specify it with usecols (there are many more possible parameters):

In [106]: np.genfromtxt(txt, usecols=1)
Out[106]: array([ 1.2, 21.2, 31. ])

genfromtxt isn't the fastest loader. If you need more speed, we usually recommend the pandas loader. pandas has some fancier handling of quotes and missing values, but you don't seem to need that here. numpy users don't seem to use the csv module much; it may be that it just isn't needed.
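For completeness, here is roughly what the pandas route could look like. This is only a sketch, assuming pandas is installed and the file is whitespace-separated with no header row; io.StringIO stands in for the file name:

```python
import io

import pandas as pd

# Stand-in for the real file; pd.read_csv also accepts a path such as "test.txt".
sample = io.StringIO("0 1.2 2 3.1\n20 21.2 22 23\n30 31 32 33.01\n")

# sep=r"\s+" splits on runs of whitespace, header=None says there is no
# header row, and usecols=[1] keeps only the second column.
frame = pd.read_csv(sample, sep=r"\s+", header=None, usecols=[1])

# Convert the single remaining column to a 1d numpy array.
second_column = frame.iloc[:, 0].to_numpy()
print(second_column)
```

Whether this is actually faster for your file is something you would have to time yourself.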


If you really must have a column vector, here's how to use reshape:

In [110]: col1 = np.genfromtxt(txt, usecols=1)
In [111]: col1
Out[111]: array([ 1.2, 21.2, 31. ])
In [112]: col1.reshape(3,1)
Out[112]: 
array([[ 1.2],
       [21.2],
       [31. ]])

Let's build the array line by line:

In [116]: data = []
In [117]: for line in txt:
     ...:     arr = np.array(line.split(), dtype=float)
     ...:     print(arr.shape)
     ...:     data.append(arr)
     ...:     print(data)
     ...:     
     ...:     
(4,)
[array([0. , 1.2, 2. , 3.1])]
(4,)
[array([0. , 1.2, 2. , 3.1]), array([20. , 21.2, 22. , 23. ])]
(4,)
[array([0. , 1.2, 2. , 3.1]), array([20. , 21.2, 22. , 23. ]), array([30.  , 31.  , 32.  , 33.01])]

data is now a list of arrays:

In [118]: data
Out[118]: 
[array([0. , 1.2, 2. , 3.1]),
 array([20. , 21.2, 22. , 23. ]),
 array([30.  , 31.  , 32.  , 33.01])]

Join those arrays into one array:

In [119]: np.array(data)
Out[119]: 
array([[ 0.  ,  1.2 ,  2.  ,  3.1 ],
       [20.  , 21.2 , 22.  , 23.  ],
       [30.  , 31.  , 32.  , 33.01]])

(Check the shape after doing this. If the individual arrays differ in shape, the result will not be a 2d array; it will be a 1d array of arrays, which is closer to the original list of arrays. Recent NumPy versions raise a ValueError in that case unless dtype=object is given.)
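One way to make that shape check explicit is to compare the row lengths before joining. A minimal sketch; stack_rows is a hypothetical helper, not a numpy function:

```python
import numpy as np


def stack_rows(rows):
    """Join equally long 1d arrays into one 2d array, or fail loudly."""
    lengths = {len(row) for row in rows}
    if len(lengths) > 1:
        raise ValueError(f"rows have different lengths: {sorted(lengths)}")
    return np.array(rows)


good = [np.array([0.0, 1.2, 2.0, 3.1]), np.array([20.0, 21.2, 22.0, 23.0])]
stacked = stack_rows(good)
print(stacked.shape)

# A short row (for example from a truncated line) is reported, not silently stored.
ragged = good + [np.array([30.0, 31.0])]
try:
    stack_rows(ragged)
    ragged_rejected = False
except ValueError as exc:
    ragged_rejected = True
    print("ragged input:", exc)
```

Checking the lengths yourself gives the same behaviour regardless of the NumPy version in use.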

hpaulj
  • Great explanation, thanks. I fully understood what is going on. But there is one thing that is not clear to me: in my code above, when I do print(data) I see the content of the .txt file, but when I try even data[0] or data[0][0], it gives an error, even though I have used the "np.asarray" command to convert the list into an array. – Yousef Vh Dec 26 '18 at 00:32
  • `data = np.array(line)` assigns the array object to the `data` variable, replacing the `[]` list you initially assigned to it, and replacing any previous line assignment. To collect objects in a list, you need to `data.append(line)`. You may need to practice with some simple list construction loops. – hpaulj Dec 26 '18 at 01:03
  • I edited my answer to illustrate building the array line by line. – hpaulj Dec 26 '18 at 01:09