
I have a text file in the following format:

a,b,c,d,
1,1,2,3,
4,5,6,7,
1,2,5,7,
6,9,8,5,

How can I read it into a list efficiently so as to get the following output?

list=[[1,4,1,6],[1,5,2,9],[2,6,5,8],[3,7,7,5]]

3 Answers


Let's assume that the file is named spam.txt:

$ cat spam.txt
a,b,c,d,
1,1,2,3,
4,5,6,7,
1,2,5,7,
6,9,8,5,    

Using list comprehensions and the zip() built-in function, you can write a program such as:

>>> with open('spam.txt', 'r') as file:
...     file.readline() # skip the first line
...     rows = [[int(x) for x in line.split(',')[:-1]] for line in file]
...     cols = [list(col) for col in zip(*rows)]
... 
'a,b,c,d,\n'
>>> rows
[[1, 1, 2, 3], [4, 5, 6, 7], [1, 2, 5, 7], [6, 9, 8, 5]]
>>> cols
[[1, 4, 1, 6], [1, 5, 2, 9], [2, 6, 5, 8], [3, 7, 7, 5]]

Additionally, zip(*rows) relies on unpacking argument lists: the * operator unpacks a list or tuple so that its elements are passed as separate positional arguments to the function. In other words, zip(*rows) reduces to zip([1, 1, 2, 3], [4, 5, 6, 7], [1, 2, 5, 7], [6, 9, 8, 5]).
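
As a quick illustration of that unpacking (a toy two-row example, not part of the original session):

>>> rows = [[1, 1, 2, 3], [4, 5, 6, 7]]
>>> list(zip(*rows))              # same as calling zip(rows[0], rows[1])
[(1, 4), (1, 5), (2, 6), (3, 7)]
>>> list(zip(rows[0], rows[1]))   # the explicit form it expands to
[(1, 4), (1, 5), (2, 6), (3, 7)]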

EDIT:

This is a version based on NumPy for reference:

>>> import numpy as np
>>> with open('spam.txt', 'r') as file:
...     ncols = len(file.readline().split(',')) - 1
...     data = np.fromiter((int(v) for line in file for v in line.split(',')[:-1]), int, count=-1)
...     cols = data.reshape(data.size // ncols, ncols).transpose()  # // so the row count stays an int on Python 3
...
>>> cols
array([[1, 4, 1, 6],
       [1, 5, 2, 9],
       [2, 6, 5, 8],
       [3, 7, 7, 5]])
dkim
  • Yes, it's clear, nice explanation... Since I am dealing with large text files, the size of the list "rows" or "cols" will be large, and the RAM consumed by the above code is around 1.4 GB for a 500 MB input file. Is there any optimized way to do this? – Jagannath Ks Aug 07 '12 at 06:41
  • @JagannathKs It depends on your goal. What are you going to do with the columns finally? – dkim Aug 07 '12 at 06:59
  • I will get 2 such columns for 2 different files and process them based on certain criteria... Anyway, I'll try to optimize it. Thanks for your reply. – Jagannath Ks Aug 07 '12 at 07:30
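
For the memory concern raised in the comments above, one further option (an editorial sketch, not part of the answer itself) is to let numpy.loadtxt parse the file directly into a compact integer array instead of nested Python lists. Assuming the same spam.txt layout as above (one header line, a trailing comma on every row), something along these lines should work:

>>> import numpy as np
>>> cols = np.loadtxt('spam.txt', dtype=int, delimiter=',',
...                   skiprows=1, usecols=(0, 1, 2, 3), unpack=True)
>>> cols
array([[1, 4, 1, 6],
       [1, 5, 2, 9],
       [2, 6, 5, 8],
       [3, 7, 7, 5]])

An integer array stores each value in a fixed few bytes rather than as a full Python int object, so the working set for a large file is typically several times smaller than the equivalent nested lists; adjust usecols if the real file has a different number of columns.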

You can try the following code:

from numpy import *   # imported but never actually used below (see the comments)

x0 = []
for line in open('yourfile.txt'):
    fields = line.split(',')   # the file is comma-separated
    x = fields[0]              # first column
    x0.append(x)

for i in range(len(x0)):
    print(x0[i])

Here the first column is appended onto x0[]. You can append the other columns in a similar fashion.
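
A hedged sketch of that generalisation, collecting every column in one pass (the file name and variable names are placeholders; the format is assumed to match the question, i.e. a header line and a trailing comma on each row):

columns = None
with open('yourfile.txt') as f:
    f.readline()                               # skip the a,b,c,d header
    for line in f:
        fields = line.split(',')[:-1]          # drop the empty field after the trailing comma
        if columns is None:
            columns = [[] for _ in fields]     # one list per column
        for col, value in zip(columns, fields):
            col.append(int(value))

print(columns)   # [[1, 4, 1, 6], [1, 5, 2, 9], [2, 6, 5, 8], [3, 7, 7, 5]]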

Next Door Engineer
  • Why is `numpy` required here? – Kos Aug 07 '12 at 05:54
  • numpy contains a powerful N-dimensional array object and can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows numpy to seamlessly and speedily integrate with a wide variety of databases. – Next Door Engineer Aug 07 '12 at 05:56
  • Where is it used in your example? – Kos Aug 07 '12 at 06:24

You can use the data_py package to read column-wise data from a file. Install the package with:

pip install data-py==0.0.1

Example

from data_py import datafile

df1 = datafile("C:/Folder/SubFolder/data-file-name.txt")
df1.separator = ","
[Col1, Col2, Col3, Col4, Col5] = ["", "", "", "", ""]
# lineNumber is left undefined in the original snippet
[Col1, Col2, Col3, Col4, Col5] = df1.read([Col1, Col2, Col3, Col4, Col5], lineNumber)
print(Col1, Col2, Col3, Col4, Col5)

For details please follow the link https://www.respt.in/p/python-package-datapy.html

ddeb