1

I tried to open a .txt file as an array in Python so that I can operate on its elements. The file (abc.txt) looks something like this:

AL192012,               TONY,     20,
20121021, 1800,  , LO, 20.1N,  50.8W,  25, 1011,
20121022, 0000,  , LO, 20.4N,  51.2W,  25, 1011,
20121022, 0600,  , LO, 20.8N,  51.5W,  25, 1010,
20121022, 1200,  , LO, 21.3N,  51.7W,  30, 1009,
AL182012,              SANDY,     45,
20121021, 1800,  , LO, 14.3N,  77.4W,  25, 1006,
20121022, 0000,  , LO, 13.9N,  77.8W,  25, 1005,
20121022, 0600,  , LO, 13.5N,  78.2W,  25, 1003,
20121022, 1200,  , TD, 13.1N,  78.6W,  30, 1002,

I have tried pd.read_csv('abc.txt'), loadtxt("abc.txt") and genfromtxt("abc.txt"), but they only generated an array with three columns, probably because the first row has only three columns. I want the result to have the same eight columns as the .txt file. Is this possible? Thanks!

poke
Ron
  • Well, how do you expect those two lines that don’t have as many columns to appear in the result? – poke Jan 22 '14 at 10:56
  • Thanks. If this array is named b, I want to get SANDY by b[5,4] and get TD by b[9,3]. – Ron Jan 22 '14 at 11:16

3 Answers

2

try something like this:

data = []
with open("filename") as f:
  for line in f:
    data.append(line.split(","))

and that'll give you a 2D list (a list of lists) of the data that you can operate on.

if you want to transpose it, you can't just use regular zip, because it stops at the shortest row; you need to use itertools.izip_longest (itertools.zip_longest in Python 3), as mentioned here.

so you then transpose it like:

import itertools

data = list(itertools.zip_longest(*data))  # itertools.izip_longest in Python 2
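
A minimal sketch of that transposition, using two sample rows instead of the file. It uses the Python 3 name `zip_longest`; passing `fillvalue=""` is an assumption made here for readability (the default fill value is `None`).

```python
from itertools import zip_longest  # Python 2: from itertools import izip_longest

# Two sample rows of unequal length, as produced by splitting the file's lines.
rows = [
    ["AL192012", "TONY", "20"],
    ["20121021", "1800", "", "LO", "20.1N", "50.8W", "25", "1011"],
]

# zip() would stop at the shortest row; zip_longest pads the short one instead.
columns = list(zip_longest(*rows, fillvalue=""))

print(columns[1])  # ('TONY', '1800')
print(columns[3])  # ('', 'LO')
```

Each entry of `columns` is a tuple holding one column of the original rows, so the result has eight entries here.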
will
  • Thanks. But I may need a bit more help here if possible. I only got a list called data. Is there a way I can get the 10-by-8 2Darray I want, where for example the element at [0,0] gives me AL192012, [0,4] gives TONY, and [9,3] gives TD? – Ron Jan 22 '14 at 12:04
  • @user3223064 it is a 2D array, you access the elements like array[0][4] in python. If you want to access it like that, then you'll want to use `numpy`, and if you're going to do that, you might as well go the full distance and just use [`numpy.loadtxt()`](http://stackoverflow.com/a/4315914/432913) – will Jan 22 '14 at 12:15
  • Thanks. Yours works. But on the other hand numpy.loadtxt() still only gives me three columns instead of eight. anyway.. – Ron Jan 22 '14 at 12:43
1
>>> with open(filename) as f:
        # strip() removes the newline first, so rstrip(',') can drop the trailing comma
        data = [[cell.strip() for cell in row.strip().rstrip(',').split(',')] for row in f]

>>> for row in data:
        print(row)

['AL192012', 'TONY', '20']
['20121021', '1800', '', 'LO', '20.1N', '50.8W', '25', '1011']
['20121022', '0000', '', 'LO', '20.4N', '51.2W', '25', '1011']
['20121022', '0600', '', 'LO', '20.8N', '51.5W', '25', '1010']
['20121022', '1200', '', 'LO', '21.3N', '51.7W', '30', '1009']
['AL182012', 'SANDY', '45']
['20121021', '1800', '', 'LO', '14.3N', '77.4W', '25', '1006']
['20121022', '0000', '', 'LO', '13.9N', '77.8W', '25', '1005']
['20121022', '0600', '', 'LO', '13.5N', '78.2W', '25', '1003']
['20121022', '1200', '', 'TD', '13.1N', '78.6W', '30', '1002']

If you want to fix the indexes for the short lines, you could explicitly do that afterwards:

>>> data = [row if len(row) == 8 else row[0:1] + [''] * 3 + row[1:3] + [''] * 2 for row in data]
>>> for row in data:
        print(row)

['AL192012', '', '', '', 'TONY', '20', '', '']
['20121021', '1800', '', 'LO', '20.1N', '50.8W', '25', '1011']
['20121022', '0000', '', 'LO', '20.4N', '51.2W', '25', '1011']
['20121022', '0600', '', 'LO', '20.8N', '51.5W', '25', '1010']
['20121022', '1200', '', 'LO', '21.3N', '51.7W', '30', '1009']
['AL182012', '', '', '', 'SANDY', '45', '', '']
['20121021', '1800', '', 'LO', '14.3N', '77.4W', '25', '1006']
['20121022', '0000', '', 'LO', '13.9N', '77.8W', '25', '1005']
['20121022', '0600', '', 'LO', '13.5N', '78.2W', '25', '1003']
['20121022', '1200', '', 'TD', '13.1N', '78.6W', '30', '1002']
poke
  • Thanks. But may I ask if this gives an array? Seems like data[0] gives first row and data[1] gives second row and so on, with type(data) being a list each. Is there an array where element [5,4] gives SANDY? Or am I not getting your idea.. – Ron Jan 22 '14 at 11:41
  • `data` will be a list of lists; so doing `data[5][4]` will give `SANDY` etc. There are no arrays in Python directly, and the `[5,4]` syntax suggests that you are trying to use arrays from NumPy or something. I think you can convert lists to arrays somehow, but i don’t know how that works—but you don’t necessarily need to do that anyway. Using lists is just fine. – poke Jan 22 '14 at 11:58
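
For the `[row, col]` indexing asked about, one option is NumPy. A small sketch, assuming NumPy is installed and that every row has already been padded to eight fields (ragged rows would not give a 2-D array); the two sample rows below stand in for the file.

```python
import numpy as np

# Sample rows already padded to eight fields each (a subset of the file).
data = [
    ["AL182012", "", "", "", "SANDY", "45", "", ""],
    ["20121022", "1200", "", "TD", "13.1N", "78.6W", "30", "1002"],
]

# Equal-length rows convert to a 2-D array of strings.
b = np.array(data)

print(b.shape)  # (2, 8)
print(b[0, 4])  # SANDY
print(b[1, 3])  # TD
```

All elements stay strings, so numeric fields such as the pressure would still need converting before doing arithmetic on them.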
0

Here is a snippet:

#!/usr/bin/python

import sys

with open(sys.argv[1], 'r') as f:
    content = f.readlines()

for w in content:
    print(w)

    # split and loop again -> w.split(',')

f.readlines() returns a list of strings, one per line.
Each w is a string; w.split(',') turns it into a list of fields.
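
The "split and loop again" step could look like the sketch below. The two sample strings stand in for what `f.readlines()` returns, and the name `rows` is introduced here only for illustration.

```python
# Sample lines standing in for f.readlines(); real lines end with "\n".
content = [
    "AL182012,              SANDY,     45,\n",
    "20121022, 1200,  , TD, 13.1N,  78.6W,  30, 1002,\n",
]

rows = []
for w in content:
    # Drop the newline and the trailing comma, then split into stripped fields.
    fields = [cell.strip() for cell in w.strip().rstrip(",").split(",")]
    rows.append(fields)

print(rows[0])     # ['AL182012', 'SANDY', '45']
print(rows[1][3])  # TD
```

The result is a list of lists, so elements are read with two indexes, as in `rows[1][3]`.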

elp
  • Thanks. But what should I do with your last line? Because when I only included your five lines from import till print w, type(content) is a list, and w is only content[66] which is a string. May I ask what do you mean by split and loop again... – Ron Jan 22 '14 at 11:55