This seems like it should be the most basic thing to do in Python, to the point that it should almost be a default option. I have a text file with lines such as

123, [12, 23, 45, 67]

The second array is variable in length. How do I read this in? For whatever reason I cannot find a single piece of documentation on how to deal with '[' or ']' which one might argue is the single most basic character in python.

np.loadtxt was a bust; apparently it only handles the simplest of file formats.

np.genfromtxt was a bust because of the missing columns. BTW, one would like to believe the missing_values option could be helpful here. It would be useful to know what, if anything, missing_values actually does (the documentation does not explain it clearly at all).

I tried the np.fromstring route which gives me

['123', '[12', '23', '45', '67]']

Presumably I could parse this item by item to deal with the '[' and ']' but at this stage I have just made my own python file reader to read in a fairly basic python construct!

As for the desired output, at this stage I would settle for almost anything. The obvious construct would be line by line of the form

[123, [12, 23, 45, 67]]
user3799584
*" I cannot find a single piece of documentation on how to deal with '[' or ']' which one might argue is the single most basic character in python."* That might be because it is unusual to include them in that form in a text file. More common is plain CSV (comma separated values, no brackets). The other extreme, with lots of different delimiters for different data types, is [JSON](http://www.json.org/) and its relatives. If you have control over the design of the file format, I recommend either simplifying to plain CSV (with no brackets) or going full JSON. – Warren Weckesser Dec 03 '16 at 01:55

2 Answers


loadtxt and genfromtxt parse a line, starting with a simple split.

In [360]: '123, [12, 23, 45, 67]'.split(',')
Out[360]: ['123', ' [12', ' 23', ' 45', ' 67]']

then they try to convert the individual strings. Some convert easily to ints or floats. The ones with [ and ] don't. Handling those is not trivial.

The csv reader that comes with Python can handle quoted text, e.g.

'one, "twenty, three", four'

I have not played with it enough to know whether it can treat [] as quotes or not.
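For reference, here is a small sketch of the quoted-text case with the standard csv module. The sample line is the one above; skipinitialspace=True is my choice so that the blank after each comma does not hide the opening quote. As far as I know, quotechar takes a single character, so a '[' ... ']' pair cannot be declared as quotes this way.

```python
import csv

# One sample line with a quoted field that contains a comma.
lines = ['one, "twenty, three", four']

# skipinitialspace=True drops the blank that follows each delimiter,
# so the field starting with a double quote is treated as quoted text.
rows = list(csv.reader(lines, skipinitialspace=True))
print(rows)
```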

Your bracketed text is easier to parse if you use a different delimiter outside the brackets than inside them, e.g.

In [371]: l1='123; [12, 23, 45, 67]'.split(';')
In [372]: l1
Out[372]: ['123', ' [12, 23, 45, 67]']
In [373]: l2=l1[1].strip().strip(']').strip('[').split(',')
In [374]: l2
Out[374]: ['12', ' 23', ' 45', ' 67']
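If the file format cannot be changed, a similar effect can be had by splitting only at the first comma. This is a minimal sketch, assuming every line looks like the one in the question (one integer, then one bracketed list of integers); parse_line is a name made up for the illustration.

```python
def parse_line(line):
    """Parse 'N, [a, b, ...]' into [N, [a, b, ...]] (format assumed from the question)."""
    # maxsplit=1 splits at the first comma only, leaving the bracketed part whole.
    head, tail = line.split(',', 1)
    # Remove surrounding whitespace, then the enclosing brackets.
    inner = tail.strip().strip('[]')
    # Skip empty tokens so that an empty list '[]' parses to [].
    values = [int(tok) for tok in inner.split(',') if tok.strip()]
    return [int(head), values]


print(parse_line('123, [12, 23, 45, 67]'))
```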

As Warren commented, plain CSV is something of an industry standard, and is used in many languages. The use of brackets and such has not been standardized. But there are data exchange languages like XML, JSON and YAML, as well as non-text data files (e.g. HDF5).

JSON example:

In [377]: json.loads('[123, [12, 23, 45, 67]]')
Out[377]: [123, [12, 23, 45, 67]]
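The lines in the question are not quite JSON as they stand, because the outer brackets are missing. A possible workaround, sketched here on the assumption that each line contains only numbers and bracketed lists of numbers, is to add the outer brackets before calling json.loads. The helper name parse_json_line is made up for the illustration.

```python
import json


def parse_json_line(line):
    """Wrap a line such as '123, [12, 23, 45, 67]' in brackets and parse it as JSON."""
    # The added brackets turn the line into a JSON array; a trailing
    # newline inside the brackets is plain whitespace to the parser.
    return json.loads('[' + line + ']')


print(parse_json_line('123, [12, 23, 45, 67]'))
```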
hpaulj

The default option is eval. It lets you evaluate Python expressions in strings. It's a security hazard though, see e.g. this question. But ast.literal_eval should be okay. For example:

from ast import literal_eval

with open("name of file") as fh:
    data = [literal_eval(line) for line in fh]
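Note that literal_eval returns a tuple for a line like the one in the question, and it raises an error on a blank line. The variant below is a sketch that skips blank lines and converts each result to a list; io.StringIO stands in for the real file here, and the sample contents are made up.

```python
import io
from ast import literal_eval

# Stand-in for the real file; the contents are invented for the example.
fh = io.StringIO("123, [12, 23, 45, 67]\n\n5, [1]\n")

# Skip blank lines; each remaining line evaluates to a tuple such as
# (123, [12, 23, 45, 67]), which list() turns into [123, [12, 23, 45, 67]].
data = [list(literal_eval(line)) for line in fh if line.strip()]
print(data)
```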
user7138814