I'm trying to read items from a .txt file that has the following:

294.nii.gz [[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]
296.nii.gz [[10, 13, 62], [40, 1, 64], [34, 0, 49], [27, 0, 49]]
312.nii.gz [[0, 27, 57], [25, 25, 63], [0, 42, 38], [0, 11, 21]]

The way I want to extract the data is:

  1. Get the item name: 294.nii.gz
  2. Item's coordinates serially: [9, 46, 54] [36, 48, 44] ...
  3. Get the next item:

N.B. all the items have the same number of 3D coordinates.

So far I can read the data with the following code:

coortxt = os.path.join(coordir, 'coor_downsampled.txt')
with open(coortxt) as f:
    content = f.readlines()
content = [x.strip() for x in content]

for item in content:
    print(item.split(' ')[0])

This only prints the item names:

294.nii.gz
296.nii.gz
312.nii.gz

How do I get the rest of the data in the format I need?

banikr
  • you can split into multiple variables, if you don't use [0] – DevLounge Aug 11 '21 at 21:27
  • By printing `item.split(' ')[0]` you are explicitly telling Python that you *don't* want the rest of the line of data. The `str.split` method returns a list, and you are subscripting that list at the `0`th index. If that is intentional then please clarify what you mean by "how do I get the rest of the data". – ddejohn Aug 11 '21 at 21:32
  • Does this answer your question? [How to convert string representation of list to a list?](https://stackoverflow.com/questions/1894269/how-to-convert-string-representation-of-list-to-a-list) – DevLounge Aug 11 '21 at 21:38
  • @DevLounge this question doesn't specifically deal with .txt contents as I have seen. – banikr Aug 11 '21 at 21:52

2 Answers

So you have the fun task of converting a string representation of a list to a list.

To do this, you can use the ast library, specifically the ast.literal_eval function.

Disclaimer:

According to documentation:

Warning: It is possible to crash the Python interpreter with a sufficiently large/complex string due to stack depth limitations in Python’s AST compiler.

This is NOT the same as using eval. From the docs:

Safely evaluate an expression node or a string containing a Python expression. The string or node provided may only consist of the following Python literal structures: strings, numbers, tuples, lists, dicts, booleans, and None.

This can be used for safely evaluating strings containing Python expressions from untrusted sources without the need to parse the values oneself.
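To illustrate how this differs from eval, here is a minimal sketch. The input strings are made up for the example; the point is only that ast.literal_eval parses a plain literal but raises ValueError on anything that is not one, such as a function call, instead of executing it.

```python
import ast

# A plain literal is parsed into the corresponding Python object.
parsed = ast.literal_eval("[[9, 46, 54], [36, 48, 44]]")
print(parsed)

# A function call is not a literal, so literal_eval refuses it
# rather than running it (eval would execute it).
rejected = False
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    rejected = True
print(rejected)
```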

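You get the first part of the data with item.split(' ')[0].

Then use item.split(' ', 1)[1] to get the rest of the line as one string with contents such as "[[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]". (item.split(' ')[1:] would instead give a list of fragments like '[[9,', which is not what ast.literal_eval expects.)

If the risk described in the warning above is one you're willing to accept, here is a demonstration using ast: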

import ast
list_str = "[[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]"
list_list = ast.literal_eval(list_str)
print(isinstance(list_list, list))
#Outputs True
print(list_list)
#Outputs [[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]

Tying it together with your code:

import os
import ast

coortxt = os.path.join(coordir, 'coor_downsampled.txt')
with open(coortxt) as f:
    content = f.readlines()
content = [x.strip() for x in content]

for item in content:
    # maxsplit=1 splits at the first space only, so coords_str is a single
    # string; item.split(' ')[1:] would be a list, which literal_eval rejects
    name, coords_str = item.split(' ', 1)
    coords = ast.literal_eval(coords_str)
    # name and coords now contain your required data
    # use as needed
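For a version that runs without the original file, here is a rough sketch that feeds the three sample lines from the question through an in-memory stand-in for the file. The helper name parse_coordinates and the io.StringIO stand-in are assumptions made for illustration; in real use you would pass the open file object instead.

```python
import ast
import io

# Stand-in for open(coortxt); contents are the sample lines from the question.
sample_file = io.StringIO(
    "294.nii.gz [[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]\n"
    "296.nii.gz [[10, 13, 62], [40, 1, 64], [34, 0, 49], [27, 0, 49]]\n"
    "312.nii.gz [[0, 27, 57], [25, 25, 63], [0, 42, 38], [0, 11, 21]]\n"
)


def parse_coordinates(lines):
    """Hypothetical helper: map each item name to its list of [x, y, z] coordinates."""
    parsed = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split at the first space only, so the coordinates stay in one string.
        name, coords_str = line.split(" ", 1)
        parsed[name] = ast.literal_eval(coords_str)
    return parsed


items = parse_coordinates(sample_file)
for name, coords in items.items():
    print(name)          # 1. the item name
    for coord in coords:
        print(coord)     # 2. its coordinates, one after another
```

Each name is printed first, followed by its coordinates one per line, which matches the order asked for in the question.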


Relevant posts:

https://stackoverflow.com/a/10775909/5763413

How to convert string representation of list to a list?

blackbrandt
  • Hi, with `item.split(' ')[1]` I am not getting the entire coordinates as string. Rather I am getting this: `[[9,` – banikr Aug 11 '21 at 22:11
  • I initially had typed that but changed it to `item.split(' ')[1:]`. This gets everything from the item at index 1 to the end of the list. If I missed one please let me know where as I do not see it. – blackbrandt Aug 12 '21 at 13:55

Others have suggested using the dynamic evaluator eval in Python (and even ast.literal_eval, which definitely works), but there are ways to do this kind of parsing without either.

Given that the formatting of the coordinate list in the coor_downsampled.txt file is very json-esque, we can parse it using the very cool json module instead.

NOTE:

There are sources claiming that json.loads is 4x faster than eval and almost 7x faster than ast.literal_eval. If speed matters for your use case, I'd recommend using the faster option.
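If you'd rather check the speed difference on your own machine than rely on quoted figures, a quick sketch with timeit could look like this. The repetition count is an arbitrary choice, and the ratio you see will depend on your Python version and hardware.

```python
import ast
import json
import timeit

coords_str = "[[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]"
repetitions = 2000  # arbitrary; raise it for more stable numbers

# Time how long each parser takes to handle the same string repeatedly.
json_time = timeit.timeit(lambda: json.loads(coords_str), number=repetitions)
ast_time = timeit.timeit(lambda: ast.literal_eval(coords_str), number=repetitions)

print(f"json.loads:       {json_time:.4f} s")
print(f"ast.literal_eval: {ast_time:.4f} s")
```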

Complete example

import json

coortxt = 'coor_downsampled.txt'
with open(coortxt) as f:
    content = f.readlines()
content = [x.strip() for x in content]

for item in content:
    # split the line just like you did in your own example
    split_line = item.split(" ")

    # the "name" is the first element
    name = split_line[0]

    # here's the tricky part.
    coords = json.loads("".join(split_line[1:]))
    print(name)
    print(coords)

Explanation

Let's break down this tricky line: coords = json.loads("".join(split_line[1:]))

split_line[1:] will give you everything past the first space, so something like this:

['[[9,', '46,', '54],', '[36,', '48,', '44],', '[24,', '19,', '46],', '[15,', '0,', '22]]']

But by wrapping it with a "".join(), we can turn it into

'[[9,46,54],[36,48,44],[24,19,46],[15,0,22]]' as a string instead.

Once we have it like that, we simply do json.loads() to get the actual list object

[[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]].
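As a possible simplification, JSON permits whitespace between tokens, so the split-and-rejoin step can be avoided by splitting only at the first space. This is a sketch of an alternative and not part of the example above; the sample line is taken from the question.

```python
import json

line = "294.nii.gz [[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]"

# maxsplit=1 keeps everything after the first space as one string.
name, coords_str = line.split(" ", 1)

# json.loads accepts the spaces after the commas as-is.
coords = json.loads(coords_str)

print(name)
for coord in coords:
    print(coord)
```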

dcronqvist