Extract positions from a list (Python)

Question

I have an .xyz file of H2S and if I read the file like so:

with open('H2S.xyz','r') as stream:
    for line in stream:
        print(line)

I get this:

3

XYZ file of the hydrogen sulphide molecule

S                  0.00000000    0.00000000    0.10224900

H                  0.00000000    0.96805900   -0.81799200

H                  0.00000000   -0.96805900   -0.81799200

The first line gives the number of atoms and the last 3 lines the coordinates of those atoms.

I am supposed to write some code to extract the position of each atom in the molecule, in the form of a list where each element is another list with the atom coordinates.

If I do this:

with open('H2S.xyz','r') as stream:
    new=list(stream)
new

I get each line as an element in the list, and if I do this:

with open('H2S.xyz','r') as stream:
    new_list=[]
    for line in stream:
        new_list=new_list+line.split()
new_list

I get every single element separately:

['3','XYZ','file','of','the','hydrogen','sulphide','molecule','S',
'0.00000000','0.00000000','0.10224900','H','0.00000000','0.96805900',
'-0.81799200','H','0.00000000','-0.96805900','-0.81799200']

Which I don't want. The list I want looks like this:

[['0.00000000','0.00000000','0.10224900'],
['0.00000000','0.96805900','-0.81799200'],
['0.00000000','-0.96805900','-0.81799200']]

But I'm not sure how to code for this.

Is there always going to be one molecule per file? And if the molecule is made from more than 3 atoms, will the coordinates always start on the 3rd line? — roganjosh, Nov 29 '18 at 17:18
@eyllanesc, your edit seems to have destroyed OP's first code output. — Austin, Nov 29 '18 at 17:24
`XYZ file of the hydrogen sulphide molecule` is the second line of the file, obviously. — Kit., Nov 29 '18 at 17:27
Does the original txt have blank lines in the middle of the records? — Pedro Lobito, Nov 29 '18 at 17:28
Is this the actual format (with the specified amount of whitespaces) of the file? — Vasilis G., Nov 29 '18 at 17:30
@roganjosh I guess so, it was specifically for one question/molecule so it should be sufficient if we assume that. I used your solution and edited it slightly since I am not allowed to use certain functions/methods — Butterfly, Nov 29 '18 at 21:56
@PedroLobito The original does not have any blank lines if that's what you meant, but the space between the S and the first number is like that in the original file too — Butterfly, Nov 29 '18 at 22:00

roganjosh · Accepted Answer · 2018-11-29T17:53:25.493

This function should give you the correct output.

def parse_xyz(file_name):

    output = []
    with open(file_name) as infile:
        data = infile.readlines()
        for row in data[2:]:  # Skip the atom count and the comment line
            if row[1:]:  # Skip blank lines (nothing left after the first character)
                # Drop the first character (the element symbol), then split the rest
                output.append(row[1:].split())
    return output


result = parse_xyz('H2S.xyz')
print(result)

Some notes about what it does:

Firstly, I wrapped the code in a function. This is generally preferred because it lets you repeat the process with different files, e.g. result = parse_xyz('h2o.xyz').
for row in data[2:]: uses list slicing to skip the first two lines (the atom count and the comment), so no results are captured from them.
The same slicing notation is used again inside the loop: row[1:] throws away the first character of each line you want to record, which here is the single-letter element symbol.
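One caveat with slicing off a single character: a two-letter element symbol (Cl, for example) or a line with leading whitespace would leave part of the first column behind. Below is a minimal sketch that drops the first whitespace-separated field instead. The name positions_from_lines and the in-memory sample are illustrative assumptions, not part of the answer above.

```python
def positions_from_lines(lines):
    """Return the x, y, z fields of each atom row as a list of string lists."""
    output = []
    for row in lines[2:]:               # skip the atom count and the comment line
        fields = row.split()            # split on any run of whitespace
        if len(fields) >= 4:            # ignore blank or incomplete lines
            output.append(fields[1:4])  # drop the element symbol, keep x, y, z
    return output


# Sample lines standing in for the contents of H2S.xyz
sample = [
    "3\n",
    "XYZ file of the hydrogen sulphide molecule\n",
    "S                  0.00000000    0.00000000    0.10224900\n",
    "H                  0.00000000    0.96805900   -0.81799200\n",
    "H                  0.00000000   -0.96805900   -0.81799200\n",
]
print(positions_from_lines(sample))
```

To use it on a real file, pass the result of infile.readlines() as lines.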

Pedro Lobito · Answer 2 · 2018-11-29T17:51:12.530

I'd do something like:

import re
with open("file.txt", "r") as f:
    # Split each stripped line on whitespace; only atom rows have exactly 4 fields
    rows = [re.split(r"\s+", x.strip()) for x in f]
    print([row for row in rows if len(row) == 4])

[['S', '0.00000000', '0.00000000', '0.10224900'], ['H', '0.00000000', '0.96805900', '-0.81799200'], ['H', '0.00000000', '-0.96805900', '-0.81799200']]

edited Nov 29 '18 at 17:51

answered Nov 29 '18 at 17:30


score 0 · Answer 3 · answered Nov 29 '18 at 22:24

Read all lines of the .xyz file, split each atom line into the element symbol and the positions, and append only the positions to a list.

H2S.xyz

    3
XYZ file of the hydrogen sulphide molecule
    S       0.00000000      0.00000000      0.10224900
    H       0.00000000      0.96805900     -0.81799200
    H       0.00000000     -0.96805900     -0.81799200

The code

with open('H2S.xyz') as data:
    lines=data.readlines()                  # read all lines
    new_list = []
    for atom in lines[2:]:                  # start from third line
        position = atom.split()             # get the values
        new_list.append(position[1:])       # append only the positions

print(new_list)

Your list

[['0.00000000', '0.00000000', '0.10224900'],
['0.00000000', '0.96805900', '-0.81799200'],
['0.00000000', '-0.96805900', '-0.81799200']]
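Since the first line of the file gives the number of atoms, that count can also decide how many rows to read, which guards against trailing lines after the coordinates. This is only a sketch under that assumption; read_positions and the sample lines are hypothetical names used for illustration.

```python
def read_positions(lines):
    """Read as many atom rows as the count on the first line says."""
    n_atoms = int(lines[0].split()[0])                   # first line: number of atoms
    atom_rows = [ln for ln in lines[2:] if ln.strip()]   # skip the comment line and blank lines
    positions = []
    for row in atom_rows[:n_atoms]:                      # read only n_atoms rows
        positions.append(row.split()[1:4])               # drop the element symbol
    return positions


# Sample lines standing in for the contents of H2S.xyz, plus one stray trailing line
sample = [
    "    3\n",
    "XYZ file of the hydrogen sulphide molecule\n",
    "    S       0.00000000      0.00000000      0.10224900\n",
    "    H       0.00000000      0.96805900     -0.81799200\n",
    "    H       0.00000000     -0.96805900     -0.81799200\n",
    "trailing text that is not an atom\n",
]
print(read_positions(sample))
```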
