
Let's say I have a line like this in a text file output.dat:

in kB   16829.38785 17132.36275-14415.58515    72.67157   123.80624    17.02385

How can I split this string into 6 floats, each with 5 decimal places?

For now I am using the default split (on whitespace).

import numpy as np

for line in open('output.dat'):
    if line.find('in kB  ') != -1:
        stress = -np.array([float(a) for a in line.split()[2:]])

As expected, this raises an error:

ValueError: could not convert string to float: '17132.36275-14415.58515'

Edit: To be clear, the "-" is a minus sign, not just a separator, so I want to keep it after the split. The problem arises precisely because, for a negative value, the "-" takes up the column that would otherwise hold a space.

4 Answers

1

Try this: -? matches an optional minus sign, \d+ matches one or more digits, \. matches the decimal point, and \d{5,} matches at least 5 consecutive digits.

import re

txt = "16829.38785 17132.36275-14415.58515    72.67157   123.80624    17.02385"

# Raw string, and the dot is escaped so it only matches a literal decimal point
[float(v) for v in re.findall(r"-?\d+\.\d{5,}", txt)]

Output:

[16829.38785, 17132.36275, -14415.58515, 72.67157, 123.80624, 17.02385]
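
Applied to the loop from the question, the same pattern could be used roughly as in the sketch below. The helper name parse_stress and the sample list are hypothetical, introduced only for illustration; the function takes any iterable of lines, so an open file object would work too.

```python
import re

# Optional minus sign, digits, a literal dot, then at least 5 decimals
NUMBER = re.compile(r"-?\d+\.\d{5,}")

def parse_stress(lines):
    """Return the floats from the last line containing 'in kB', or None."""
    values = None
    for line in lines:
        if "in kB" in line:
            values = [float(v) for v in NUMBER.findall(line)]
    return values

# Hypothetical stand-in for the contents of output.dat
sample = [
    "some other line\n",
    "in kB   16829.38785 17132.36275-14415.58515    72.67157   123.80624    17.02385\n",
]

values = parse_stress(sample)
print(values)
```

If the sign flip from the question is needed, -np.array(values) can be applied to the returned list.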
sushanth
0

You can either put a space in front of every minus sign and then split on whitespace, which keeps the sign attached to its number:

values = line.replace('-', ' -').split()[2:]
stress = -np.array([float(x) for x in values])

Or you can use a regex:

import re

values = re.findall(r"-?\d+\.\d+", line)  # -? keeps the minus sign with its number
stress = -np.array([float(x) for x in values])

I'm ignoring the '5 decimals' bit, but if you need to ignore any decimals beyond the 5th, you should definitely use a regex:

values = re.findall(r"-?\d+\.\d{5}", line)
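
As a small illustration of what limiting the pattern to \d{5} does, the sketch below runs it on a made-up string with seven decimals. The input text is hypothetical, and the -? prefix is included on the assumption that the minus sign should stay with its number. Note that extra decimals are cut off, not rounded.

```python
import re

# Made-up input: two run-together numbers with more than 5 decimals
text = "1.1234567-2.7654321"

# \d{5} takes exactly five decimals; the remaining digits are left unmatched
truncated = [float(x) for x in re.findall(r"-?\d+\.\d{5}", text)]
print(truncated)  # [1.12345, -2.76543]
```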
Grismar
0
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"[\-]{0,1}[0-9]*\.[0-9]{5}"

test_str = "16829.38785 17132.36275-14415.58515    72.67157   123.80624    17.02385"

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches, start=1):

    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))

    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1

        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

Output:

Match 1 was found at 0-11: 16829.38785
Match 2 was found at 12-23: 17132.36275
Match 3 was found at 23-35: -14415.58515
Match 4 was found at 39-47: 72.67157
Match 5 was found at 50-59: 123.80624
Match 6 was found at 63-71: 17.02385
Kuldeep Singh Sidhu
0

re.findall will do this.

The exact regular expression used will depend on exactly how you want the string to be interpreted. In the following example, it is not insisting on exactly 5 decimal places; also the - is part of the number (remove the -? if that is not the case).

import re

s = "in kB   16829.38785 17132.36275-14415.58515    72.67157   123.80624    17.02385"

print([float(x) for x in re.findall(r"-?\d+\.\d+", s)])

gives

[16829.38785, 17132.36275, -14415.58515, 72.67157, 123.80624, 17.02385]

Note that the stored values are not exactly equal to the decimal numbers shown; each float is only the nearest binary approximation, which is an ordinary feature of floating-point numbers.
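
To make the floating-point remark concrete, here is a minimal sketch using the standard decimal module. Whether exact decimal values matter depends on the use case; for most numerical work the plain float is fine, and formatting with .5f recovers the five decimals from the file.

```python
from decimal import Decimal

x = float("17132.36275")

# Decimal(x) shows the binary value actually stored, digit for digit
stored = Decimal(x)
print(stored)

# Formatting to 5 decimal places gives back the text from the file
formatted = f"{x:.5f}"
print(formatted)  # 17132.36275

# If exact decimal arithmetic is needed, parse the string directly
exact = Decimal("17132.36275")
print(stored == exact)  # False: the float is only an approximation
```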

alani