How to split a string by number of decimal places?

Question

Let's say I have a line in a text file output.dat like this

in kB   16829.38785 17132.36275-14415.58515    72.67157   123.80624    17.02385

How can I split this string into 6 floats, each with 5 decimal places?

For now I am using the default split (on whitespace).

import numpy as np

for line in open('output.dat'):
    if line.find('in kB  ') != -1:
        stress = -np.array([float(a) for a in line.split()[2:]])

And as expected, this returns an error like this

ValueError: could not convert string to float: '17132.36275-14415.58515'

Edit: To be clear, "-" marks a negative number; it is not just a connector, so I want to keep it after the split. The problem arises precisely when a value is negative: the "-" takes the place of the separating space.

Do you always consider `17132.36275-14415.58515` as two floats separated by `-` for every line? — Sajith Herath, Jun 04 '20 at 05:41
you cannot convert 123-456 to float. try to use regex search and get the string between `.` — Neo Luk, Jun 04 '20 at 05:42
Does this answer your question? [How to extract a floating number from a string](https://stackoverflow.com/questions/4703390/how-to-extract-a-floating-number-from-a-string) — Sajith Herath, Jun 04 '20 at 05:45

sushanth · Answer 1 · 2020-06-04T05:52:58.377

1

Try this: -? matches an optional minus sign, \d+ matches one or more digits, and \d{5,} matches at least 5 consecutive digits.

import re

txt = "16829.38785 17132.36275-14415.58515    72.67157   123.80624    17.02385"

# Raw string, with the dot escaped so it matches only a literal decimal point.
[float(v) for v in re.findall(r"-?\d+\.\d{5,}", txt)]

Output:

[16829.38785, 17132.36275, -14415.58515, 72.67157, 123.80624, 17.02385]
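
To tie this back to the loop in the question, here is a minimal sketch that applies the same pattern line by line. The function name `parse_stress_line` is hypothetical, and `io.StringIO` stands in for `open('output.dat')` only so the example runs on its own. It assumes every value on an "in kB" line has at least 5 decimals.

```python
import io
import re

# Optional sign, integer digits, a literal dot, then at least 5 decimals.
NUMBER = re.compile(r"-?\d+\.\d{5,}")


def parse_stress_line(line):
    """Return the floats on an 'in kB' line, or None for any other line."""
    if "in kB" not in line:
        return None
    return [float(v) for v in NUMBER.findall(line)]


# io.StringIO stands in for open('output.dat') so the sketch is self-contained.
sample = io.StringIO(
    "some other line 1.0\n"
    "in kB   16829.38785 17132.36275-14415.58515    72.67157   123.80624    17.02385\n"
)

rows = [r for r in map(parse_stress_line, sample) if r is not None]
print(rows)
```

The sign flip from the question (`stress = -np.array(...)`) can be applied to each row afterwards.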

edited Jun 04 '20 at 05:52

answered Jun 04 '20 at 05:44


Hi. I want to keep "-" because that is a negative number – Jun 04 '20 at 05:49
@cobra1994, see the updated answer. – sushanth Jun 04 '20 at 05:53

score 0 · Answer 2 · answered Jun 04 '20 at 05:44

You can either split multiple times (note that splitting on '-' treats it as a separator, so the minus sign is dropped):

values = line.split()[2:]
values = [float(x) for xs in values for x in xs.split('-')]
stress = -np.array(values)

Or you can use a regex:

import re

values = re.findall(r"[\d.]+", line)
stress = -np.array([float(x) for x in values])

I'm ignoring the '5 decimals' bit, but if you need to ignore any decimals beyond the 5th, you should definitely use a regex:

values = re.findall(r"\d+\.\d{5}", line)
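
If the minus sign has to survive a split-based approach, one possible variation is to insert a space before every "-" and then split on whitespace. This is only a sketch: the helper name `split_keep_sign` is made up for illustration, and it assumes the line contains no exponent notation such as `1e-5` and no "-" inside the leading label.

```python
def split_keep_sign(line, skip=2):
    """Insert a space before every '-', split on whitespace, convert to float.

    `skip` is the number of leading label tokens to drop ('in' and 'kB' here).
    Assumes no exponent notation (e.g. 1e-5), which this would break apart.
    """
    tokens = line.replace("-", " -").split()
    return [float(t) for t in tokens[skip:]]


line = "in kB   16829.38785 17132.36275-14415.58515    72.67157   123.80624    17.02385"
values = split_keep_sign(line)
print(values)
```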

Kuldeep Singh Sidhu · Answer 3 · 2020-06-04T05:52:10.620

import re

# Optional minus sign, optional integer digits, a dot, then exactly 5 decimals.
regex = r"-?[0-9]*\.[0-9]{5}"

test_str = "16829.38785 17132.36275-14415.58515    72.67157   123.80624    17.02385"

# The pattern has no capture groups, so only the full match is reported.
for match_num, match in enumerate(re.finditer(regex, test_str), start=1):
    print("Match {} was found at {}-{}: {}".format(
        match_num, match.start(), match.end(), match.group()))

Output:

Match 1 was found at 0-11: 16829.38785
Match 2 was found at 12-23: 17132.36275
Match 3 was found at 23-35: -14415.58515
Match 4 was found at 39-47: 72.67157
Match 5 was found at 50-59: 123.80624
Match 6 was found at 63-71: 17.02385
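
The match end positions above (11, 23, 35, 47, 59, 71) are 12 characters apart, which suggests the values sit in fixed-width columns. If that holds, the line can be sliced by position instead of matched with a regex. The sketch below is an assumption-heavy illustration: the widths (a 7-character label followed by six 12-character fields) are inferred from the single sample line in the question, not from any documented format, and the constant and function names are hypothetical.

```python
LABEL_WIDTH = 7   # assumed width of the "in kB  " label (inferred from the sample)
FIELD_WIDTH = 12  # assumed width of each numeric column (inferred from the sample)
N_FIELDS = 6


def split_fixed_width(line):
    """Slice the line into N_FIELDS fixed-width columns and convert each to float."""
    body = line.rstrip("\n")[LABEL_WIDTH:]
    fields = [body[i * FIELD_WIDTH:(i + 1) * FIELD_WIDTH] for i in range(N_FIELDS)]
    # float() ignores the leading padding spaces in each field.
    return [float(f) for f in fields]


line = "in kB   16829.38785 17132.36275-14415.58515    72.67157   123.80624    17.02385"
values = split_fixed_width(line)
print(values)
```

Slicing by position keeps working when two columns touch with no space or sign between them, but it breaks if the file's spacing is not preserved exactly, so check the widths against the real file first.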

Thanks for your answer, but "-" means a negative number, not just a connector. So I want to keep that after the split. — , Jun 04 '20 at 05:50

score 0 · Accepted Answer · answered Jun 04 '20 at 05:51

re.findall will do this.

The exact regular expression depends on how you want the string to be interpreted. The following example does not insist on exactly 5 decimal places, and it treats the - as part of the number (remove the -? if that is not the case).

import re

s = "in kB   16829.38785 17132.36275-14415.58515    72.67157   123.80624    17.02385"

print([float(x) for x in re.findall(r"-?\d+\.\d+", s)])

gives

[16829.38785, 17132.36275, -14415.58515, 72.67157, 123.80624, 17.02385]

Note that the resulting floats are binary approximations, so a stored value may not be exactly equal to the decimal shown; this is an ordinary feature of floating-point numbers.
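
If the exact decimal digits matter, one option (a sketch, not part of the original answer) is to build `decimal.Decimal` objects from the matched strings instead of floats. Alternatively, keep the floats and format them to 5 places only when printing. Both `decimal.Decimal` and the `.5f` format spec are standard Python; the variable names are illustrative.

```python
import re
from decimal import Decimal

s = "in kB   16829.38785 17132.36275-14415.58515    72.67157   123.80624    17.02385"
matches = re.findall(r"-?\d+\.\d+", s)

# Decimal keeps the digits exactly as they appear in the text.
exact = [Decimal(x) for x in matches]

# Floats are approximations, but formatting to 5 places recovers the text here.
formatted = [f"{float(x):.5f}" for x in matches]

print(exact)
print(formatted)
```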
