
I would like to split two different strings in Python into lists of the same length (5 items each). For example:

string1=  '21007250.12000 -18047085.73200      1604.90200        59.10000  21007239.94800'
string2=  '24784864.18300-318969464.50000     -1543.53600        34.48000  24784864.9700'
string1_final = ['21007250.12000','-18047085.73200','1604.90200','59.10000','21007239.94800']
string2_final = ['24784864.18300','-318969464.50000','-1543.53600','34.48000','24784864.9700']

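Notice that the numbers must be separated both on whitespace and where two numbers run together (e.g. 24784864.18300-318969464.50000), while keeping the minus sign. I've tried using string2.split() and string2.split('-'), but the first leaves the glued numbers together and the second removes the minus sign. Any help would be greatly appreciated.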
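To make the two failed attempts concrete, here is a short sketch of what they return for string2 (the variable names by_whitespace and by_minus are just for illustration):

```python
# string2 from the question.
string2 = '24784864.18300-318969464.50000     -1543.53600        34.48000  24784864.9700'

# Splitting on whitespace leaves the first two numbers glued together,
# so only 4 items come back instead of 5.
by_whitespace = string2.split()
print(by_whitespace)
# ['24784864.18300-318969464.50000', '-1543.53600', '34.48000', '24784864.9700']

# Splitting on '-' consumes the delimiter, so both minus signs are lost,
# and the whitespace-separated numbers are not split apart at all.
by_minus = string2.split('-')
print(len(by_minus))
# 3
```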

  • First, you could split the whole string on ' ', then iterate over that list and add every item that is not '' to another list. Then you could loop over that list, and over each character of each item, and whenever a - is encountered, save the current string. I'm using my phone right now, so that is all I can really help with. – Matiiss Apr 13 '21 at 14:31
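The character-by-character approach described in the comment above could look roughly like this. It is a sketch, not code from the comment: the function name split_numbers is hypothetical, and it assumes the numbers are separated only by spaces (tabs would not be split).

```python
string1 = '21007250.12000 -18047085.73200      1604.90200        59.10000  21007239.94800'
string2 = '24784864.18300-318969464.50000     -1543.53600        34.48000  24784864.9700'


def split_numbers(text):
    # Step 1: split on ' ' and keep only the non-empty pieces.
    pieces = [p for p in text.split(' ') if p != '']
    result = []
    # Step 2: walk each piece one character at a time.
    for piece in pieces:
        current = ''
        for ch in piece:
            # A '-' after some collected characters starts a new number,
            # so save the number collected so far.
            if ch == '-' and current:
                result.append(current)
                current = ''
            current += ch
        # Save the last number in this piece.
        if current:
            result.append(current)
    return result


print(split_numbers(string1))
print(split_numbers(string2))
```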

3 Answers


You can use code similar to the answer to this question and get this:

import re

string1 = '21007250.12000 -18047085.73200      1604.90200        59.10000  21007239.94800'
string2 = '24784864.18300-318969464.50000     -1543.53600        34.48000  24784864.9700'

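def process_string(string):
    # Insert a space before every minus sign so glued numbers are separated.
    string_spaces_added = re.sub('-', ' -', string)
    # Collapse runs of spaces into a single space.
    string_spaces_removed = re.sub(' +', ' ', string_spaces_added)

    return string_spaces_removed.split()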

print(process_string(string1))
print(process_string(string2))

Output:

['21007250.12000', '-18047085.73200', '1604.90200', '59.10000', '21007239.94800']
['24784864.18300', '-318969464.50000', '-1543.53600', '34.48000', '24784864.9700']
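Because str.split() with no arguments already splits on any run of whitespace, the second re.sub call above is not strictly needed. A shorter variant might look like this (a sketch; process_string_short is a hypothetical name). Like the original, it assumes every '-' is a sign: in exponent notation such as 1.5e-3 it would wrongly split the number in two.

```python
string1 = '21007250.12000 -18047085.73200      1604.90200        59.10000  21007239.94800'
string2 = '24784864.18300-318969464.50000     -1543.53600        34.48000  24784864.9700'


def process_string_short(string):
    # Put a space in front of every minus sign; split() with no arguments
    # then treats any run of whitespace as a single separator.
    return string.replace('-', ' -').split()


print(process_string_short(string1))
print(process_string_short(string2))
```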

You could try something like this:

string1 = '21007250.12000 -18047085.73200      1604.90200        59.10000  21007239.94800'
string2 = '24784864.18300-318969464.50000     -1543.53600        34.48000  24784864.9700'


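def splitter(string_to_split: str) -> list:
    out = []
    for item in string_to_split.split():
        # A minus sign after the first character means two numbers are glued
        # together (this also covers items that start with '-', like '-1-2').
        if "-" in item[1:]:
            # Put a space before each minus sign and split again.
            out.extend(item.replace("-", " -").split())
        else:
            out.append(item)
    return out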


for s in [string1, string2]:
    print(splitter(s))

Output:

['21007250.12000', '-18047085.73200', '1604.90200', '59.10000', '21007239.94800']
['24784864.18300', '-318969464.50000', '-1543.53600', '34.48000', '24784864.9700']
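The same idea (split on whitespace, and also at a minus sign that directly follows a digit) can be sketched as a single re.split call with a zero-width lookaround. This is an alternative, not the answer's code: split_on_minus and SPLIT_PATTERN are hypothetical names, and splitting on an empty match requires Python 3.7 or newer.

```python
import re

string1 = '21007250.12000 -18047085.73200      1604.90200        59.10000  21007239.94800'
string2 = '24784864.18300-318969464.50000     -1543.53600        34.48000  24784864.9700'

# Split on a run of whitespace, OR at the empty position between a digit
# and a following '-'. The lookbehind keeps ' -123' from producing an extra split.
SPLIT_PATTERN = re.compile(r'\s+|(?<=\d)(?=-)')


def split_on_minus(text):
    # strip() avoids empty strings from leading or trailing whitespace.
    return SPLIT_PATTERN.split(text.strip())


print(split_on_minus(string1))
print(split_on_minus(string2))
```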
baduker

Well, it looks like you want the numbers in the strings, rather than to "split on variable delimiters"; i.e., it's not a string like "123 -abc def ghi", it's always a string of numbers.

So use a simple regex to match an optional negative sign, one or more digits, and optionally a decimal point followed by more digits (this assumes there are always digits after the decimal point, unlike NumPy's representation of numbers like 2. == 2.0).

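import re

# Optional minus sign, one or more digits, then optionally a decimal point and more digits.
numbers = re.compile(r'(-?\d+(?:\.\d+)?)')

# string1, string2, string1_final and string2_final as defined in the question.
numbers.findall(string1) == string1_final
# True
numbers.findall(string2) == string2_final
# True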

# also works for these:
string3 = '123  21007250.12000    -5000 -67.89 200-300.4-7'
numbers.findall(string3)
# ['123', '21007250.12000', '-5000', '-67.89', '200', '-300.4', '-7']
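For readability, the same pattern can be written with re.VERBOSE so that each part carries its own comment. This sketch drops the outer capturing group (findall then returns the whole match, which is the same text here); numbers_verbose is a hypothetical name, and the float conversion at the end is only an optional extra step.

```python
import re

numbers_verbose = re.compile(r'''
    -?          # optional minus sign
    \d+         # one or more digits
    (?:\.\d+)?  # optional decimal point followed by one or more digits
''', re.VERBOSE)

string2 = '24784864.18300-318969464.50000     -1543.53600        34.48000  24784864.9700'

parts = numbers_verbose.findall(string2)
print(parts)

# Optional: convert the matched strings to floats.
values = [float(p) for p in parts]
```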

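Note that in Python 3, \d in a str pattern matches any Unicode decimal digit (for example Arabic-Indic or Devanagari digits), not only 0-9. Roman numerals and fractions are not decimal digits, so \d does not match them. If you want to accept only ASCII digits, change each \d in the regex to [0-9], or compile the pattern with the re.ASCII flag.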
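A small sketch of the difference, using Arabic-Indic digits three and four (U+0663, U+0664) and the Roman numeral twelve (U+216B) as sample input:

```python
import re

# '12', then Arabic-Indic digits three and four, then Roman numeral twelve.
text = '12 \u0663\u0664 \u216b'

# \d on a str pattern matches any Unicode decimal digit (category Nd),
# which includes Arabic-Indic digits but not Roman numerals.
unicode_digits = re.findall(r'\d+', text)
print(unicode_digits)

# [0-9], or \d with re.ASCII, matches only the ASCII digits.
ascii_class = re.findall(r'[0-9]+', text)
ascii_flag = re.findall(r'\d+', text, flags=re.ASCII)
print(ascii_class, ascii_flag)
```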

Note: this regex doesn't handle exponents (e.g. 1.5e-3), complex numbers, powers, etc.
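If exponents do show up, one possible extension (a sketch, not part of the answer; numbers_exp is a hypothetical name) is to append an optional exponent group. It still assumes digits on both sides of any decimal point and does not accept a leading '+' on the number itself.

```python
import re

# Same pattern as above, plus an optional exponent such as 'e-3' or 'E+04'.
numbers_exp = re.compile(r'-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?')

sample = '1.5e-3 -2E+04 7-8.25'
found = numbers_exp.findall(sample)
print(found)
```

Because the exponent's sign is consumed as part of the match, 1.5e-3 stays in one piece, while 7-8.25 is still split into 7 and -8.25.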

aneroid