5

I have a series of strings in the following format. Example strings look like this:

71 1 * abwhf

8 askg

*14 snbsb

00ab

I am attempting to write a Python 3 program that will use a for loop to cycle through each string and split it once at the first occurrence of a letter into a list with two elements.

The output for the strings above would become lists with the following elements:

71 1 * and abwhf

8 and askg

*14 and snbsb

00 and ab

There is supposed to be a trailing space after the first string in each of the first three examples, but this only shows in the editor.

How can I split the string in this way?

Two posts look relevant here:

The first answer for the first question allows me to split a string at the first occurrence of a single character but not multiple characters (like all the letters of the alphabet).

The second allows me to split at the first letter, but not just one time. Using this would result in an array with many elements.

LJD200

4 Answers

5

Using re.search:

import re

strs = ["71 1 * abwhf", "8 askg", "*14 snbsb", "00ab"]


def split_on_letter(s):
    # Raw string so the backslashes reach the regex engine unchanged
    match = re.compile(r"[^\W\d]").search(s)
    return [s[:match.start()], s[match.start():]]


for s in strs:
    print(split_on_letter(s))

The regex [^\W\d] matches letters (and, strictly speaking, the underscore).

\W matches every non-word character, where word characters are letters, digits, and the underscore, and \d matches digits. The ^ at the beginning of the set negates it, so the set matches anything that is neither a non-word character nor a digit, which leaves the letters and the underscore.

search scans the string for the first occurrence of the pattern and returns a match object, and match.start() gives the index of that occurrence. You can slice the original string at that index to get the two substrings.
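As a minimal sketch of the same idea, the variant below assumes you want to exclude the underscore (which `[^\W\d]` also matches) and get the string back in a one-element list when it contains no letters. The name `split_at_letter` and the fallback behavior are assumptions for illustration, not part of the answer above.

```python
import re

# Word characters minus digits and the underscore, i.e. letters only.
LETTER = re.compile(r"[^\W\d_]")


def split_at_letter(s):
    """Split s once, just before its first letter (hypothetical helper)."""
    match = LETTER.search(s)
    if match is None:
        # No letter found: assumed fallback is a one-element list.
        return [s]
    return [s[:match.start()], s[match.start():]]


for s in ["71 1 * abwhf", "8 askg", "*14 snbsb", "00ab", "12_34"]:
    print(split_at_letter(s))
```

Compiling the pattern once at module level avoids rebuilding it on every call, which may matter if the loop runs over many strings.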

C_Z_
  • This works very well. For Python 3, the brackets need to be added around the `print` function. – LJD200 Feb 24 '16 at 18:41
  • If there are no letters in the string, this raises `AttributeError: 'NoneType' object has no attribute 'start'`. For behavior slightly more consistent with `str.split()` the last line of the function could be `return [s[:match.start()], s[match.start():]] if match else s` – web Jun 12 '23 at 01:04
2

The only way I can think of is to write the function yourself:

import string

def split_letters(old_string):
    index = -1
    for i, char in enumerate(old_string):
        # string.letters exists only in Python 2; ascii_letters works in both
        if char in string.ascii_letters:
            index = i
            break
    else:
        raise ValueError("No letters found") # or return old_string
    return [old_string[:index], old_string[index:]]
zondo
  • Thanks for the answer. This is very neat. When running it, I receive an exception: `AttributeError: module 'string' has no attribute 'letters'` – LJD200 Feb 24 '16 at 18:38
  • Sorry; I usually code in Python2. In Python3, use `ascii_letters` instead. It also exists in Python2, so `string.ascii_letters` will work in either one. – zondo Feb 24 '16 at 18:39
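To illustrate the point about `string.ascii_letters`, here is a small sketch. It assumes Python 3 and uses an accented character (the variable `accented` is made up for this example) to show that `ascii_letters` covers only A-Z and a-z, whereas `str.isalpha()` follows Unicode.

```python
import string

# ascii_letters is available in both Python 2 and Python 3.
print(string.ascii_letters)

# It covers A-Z and a-z only, so accented letters are not members,
# whereas str.isalpha() follows Unicode in Python 3.
accented = "\u00e9"  # the letter e with an acute accent
print(accented in string.ascii_letters)
print(accented.isalpha())
```

If the strings may contain non-ASCII letters, checking `char.isalpha()` instead of membership in `string.ascii_letters` is probably the safer choice.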
2

Use re.split()

import re

strings = [
    "71 1 * abwhf",
    "8 askg",
    "*14 snbsb",
    "00ab",
]

for string in strings:
    # maxsplit is passed by keyword; the capture group keeps the matched letter
    a, b, c = re.split(r"([a-z])", string, maxsplit=1, flags=re.I)
    print(repr(a), repr(b + c))

Produces:

'71 1 * ' 'abwhf'
'8 ' 'askg'
'*14 ' 'snbsb'
'00' 'ab'

The trick here is we're splitting on any letter but only asking for a single split. By putting the pattern in parentheses, we save the split character which would normally be lost. We then add the split character back onto the front of the second string.
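The three-way unpacking fails with a ValueError when a string has no letters, because `re.split` then returns a single-element list. A possible sketch that guards against this is shown below; the helper name `split_once` and the choice to return the one-element list unchanged are assumptions, not part of the answer above.

```python
import re


def split_once(text):
    """Split text before its first ASCII letter (hypothetical helper)."""
    parts = re.split(r"([a-z])", text, maxsplit=1, flags=re.IGNORECASE)
    if len(parts) == 1:
        # No letter matched, so re.split returned the input untouched.
        return parts
    head, letter, tail = parts
    # Put the captured letter back on the front of the second piece.
    return [head, letter + tail]


for text in ["71 1 * abwhf", "00ab", "1234"]:
    print(split_once(text))
```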

cdlane
0
sample1 = '71 1 * abwhf'
sample2 = '8 askg'
sample3 = '*14 snbsb'
sample4 = '00ab'
sample5 = '1234'

def split_at_first_letter(txt):
    for value in txt:
        if value.isalpha():
            result = txt.split(value, 1)
            return [result[0], '{}{}'.format(value, result[1])]

    return [txt]

print(split_at_first_letter(sample1))
print(split_at_first_letter(sample2))
print(split_at_first_letter(sample3))
print(split_at_first_letter(sample4))
print(split_at_first_letter(sample5))

Result

['71 1 * ', 'abwhf']
['8 ', 'askg']
['*14 ', 'snbsb']
['00', 'ab']
['1234']
Yoriz
  • Thanks for this answer. How does the code inside the `return` statement work? – LJD200 Feb 24 '16 at 18:54
  • The split on the previous line drops the character it split on, so the second list item is missing its first character. To restore it, the return statement builds a new list from the first element of the split (index 0) and the split character formatted together with the second element of the split (index 1). – Yoriz Feb 24 '16 at 19:05
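The two steps described in that comment can be traced by hand. This is only an illustrative walk-through for the first sample string, with `value` set manually to the letter the loop would find.

```python
txt = "71 1 * abwhf"
value = "a"  # the first letter the loop would find in txt

# Splitting once on the letter removes it from the result.
result = txt.split(value, 1)
print(result)  # ['71 1 * ', 'bwhf']

# Formatting the letter back onto the second piece restores it.
rebuilt = [result[0], "{}{}".format(value, result[1])]
print(rebuilt)  # ['71 1 * ', 'abwhf']
```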