How to partial split and take the first portion of string in Python?

Question

I have a scenario where I want to split a string partially and keep only one portion of it.

The string could be something like aloha_maui_d0_b0 or new_york_d9_b10. Note: d is always followed by a number, which can have any number of digits.
I want to strip everything before _d*, i.e. keep only _d0_b0 or _d9_b10.

Tried below code, but obviously it removes the split term as well.

print(("aloha_maui_d0_b0").split("_d"))
#Output is : ['aloha_maui', '0_b0']
#But Wanted : _d0_b0

Is there another way to get that portion? Do I need to use a regular expression?

score 4 · Answer 1 · answered Aug 31 '20 at 22:08


How about just

stArr = "aloha_maui_d0_b0".split("_d")
st2 = '_d' + stArr[1]

This should do the trick, as long as the string always contains '_d'. If '_d' occurs more than once, stArr[1] only holds the text up to the second occurrence.

Rob T

It doesn't even need to have `_d` in it – if you use a function with one argument being the delimiter with which the string is split, this'll make it reproducible. – monsieuralfonse64 Aug 31 '20 at 22:10
@monsieuralfonse64 You mean something like this? I think this is fine too: def ret_str(val): return "_d" + val.split("_d")[1] – Vimo Aug 31 '20 at 22:14
@Vimo I was referring to having the equivalent of "_d" as the function parameter, but the function could have two arguments – `val`, which is the string to split, and `sep`, the delimiter with which to split `val` – monsieuralfonse64 Aug 31 '20 at 22:17
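The two-argument function described in these comments might look like the sketch below. The name tail_from and the ValueError on a missing delimiter are assumptions made for illustration; neither comes from the thread. It passes maxsplit=1 to split() so that a second occurrence of the delimiter stays in the result.

```python
def tail_from(val, sep):
    """Return the part of val starting at the first occurrence of sep."""
    # maxsplit=1 splits only at the first sep, so later ones are kept.
    pieces = val.split(sep, 1)
    if len(pieces) < 2:
        # Assumption: treat a missing delimiter as an error.
        raise ValueError(f"{sep!r} not found in {val!r}")
    return sep + pieces[1]


print(tail_from("aloha_maui_d0_b0", "_d"))  # _d0_b0
print(tail_from("new_york_d9_b10", "_d"))   # _d9_b10
```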

thibsc · Answer 2 · 2020-08-31T22:43:06.087

3

You can use index() to split the string into two parts:

s = 'aloha_maui_d0_b0'
idx = s.index('_d')
parts = [s[:idx], s[idx:]]
# parts = ['aloha_maui', '_d0_b0']
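Note that index() raises ValueError when the substring is missing, while find() returns -1 instead. A minimal sketch of a tolerant variant follows; the helper name cut_at and the choice to return an empty tail are assumptions, not part of the answer.

```python
def cut_at(s, sep):
    """Split s into (head, tail), where tail starts with sep."""
    idx = s.find(sep)  # find() returns -1 instead of raising
    if idx == -1:
        # Assumption: with no delimiter, everything is head.
        return s, ""
    return s[:idx], s[idx:]


print(cut_at("aloha_maui_d0_b0", "_d"))  # ('aloha_maui', '_d0_b0')
print(cut_at("honolulu", "_d"))          # ('honolulu', '')
```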

Edit: You can also use this if your string contains multiple occurrences of _d:

s = 'aloha_maui_d0_b0_d1_b1_d2_b2'
idxs = [n for n in range(len(s)) if n == 0 or s.find('_d', n) == n]
parts = [s[i:j] for i,j in zip(idxs, idxs[1:]+[None])]
# parts = ['aloha_maui', '_d0_b0', '_d1_b1', '_d2_b2']
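The comprehension above calls find() once per character position. As a sketch of the same idea, the loop below jumps from one match to the next instead. The function name split_keeping_sep is hypothetical, and the search deliberately starts at position 1 so that the first piece always begins at index 0, as in the original.

```python
def split_keeping_sep(s, sep):
    """Cut s in front of every sep found after position 0."""
    cuts = [0]
    pos = s.find(sep, 1)
    while pos != -1:
        cuts.append(pos)
        pos = s.find(sep, pos + 1)
    # Pair each cut with the next one; None means "to the end".
    return [s[i:j] for i, j in zip(cuts, cuts[1:] + [None])]


print(split_keeping_sep("aloha_maui_d0_b0_d1_b1_d2_b2", "_d"))
# ['aloha_maui', '_d0_b0', '_d1_b1', '_d2_b2']
```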

edited Aug 31 '20 at 22:43

answered Aug 31 '20 at 22:11

thibsc


This works well for strings that split in 2, but for anything larger `index()` will return only the first index of "_d" – monsieuralfonse64 Aug 31 '20 at 22:19

score 2 · Answer 3 · edited Sep 10 '20 at 06:44

I have two suggestions.

partition()

Use the partition() method to get a tuple that contains the delimiter as one of its elements, then use the + operator to build the string you want:

teste1 = 'aloha_maui_d0_b0'
partitiontest = teste1.partition('_d')
print(partitiontest)
print(partitiontest[1] + partitiontest[2])

Output:

('aloha_maui', '_d', '0_b0')
_d0_b0

The partition() method returns a tuple whose first element is the text before the delimiter, the second is the delimiter itself, and the third is the text after the delimiter.

The method only acts on the first occurrence of the delimiter in the string, so you can't use it to split into more than three pieces without extra code. For that case, my second suggestion works better.
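One detail worth knowing: when the delimiter is absent, partition() returns the whole string followed by two empty strings, so joining elements 1 and 2 yields an empty string rather than an error. A small sketch (the helper name tail_from_first is made up for illustration):

```python
def tail_from_first(s, sep):
    """Return sep plus everything after its first occurrence, or ''."""
    head, found, tail = s.partition(sep)
    # If sep is missing, found and tail are both ''.
    return found + tail


print(tail_from_first("aloha_maui_d0_b0", "_d"))  # _d0_b0
print(repr(tail_from_first("honolulu", "_d")))    # ''
```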

replace()

Use the replace() method to insert an extra character (or characters) right before your delimiter (_d), then pass that extra character as the delimiter to split().

teste2 = 'new_york_d9_b10'
replacetest = teste2.replace('_d', '|_d')
print(replacetest)

splitlist = replacetest.split('|')
print(splitlist)

Output:

new_york|_d9_b10
['new_york', '_d9_b10']

Since it replaces every occurrence of _d in the string with |_d, it has no problem splitting into more than two pieces.
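The replace-then-split trick relies on the inserted marker never occurring in the input. The sketch below wraps it in a function and checks for that. The name split_before_each and the default marker "\x00" are assumptions chosen for illustration, on the guess that a NUL character is unlikely in this kind of data.

```python
def split_before_each(s, sep, marker="\x00"):
    """Split s in front of every sep using a temporary marker."""
    if marker in s:
        # Otherwise split() would also cut at pre-existing markers.
        raise ValueError("marker already occurs in the input")
    return s.replace(sep, marker + sep).split(marker)


print(split_before_each("new_york_d9_b10", "_d"))
# ['new_york', '_d9_b10']
```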

Problem?

One situation to watch out for is unwanted splits caused by _d appearing in more places than you anticipated.

Following the apparent pattern of your examples (a city name followed by numbers), you might get something like this:

teste3 = 'rio_de_janeiro_d3_b32'
replacetest = teste3.replace('_d', '|_d')
print(replacetest)

splitlist = replacetest.split('|')
print(splitlist)

Output:

rio|_de_janeiro|_d3_b32
['rio', '_de_janeiro', '_d3_b32']

Assuming the numeric part is always at the end of the string and _d never occurs inside it, rpartition() could be a solution:

rpartitiontest = teste3.rpartition('_d')
print(rpartitiontest)
print(rpartitiontest[1] + rpartitiontest[2])

Output:

('rio_de_janeiro', '_d', '3_b32')
_d3_b32

Since rpartition() searches from the end of the string and splits only at the first match it finds there (that is, the last occurrence), the first term (the city name?) can't cause unexpected splits.
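To make the difference concrete, here is a side-by-side sketch of partition() and rpartition() on the same input. The variable names are only illustrative.

```python
name = "rio_de_janeiro_d3_b32"

# partition() cuts at the first "_d", which is the one in "_de".
first = name.partition("_d")
first_tail = first[1] + first[2]

# rpartition() cuts at the last "_d", just before the number.
last = name.rpartition("_d")
last_tail = last[1] + last[2]

print(first_tail)  # _de_janeiro_d3_b32
print(last_tail)   # _d3_b32
```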

score 0 · Answer 4 · answered Sep 01 '20 at 17:27

Use re.split(), which can keep the delimiters when the pattern contains a capture group:

import re

patre = re.compile(r"(_d\d)")
# Note the surrounding parentheses: they are what makes split() keep the delimiter.

for line in """aloha_maui_d0_b0 new_york_d9_b10""".split():
    parts = patre.split(line)
    print("\n", line)
    print(parts)

    p1, p2 = parts[0], "".join(parts[1:])
    print(p1, p2)

output:

 aloha_maui_d0_b0
['aloha_maui', '_d0', '_b0']
aloha_maui _d0_b0

 new_york_d9_b10
['new_york', '_d9', '_b10']
new_york _d9_b10

credit due: https://stackoverflow.com/a/15668433
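If only the tail is needed, a single re.search() may be enough. This is a sketch rather than part of the answer: the pattern requires at least one digit after _d, so a name such as rio_de_janeiro does not trigger an early match. The names TAIL and tail_with_regex, and returning None when nothing matches, are assumptions.

```python
import re

# "_d", one or more digits, then the rest of the string.
TAIL = re.compile(r"_d\d+.*")


def tail_with_regex(s):
    """Return the text from the first '_d<digits>' on, or None."""
    m = TAIL.search(s)
    return m.group(0) if m else None


for line in ["aloha_maui_d0_b0", "new_york_d9_b10", "rio_de_janeiro_d3_b32"]:
    print(tail_with_regex(line))
# _d0_b0
# _d9_b10
# _d3_b32
```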
