1

I want to be able to split a list of items when reaching a capitalized word, for example:

Input:

s = ['HARRIS', 'second', 'caught', 'JONES', 'third', 'Smith', 'stole', 'third']

Output:

['HARRIS', 'second', 'caught']
['JONES', 'third']
['Smith', 'stole', 'third']

Would it be best to approach this problem using s.index('some regex') and then split the list accordingly at those given indices?

rahlf23
  • Did you try it out yet? Give your `regex` solution a shot and see what happens. If you are having difficulty, post your code here and explain what is not working. Or maybe you already have and are really close? Post the code! :) – idjaw Jul 18 '17 at 00:42
  • Would those output lists be combined into one containing list? What is wrong with simply looping over the items and splitting the lists yourself? That would take about 3 or 4 lines. And is the splitting word to be all caps or just an initial cap or something else? – Rory Daulton Jul 18 '17 at 00:42
  • You might not need regex as you are only needing to check if the *first* letter is capitalised - this can just be a simple check on the char – Jonathan Holland Jul 18 '17 at 00:43
  • Does it have to be only the first character in the word? What happens with "poTato"? – idjaw Jul 18 '17 at 00:44
  • Yes, it does only need to be the first character of the word. I suppose that does simplify things quite a bit. – rahlf23 Jul 18 '17 at 00:49

4 Answers

2

You can try this:

s = ['HARRIS', 'second', 'caught', 'JONES', 'third', 'Smith', 'stole', 'third']

indices = [i for i, a in enumerate(s) if a[0].isupper()]

indices.append(len(s))

final_list = [s[indices[i]:indices[i+1]] for i in range(len(indices)-1)]

Output:

[['HARRIS', 'second', 'caught'], ['JONES', 'third'], ['Smith', 'stole', 'third']]

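Note that this splits only at elements whose first character is uppercase. Any items that come before the first such element are dropped, and an empty string in the list raises an IndexError at a[0].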
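The same index-based idea can be wrapped in a function that tolerates empty strings and leading lowercase items. This is a minimal sketch, not part of the original answer: the name `split_at_capitalized` is hypothetical, and keeping leading items as their own group is an assumption about the desired behavior.

```python
def split_at_capitalized(words):
    """Split words into sublists, starting a new one at each capitalized word."""
    # w[:1] is '' for an empty string and ''.isupper() is False, so no IndexError.
    starts = [i for i, w in enumerate(words) if w[:1].isupper()]
    # Assumption: keep any items that precede the first capitalized word.
    if words and (not starts or starts[0] != 0):
        starts.insert(0, 0)
    # Pair each start index with the next one; the last slice runs to the end.
    return [words[a:b] for a, b in zip(starts, starts[1:] + [len(words)])]


s = ['HARRIS', 'second', 'caught', 'JONES', 'third', 'Smith', 'stole', 'third']
print(split_at_capitalized(s))
```

Slicing with `w[:1]` instead of indexing with `w[0]` is what makes the empty-string case safe.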

If you want to split at any element that contains an uppercase letter anywhere in the word:

s = ['HARRIS', 'second', 'caught', 'JONES', 'third', 'Smith', 'stole', 'third']

indices = [i for i, a in enumerate(s) if any(b.isupper() for b in a)]

indices.append(len(s))

final_list = [s[indices[i]:indices[i+1]] for i in range(len(indices)-1)]
Ajax1234
1

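If you're willing to use a third-party library, you can use iteration_utilities.Iterable to accomplish this. Note that str.isupper is true only for words that are entirely uppercase, so 'Smith' does not start a new group in the output shown; to split on an initial capital, pass a predicate that checks only the first character.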

>>> from iteration_utilities import Iterable
>>> 
>>> lst = ['HARRIS', 'second', 'caught', 'JONES', 'third', 'Smith', 'stole', 'third']
>>> Iterable(lst).split(str.isupper, keep_after=True).filter(lambda l: l).as_list()
[['HARRIS', 'second', 'caught'], ['JONES', 'third', 'Smith', 'stole', 'third']]
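For comparison, a similar split can be sketched with only the standard library. This is an illustrative alternative, not the `iteration_utilities` API: the helper name `split_before` is hypothetical, and the first-character predicate follows the asker's clarification in the comments.

```python
from itertools import accumulate, groupby


def split_before(items, pred):
    """Group items so that each item matching pred starts a new group."""
    items = list(items)  # allow two passes even if a generator is given
    # Running count of matches so far; it serves as a group id for each item.
    group_ids = accumulate(int(bool(pred(item))) for item in items)
    pairs = zip(group_ids, items)
    return [[item for _, item in grp]
            for _, grp in groupby(pairs, key=lambda pair: pair[0])]


lst = ['HARRIS', 'second', 'caught', 'JONES', 'third', 'Smith', 'stole', 'third']
print(split_before(lst, lambda w: w[:1].isupper()))
```

Items that appear before the first match get group id 0 and form their own leading group, so nothing is dropped.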
Christian Dean
0

A straightforward way is to loop over the list: when we find a capitalized word we start a new sublist, otherwise we append to the current one.

s = ['HARRIS', 'second', 'caught', 'JONES', 'third', 'Smith', 'stole', 'third', 'H']

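def split_by(lst, p):
    """Start a new sublist at every item for which p(item) is true."""
    lsts = []
    for x in lst:
        if p(x) or not lsts:
            # New group on a match, or for leading items before any match
            # (otherwise lsts[-1] would raise IndexError on an empty list).
            lsts.append([x])
        else:
            lsts[-1].append(x)
    return lsts

# str.isupper matches only all-caps words, so 'Smith' would not split;
# test just the first character instead.
print(split_by(s, lambda w: w[:1].isupper()))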
delta
0
str.istitle("Abc") #True
str.istitle("ABC") #False
str.istitle("ABc") #False

str.isupper("Abc") #False
str.isupper("ABC") #True
str.isupper("ABc") #False

Neither method alone tests just the first letter, so this related question may help: "Checking if first letter of string is in uppercase".

a = "Abc"
print(str.isupper(a[0]))

or

a = "Abc"
print(a[0].isupper())
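To see how the three checks differ on words like those in the question, here is a small comparison. It is only a sketch: the helper name `starts_upper` and the sample words (including 'poTato' from the comments and an empty string) are chosen for illustration.

```python
def starts_upper(word):
    """True if the first character is uppercase; False for an empty string."""
    # word[:1] avoids the IndexError that word[0] raises on ''.
    return word[:1].isupper()


words = ['HARRIS', 'Smith', 'poTato', 'third', '']

# Each row: the word, then istitle(), isupper(), and the first-character check.
results = {w: (w.istitle(), w.isupper(), starts_upper(w)) for w in words}
for word, checks in results.items():
    print(repr(word), checks)
```

Only the first-character check treats both 'HARRIS' and 'Smith' as capitalized while rejecting 'poTato'.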
ZRTSIM