Joining an element that begins with a lowercase letter to the previous element of the list

Question

Stackoverflow, hello

I have a specific task: merging some elements of a list into their neighbours, depending on whether they start with a lowercase letter.

I have a nested list (a list of lists):

ingridient_names_final=[['Egg', 'Milk', 'Tomato'], ['Duck', 'Water', 'Honey', 'Soy', 'sauce'], ['Potato', 'Garlic', 'Gouda', 'cheese'], ['Beef', 'Sweet', 'pepper', 'Pita', 'bread', 'Wine', 'vinegar', 'Tomato']]

Which should be transformed to:

[['Egg', 'Milk', 'Tomato'], ['Duck', 'Water', 'Honey', 'Soy sauce'], ['Potato', 'Garlic', 'Gouda cheese'], ['Beef', 'Sweet pepper', 'Pita bread', 'Wine vinegar', 'Tomato']]

So I need to join the words "sauce", "cheese", "pepper", "bread" and "vinegar" to the previous element of their list.

All I have figured out so far is that the islower() method should be used here:

for element in ingridient_names_final:
    # print (element)
    for element2 in element:
        # print (element2)
        if element2.islower():
            print(element2)

And the result is:

sauce
cheese
pepper
bread
vinegar

But how can I join them to the previous element of each inner list? I am a beginner in this language, so please help.

score 2 · Accepted Answer · answered Apr 08 '20 at 14:50

You can do the following, using itertools.groupby:

from itertools import groupby

for lst in ingridient_names_final:
    new_lst = []
    for k, g in groupby(lst, key=lambda s: s[0].islower()):
        if k:
            new_lst[-1] += ' ' + ' '.join(g)
        else:
            new_lst.extend(g)
    lst[:] = new_lst

Or even simpler:

for lst in ingridient_names_final:
    new_lst = []
    for s in lst:
        if s[0].islower():
            new_lst[-1] += ' ' + s
        else:
            new_lst.append(s)
    lst[:] = new_lst
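
Both loops above assume that every inner list starts with a capitalized word; if the first element were lowercase, `new_lst[-1]` would raise an `IndexError` on the empty list. Below is a minimal sketch of a non-mutating variant that guards against that case. The function name `merge_lowercase` is hypothetical, and keeping a leading lowercase word as its own element is an assumption about the desired behaviour, not something stated in the question.

```python
def merge_lowercase(words):
    """Return a new list in which each word starting with a lowercase
    letter is appended to the element before it (hypothetical helper)."""
    merged = []
    for word in words:
        # word[:1] is '' for an empty string and ''.islower() is False,
        # so empty strings are kept as separate elements.
        # Assumption: a lowercase word with nothing before it stays as is.
        if merged and word[:1].islower():
            merged[-1] += ' ' + word
        else:
            merged.append(word)
    return merged


ingridient_names_final = [
    ['Egg', 'Milk', 'Tomato'],
    ['Duck', 'Water', 'Honey', 'Soy', 'sauce'],
]
# Build a new nested list instead of modifying the original in place.
result = [merge_lowercase(lst) for lst in ingridient_names_final]
print(result)
```

Returning a new list leaves the input untouched, which may be easier to reason about than the `lst[:] = new_lst` in-place update.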

score 0 · Answer 2 · answered Apr 08 '20 at 14:57

A regex-based solution:

import re

ingredient_names_final = [['Egg', 'Milk', 'Tomato'],
                          ['Duck', 'Water', 'Honey', 'Soy', 'sauce'],
                          ['Potato', 'Garlic', 'Gouda', 'cheese'],
                          ['Beef', 'Sweet', 'pepper', 'Pita', 'bread', 'Wine',
                           'vinegar', 'Tomato']]


print([
    re.findall(r'[A-Z][a-z ]*(?![A-Z])', ' '.join(ingredient))
    for ingredient in ingredient_names_final
])

output:

[['Egg', 'Milk', 'Tomato'], ['Duck', 'Water', 'Honey', 'Soy sauce'], ['Potato', 'Garlic', 'Gouda cheese'], ['Beef', 'Sweet pepper', 'Pita bread', 'Wine vinegar', 'Tomato']]

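Splitting on the space that precedes each capital letter works, too. (A zero-width split such as r'(?<!^)(?=[A-Z])' would leave a trailing space on every element but the last.)

print([
    re.split(r' (?=[A-Z])', ' '.join(ingredient))
    for ingredient in ingredient_names_final
])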

score 0 · Answer 3 · answered Apr 08 '20 at 15:18

Depending on how many concatenations you perform, and on whether you can have many consecutive lowercase words, keep in mind that strings are immutable in Python, so every concatenation builds a new string.

More info about performance here. So, as an alternative to the valid solutions proposed above, here is one using str.join:

result = []
for ingredients_list in ingridient_names_final:
    next_idx = 0
    count = 0
    new_ingredients_list = []

    while next_idx < len(ingredients_list) - 1:
        if ingredients_list[next_idx + 1].islower():
            count += 1
            next_idx += 1
            continue
        # Avoid numerous string concatenations
        ingredient = ' '.join(ingredients_list[next_idx - count: next_idx + 1])
        new_ingredients_list.append(ingredient)
        count = 0
        next_idx += 1
    new_ingredients_list.append(' '.join(ingredients_list[next_idx - count: next_idx + 1]))

    result.append(new_ingredients_list)
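
The same idea, joining once per ingredient instead of concatenating repeatedly, can also be sketched by collecting the words into groups first. This is an alternative illustration, not the code above rewritten; the function name `join_groups` is hypothetical, and it tests only the first letter with `word[:1].islower()` rather than the whole string.

```python
def join_groups(words):
    """Group each lowercase-initial word with the word before it,
    then build every final string with a single str.join call."""
    groups = []
    for word in words:
        if groups and word[:1].islower():
            # Extend the current group; no string is created yet.
            groups[-1].append(word)
        else:
            # A capitalized word (or the very first word) starts a new group.
            groups.append([word])
    return [' '.join(group) for group in groups]


sample = ['Beef', 'Sweet', 'pepper', 'Pita', 'bread', 'Wine', 'vinegar', 'Tomato']
print(join_groups(sample))
```

Unlike the index-based loop, this version returns an empty list for an empty input.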
