3

I have a list in which each item is a sentence. I want to join the items as long as the new combined item does not go over a character limit.

You can join items in a list fairly easily.

>>> x = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']
>>> ' '.join(x)
'Alice went to the market. She bought an apple. And she then went to the park.'

Now say I would like to sequentially join the items as long as the new combined item is not greater than 50 characters.

The result would be :

['Alice went to the market. She bought an apple.','And she then went to the park.']

I could maybe use a list comprehension like here, or a conditional iterator like here, but I run into problems where the sentences get cut off.

Clarifications

  • The max character limit refers to the length of a single item in the list...not the length of the entire list. When the list items are combined, no single item in the new list can be over the limit.
  • The items that were not able to be combined are returned in the list as they were unchanged.
  • Combine sentences together as long as they do not exceed limit. If they exceed limit, do not combine and keep as is. Only combine sentences that are sequentially next to each other in the list.
  • Please make sure your solution satisfies the output result as indicated previously above : ['Alice went to the market. She bought an apple.','And she then went to the park.']
Leo E
  • List comprehensions should process your list sequentially, AFAIK. – Dave Liu May 31 '19 at 21:30
  • The linked question deals with only joining words or characters up to the given length. It seems like you just want a check, like `if len(' '.join(x)) < allowed_length` – G. Anderson May 31 '19 at 21:36
  • Can you show us what you've tried? Also, I'm 100% sure your result is incorrect, or else you haven't specified something. – Dave Liu May 31 '19 at 22:24
  • I think this clarification "The max character limit refers to the length of a single item in the list...not the length of the entire list." disagrees with your example - 'And she then went to the park.' is less than 50 characters ... – Joe P May 31 '19 at 22:41
  • 'And she then went to the park.' is under 50 characters. It could not be combined with the first two. The first two are already combined because under 50 characters. But if you combine with 'And she then went to the park', then that is over the limit ... so it is not combined. In what ways does it disagree with output example? I have expanded to make it clear. – Leo E May 31 '19 at 22:46

6 Answers

4

List comprehension would probably be a little less legible, since you want to keep checking total length.

A simple function will do. This one takes the list and an optional character limit (default 50) and returns the leading items that fit within it.

def join_50_chars_or_less(lst, limit=50):
    """
    Takes in lst of strings and returns join of strings
    up to `limit` number of chars (no substrings)

    :param lst: (list)
        list of strings to join
    :param limit: (int)
        optional limit on number of chars, default 50
    :return: (list)
        string elements joined up until length of 50 chars.
        No partial-strings of elements allowed.
    """
    for i in range(len(lst)):
        new_join = lst[:i+1]
        if len(' '.join(new_join)) > limit:
            return lst[:i]
    return lst

After defining the function:

>>> x = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']
>>> join_50_chars_or_less(x)
['Alice went to the market.', 'She bought an apple.']
>>> len('Alice went to the market. She bought an apple.')
46

And let's test against a possibly longer string:

>>> test_str = "Alice went to the market. She bought an apple on Saturday."
>>> len(test_str)
58

>>> test = test_str.split()
>>> test
['Alice', 'went', 'to', 'the', 'market.', 'She', 'bought', 'an', 'apple', 'on', 'Saturday.']

>>> join_50_chars_or_less(test)
['Alice', 'went', 'to', 'the', 'market.', 'She', 'bought', 'an', 'apple', 'on']
>>> len(' '.join(join_50_chars_or_less(test)))
48
Dave Liu
  • Thanks Dave for your answer. I still would like a list as result (just with the combined items ... as well as those items that could not be combined because of the limit). But this helps. – Leo E May 31 '19 at 21:50
  • So instead of appending to a string, just append to a list. Set the default to be an empty list instead of an empty string, and that should work. Technically, you could just specify `joined_str = []`, but clearly `[]` is not a string, so that's not good format. Renaming that parameter would be a good idea. – Dave Liu May 31 '19 at 21:54
  • The intention is to preserve the sentence structure. And not to have a sentence cut off. Sorry if that was not clear. – Leo E May 31 '19 at 21:56
  • @LeoE I modified the function to return the correct output. – Dave Liu May 31 '19 at 22:03
  • I already wrote two tests demonstrating that the output is correct. – Dave Liu May 31 '19 at 22:18
  • Thanks Dave. The output with your modified function is : `['Alice went to the market.', 'She bought an apple.']` Intended output is : `['Alice went to the market. She bought an apple.','And she then went to the park.']` – Leo E May 31 '19 at 22:20
  • Uhhhh how is that output correct? The length of that total string is 77 chars. Unless you mean "first element + additional sentences", in which case you should have specified that. – Dave Liu May 31 '19 at 22:21
  • If in fact, you want to ALWAYS include the first element of the list, then just remove that one when you pass in a list, since in that case we're completely ignoring the length of the first sentence/element. – Dave Liu May 31 '19 at 22:28
  • The length refers to the max length of a single item in the list ... not the length of all the list. – Leo E May 31 '19 at 22:30
  • Ah, when you said, "new combined item", it seems everyone here interpreted that as "the new list" rather than "the new sentence/element". Your question wasn't clear on this. Do you want to stop completely after you find a len(str)>50, or do you just want to ignore those results? – Dave Liu May 31 '19 at 22:32
  • Sorry if that was not clear. No you do not stop. Combine sentences together as long as they do not exceed limit. If they exceed limit, do not combine and keep as is. Only combine sentences that are sequentially next to each other in the list. – Leo E May 31 '19 at 22:39
  • Thank you for your efforts and proposals. – Leo E May 31 '19 at 23:02
  • @LeoE You know, you can mark an answer as Accepted :) – Dave Liu Oct 18 '22 at 18:17
3

Here's a one-line solution, just because it's possible.

[x[i] for i in range(len(x)) if [sum(list(map(len,x))[:j+1]) for j in range(len(x))][i] < 50]

And here's the same more efficiently - with intermediate results to save recalculation - but still no explicit loops.

lens = list(map(len, x))
sums = [sum(lens[:i+1]) for i in range(len(x))]  # cumulative length up to and including item i
[x[i] for i in range(len(x)) if sums[i] < 50]

I doubt this is going to be more efficient than an explicit loop in any realistic case, though!
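For comparison, here is a minimal sketch of what the explicit-loop version might look like, keeping a running total instead of re-summing slices. It counts one separator between items, which the comments on this answer point out is otherwise missed. The name `prefix_within_limit`, the `sep` parameter, and the use of `<=` rather than `<` are assumptions made for illustration, not part of the original answer.

```python
def prefix_within_limit(items, limit=50, sep=" "):
    """Return the longest leading run of items whose joined length fits in `limit`."""
    total = 0
    kept = []
    for item in items:
        # a separator is only needed once something has already been kept
        extra = len(item) + (len(sep) if kept else 0)
        if total + extra > limit:
            break
        total += extra
        kept.append(item)
    return kept


x = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']
print(prefix_within_limit(x))
# ['Alice went to the market.', 'She bought an apple.']
```

Like the comprehensions above, this only returns the leading items that fit; it does not combine them or carry on with the rest of the list.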

Joe P
  • Pretty impressive that you were able to do it in one line. But I just want a list to be returned (rather than a string). The new list just has whatever was able to be combined within the character limit. – Leo E May 31 '19 at 22:08
  • Then it's even (slightly) easier :) - I've removed the `' '.join()` leaving the filtered list. – Joe P May 31 '19 at 22:11
  • It does indeed return a list. But does not return the uncombined item (e.g. 'And she then went to the park.'). The output with your function is this : `['Alice went to the market.', 'She bought an apple.']` The intended output is : `['Alice went to the market. She bought an apple.','And she then went to the park.']` – Leo E May 31 '19 at 22:24
  • I'm not sure I can get that from your question as worded - maybe you could expand it. What output would you want if there were 10 strings, and only the first 2 fit within 50 characters? – Joe P May 31 '19 at 22:25
  • I'm sorry if it isn't clear. And I'm happy to clarify. If there were ten sentences, and only the first two sentences could be combined to be under 50 characters, then a list would be returned with those two sentences combined in an item (as the first item in the list notably) and the other eight items as they were. – Leo E May 31 '19 at 22:28
  • Seems like this approach will sometimes allow combined strings that will total more than 50 characters because it doesn't count separators. So if you had 3 sentences of 25, 15 and 10 characters, the joined string would be 52 characters long (because of the 2 spaces in between). – Alain T. May 31 '19 at 22:32
  • Good point @AlainT. - if this is part of the requirement I would add 1 to the initial `lens` – Joe P May 31 '19 at 22:38
  • Thank you for your efforts and proposals. – Leo E May 31 '19 at 23:02
1

You can use accumulate from itertools to compute the size of the accumulated strings (+separators) and determine the maximum number of items that can be combined.

After that you can decide to combine them, and you will also know which items could not fit.

s = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']

from itertools import accumulate
maxCount = sum( size+sep<=50 for sep,size in enumerate(accumulate(map(len,s))) )
combined = " ".join(s[:maxCount])
unused   = s[maxCount:]

print(combined,unused)
# Alice went to the market. She bought an apple. ['And she then went to the park.']                    

You could also obtain maxCount in a more brutal (and inefficient) way, without using accumulate:

maxCount = sum(len(" ".join(s[:n+1]))<=50 for n in range(len(s)))

Or you could do the whole thing in one line:

items = next(s[:n] for n in range(len(s),0,-1) if len(" ".join(s[:n]))<=50 )

# ['Alice went to the market.', 'She bought an apple.']

unused = s[len(items):]

# ['And she then went to the park.']

If you need to perform multiple combinations from the list to produce a new list of combined sentences (as per your latest edit to the question), you can use this in a loop:

combined = []
s        = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']
while s:
    items = next((s[:n] for n in range(len(s),0,-1) if len(" ".join(s[:n]))<=50), s[:1])
    combined.append(" ".join(items))
    s = s[len(items):]

print(combined)
# ['Alice went to the market. She bought an apple.', 'And she then went to the park.'] 

EDIT Changed call to the next() function to add a default. This will handle sentences that are already longer than 50 characters.
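To illustrate what the default buys you, here is a small sketch that wraps the loop above in a function and feeds it a sentence that is already over the limit. The function name `combine_all` and the test sentences are made up for this example. Without the second argument to `next()`, the generator would be exhausted on the long sentence and `next()` would raise `StopIteration`.

```python
def combine_all(sentences, limit=50):
    """Greedily merge neighbouring sentences; over-long ones are kept as they are."""
    combined = []
    s = list(sentences)
    while s:
        # longest prefix that fits, or just the first sentence if nothing fits
        items = next(
            (s[:n] for n in range(len(s), 0, -1) if len(" ".join(s[:n])) <= limit),
            s[:1],
        )
        combined.append(" ".join(items))
        s = s[len(items):]
    return combined


long_sentence = "x" * 60  # made-up sentence that is already over the limit
result = combine_all(["Hi.", "Yo.", long_sentence, "Bye."])
print(result == ["Hi. Yo.", long_sentence, "Bye."])
# True
```

The over-long sentence comes back unchanged as its own item, and combining resumes with the sentences after it.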

Alain T.
  • What's the benefit of itertools here? In what circumstance(s) would this be useful over a list comprehension or a for-loop? – Dave Liu May 31 '19 at 22:11
  • It will perform the size calculation much faster than concatenating strings. – Alain T. May 31 '19 at 22:12
  • How are size calculations optimized? – Dave Liu May 31 '19 at 22:15
  • No memory manipulation/allocation because enumerate(), accumulate() and map() are all iterators and len() is O(1). So this ends up being just a big addition, as opposed to string/list concatenations. Admittedly, on a small scale like this, the difference is insignificant. – Alain T. May 31 '19 at 22:21
  • Thank you Alain. Your solution satisfies the intended output of the initial question in a strict sense. But it is unfortunately not applicable generally. As you pointed out, if there are items in the list that are over the character limit, then there are issues. – Leo E May 31 '19 at 23:22
0

A not-so-elegant solution:

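result = []
string = ""
for element in x:
    for char in element:
        if len(string) < 50:
            string += char  # str has no append(); concatenate instead
        else:
            result.append(string)
            string = char  # start the next chunk with the current character
# note: this splits by character count, so sentences can be cut mid-way
if len(string) > 0:
    result.append(string)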
minterm
0

This is a great question; I can see how there can be useful applications for a solution to this problem.

It doesn't look like the above solutions currently deliver the requested answer, at least in a straightforward and robust way. While I'm sure the below function could be optimised, I believe it solves the problem as requested and is simple to understand.

def wrap_sentences(words, limit=50, delimiter=' '):
    sentences = []
    sentence = ''
    gap = len(delimiter)
    for i, word in enumerate(words):
        if i == 0:
            sentence = word
            continue
        # combine word to sentence if under limit
        if len(sentence) + gap + len(word) <= limit:
            sentence = sentence + delimiter + word
        else:
            sentences.append(sentence)
            sentence = word

    # finally, append the sentence still being built
    # (covers a single-item input and a last word that was combined)
    if words:
        sentences.append(sentence)

    return sentences

Using it to get the result:

>>> solution = ['Alice went to the market. She bought an apple.', 'And she then went to the park.']
>>> x = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']
>>> result = wrap_sentences(x,limit=50,delimiter=' ')
>>> result
['Alice went to the market. She bought an apple.', 'And she then went to the park.']
>>> result==solution
True

The function output evaluates as a match for the poster's desired answer given the same input. Also, if the limit is high and not reached, it still returns the joined sentences.

(edit: some of the terms in my function may seem odd, e.g. 'words' as the input. It's because I plan to use this function for wrapping Thai words with a no-space delimiter across multiple lines; I came across this thread while seeking a simple solution, and decided to apply it to this problem. Hopefully applying this in a general way doesn't detract from the solution!)
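As a rough illustration of the no-space use case, here is a sketch of the same greedy idea written as a generator and called with an empty delimiter. The name `iter_wrapped` and the tokens are invented for this example; it is a restatement of the approach under those assumptions, not the function above.

```python
def iter_wrapped(tokens, limit=50, delimiter=" "):
    """Yield greedily joined chunks of neighbouring tokens.

    Each chunk is at most `limit` characters long, except that a single
    token longer than `limit` is yielded unchanged.
    """
    chunk = None
    for token in tokens:
        if chunk is None:
            chunk = token
        elif len(chunk) + len(delimiter) + len(token) <= limit:
            chunk = chunk + delimiter + token
        else:
            yield chunk
            chunk = token
    if chunk is not None:
        yield chunk


tokens = ["ab", "cd", "ef", "gh", "ij"]  # made-up tokens
print(list(iter_wrapped(tokens, limit=5, delimiter="")))
# ['abcd', 'efgh', 'ij']
print(list(iter_wrapped(tokens, limit=100, delimiter="")))
# ['abcdefghij']
```

With an empty delimiter the gap contributes zero characters, so only the token lengths count towards the limit.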

Carl Higgs
0

I started from Joe's answer, pulled out the max index with the first_greater_elem method from this answer, and came up with this set of helper methods.

from typing import List


def combine_messages(message_array: List[str], max_length: int) -> List[str]:
    lengths = list(map(len, message_array))
    # running length of the joined string, including the i separating spaces
    sums = [sum(lengths[:i + 1]) + i for i in range(len(message_array))]
    # always take at least one message, so an over-long message cannot recurse forever
    max_index = max(first_greater_elem(sums, max_length), 1)
    if max_index < len(message_array):
        result = [" ".join(message_array[:max_index])]
        result.extend(combine_messages(message_array[max_index:], max_length))
        return result
    return [" ".join(message_array)]


def first_greater_elem(lst, elem):
    for i, item in enumerate(lst):
        if item >= elem:
            return i
    return len(lst)

It recursively continues to combine elements into strings shorter than max_length. So extending your example,

message_array = ['Alice went to the market.', 'She bought an apple.', 'She went to the park.', 'She played.', 'She climbed.', 'She went up the ladder and down the slide.', 'After a while she got tired.', 'So she went home.']

combine_messages(message_array, 50)

['Alice went to the market. She bought an apple.', 'She went to the park. She played. She climbed.', 'She went up the ladder and down the slide.', 'After a while she got tired. So she went home.']
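One way to sanity-check output like this is sketched below. `check_combined` is a hypothetical helper, not part of this answer, and it assumes that no single input message is longer than the limit. It is a quick check rather than a proof: it confirms that no text was lost or reordered, that every item fits, and that no two neighbouring items could still have been merged.

```python
def check_combined(original, combined, limit, sep=" "):
    """Return True if `combined` passes three sanity checks against `original`."""
    # 1. nothing was lost, reordered, or altered
    same_text = sep.join(combined) == sep.join(original)
    # 2. every item fits (assumes no single input message exceeds the limit)
    all_fit = all(len(item) <= limit for item in combined)
    # 3. no two neighbouring output items could still have been merged
    no_merge_left = all(
        len(a) + len(sep) + len(b) > limit for a, b in zip(combined, combined[1:])
    )
    return same_text and all_fit and no_merge_left


message_array = ['Alice went to the market.', 'She bought an apple.', 'She went to the park.', 'She played.', 'She climbed.', 'She went up the ladder and down the slide.', 'After a while she got tired.', 'So she went home.']
expected = ['Alice went to the market. She bought an apple.', 'She went to the park. She played. She climbed.', 'She went up the ladder and down the slide.', 'After a while she got tired. So she went home.']

print(check_combined(message_array, expected, 50))
# True
print(check_combined(message_array, message_array, 50))
# False
```

The second call fails because the uncombined list still contains neighbours that fit together within 50 characters.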
Chuck Wilbur