
I want to write some code that takes a list of items and concatenates them (separated by commas) into long strings, where each string is no longer than a predefined length. For example, for this list:

colors = ['blue','pink','yellow']

and a max len of 10 chars, the output of the code will be:

Long String 0: blue,pink

Long String 1: yellow

I created the following code (below), but it fails in two cases: when the total length of the concatenated items is shorter than the max len allowed, and when it creates one or more long strings but the total len of the concatenation of the residual items in the list is shorter than the max len.

What I'm trying to ask is this: in the following code, how would you "stop" the loop when items run out and yet the concatenation is so short that the "else" clause isn't reached?

Many thanks :)

import pyperclip


# Theoretical bug: when a single item is longer than max_length. Will never happen for the intended use of this code.




raw_list = pyperclip.paste()

split_list = raw_list.split()

unique_items_list = list(set(split_list))                                       # notice that set are unordered collections, and the original order is not maintained. Not crucial for the purpose of this code the way it is now, but good remembering. See more: http://stackoverflow.com/a/7961390/2594546


print "There are %d items in the list." % len(split_list)
print "There are %d unique items in the list." % len(unique_items_list)


max_length = 10                                                               # salesforce's filters allow up to 1000 chars, but didn't want to hard code it in the rest of the code, just in case.


list_of_long_strs = []
short_list = []                                                                 # will hold the items that make up the current long str (at most max_length chars).
total_len = 0
items_processed = []        # will be used for sanity checking
for i in unique_items_list:
    if total_len + len(i) + 1 <= max_length:                                    # +1 is for the length of the comma
        short_list.append(i)
        total_len += len(i) + 1
        items_processed.append(i)
    elif total_len + len(i) <= max_length:                                      # if there's no place for another item+comma, it means we're nearing the end of the max_length chars mark. Maybe we can fit just the item without the unneeded comma.
        short_list.append(i)
        total_len += len(i)                                                     # should I end the loop here somehow?
        items_processed.append(i)
    else:
        long_str = ",".join(short_list)
        if long_str[-1] == ",":                                                 # appending the long_str to the list of long strings, while making sure the item can't end with a "," which can affect Salesforce filters.
            list_of_long_strs.append(long_str[:-1])
        else:
            list_of_long_strs.append(long_str)
        del short_list[:]                                                       # in order to empty the list.
        total_len = 0

unique_items_proccessed = list(set(items_processed))
print "Number of items concatenated:", len(unique_items_proccessed)



def sanity_check():
    if len(unique_items_list) == len(unique_items_proccessed):
        print "All items concatenated"
    else:           # the only other option is that len(unique_items_list) > len(unique_items_proccessed)
        print "The following items weren't concatenated:"
        print ",".join(list(set(unique_items_list)-set(unique_items_proccessed)))


sanity_check()



print ",".join(short_list)         # for when the loop doesn't end the way it should since < max_length. NEED TO FIND A BETTER WAY TO HANDLE THAT


for item in list_of_long_strs:
    print "Long String %d:" % list_of_long_strs.index(item)
    print item
    print
Optimesh
  • I'm assuming you're not allowed to use `''.join()` – sshashank124 Apr 14 '14 at 09:06
  • use `break` operator to exit loop – rpc1 Apr 14 '14 at 09:12
  • The loop will already end when it's processed all `unique_items_list`; that's the point of a `for` loop! Have you written a test for this "bug"? – jonrsharpe Apr 14 '14 at 09:14
  • Do you have an example input list where the situation you describe appears? – Yannis P. Apr 14 '14 at 09:19
  • @sshashank124 what difference does it make if I use "".join() or ",".join() ? I don't think it affects the loop... does it ? – Optimesh Apr 14 '14 at 09:47
  • @jonrsharpe I thought so too, but evidently when it doesn't reach the max len cap it won't concatenate and create the long str. When it never reaches the max len cap, the only reason it prints is b/c I added ==> print ",".join(short_list) , later in the code, not because it reached the end of the loop. When the concatenated strs _are_ created, the residuals won't print even with that. – Optimesh Apr 14 '14 at 09:50
  • @YannisP. Hi, please change the max_length to 10 and use: ab cd ef gh ij kl mn op and in another case just try: ab cd thanks :) – Optimesh Apr 14 '14 at 09:52
  • @rpc1 where would you put it so all items are concatenated ? – Optimesh Apr 14 '14 at 09:54
  • @jonrsharpe p.s. regarding your question: please notice I have a sanity check function. – Optimesh Apr 14 '14 at 09:55
  • @Optimesh but note that your sanity check only checks whether you have added the right number of things to `short_list` at some point, **not** whether you have concatenated them into `list_of_long_strs`. – jonrsharpe Apr 14 '14 at 10:09
  • @jonrsharpe you are right, good eye. I think that once the loop issue is resolved this won't be an issue anymore, no? – Optimesh Apr 14 '14 at 10:12
  • @Optimesh you can add the condition `total_len + len(i) > max_length` and then `break`; if all items are concatenated, the loop ends by itself – rpc1 Apr 14 '14 at 10:21
  • @rpc1 that won't yield the desired outcome. – Optimesh Apr 14 '14 at 11:42

2 Answers


At the moment, you don't do anything with i in the else case, so you miss out items, and you don't deal with short_list if it isn't filled by the last item in the loop.

The simplest solution is to restart short_list with i in it:

short_list = [i]
total_len = len(i) + 1  # i plus its comma, matching the bookkeeping in the if branch

and to check after the for loop whether there is anything left in short_list, and deal with it if so:

if short_list:
    list_of_long_strs.append(",".join(short_list))
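
Put together, the two fixes might look like the sketch below. It is only an illustration: the function name `chunk_items` and its parameters are made up here, it keeps the question's convention that `total_len` counts each item plus one comma, and it assumes no single item is longer than `max_length`.

```python
def chunk_items(items, max_length=10):
    """Join items with commas into strings of at most max_length characters.

    Hypothetical helper; assumes no single item is longer than max_length.
    """
    list_of_long_strs = []
    short_list = []
    total_len = 0  # length of the items in short_list, one comma per item
    for i in items:
        # total_len + len(i) is the length of the joined string with i added
        if total_len + len(i) <= max_length:
            short_list.append(i)
            total_len += len(i) + 1
        else:
            list_of_long_strs.append(",".join(short_list))
            short_list = [i]          # restart with the item that didn't fit
            total_len = len(i) + 1
    if short_list:                    # whatever is left after the loop
        list_of_long_strs.append(",".join(short_list))
    return list_of_long_strs


print(chunk_items("ab cd ef gh ij kl mn op".split()))  # ['ab,cd,ef', 'gh,ij,kl', 'mn,op']
print(chunk_items(["ab", "cd"]))                       # ['ab,cd']
```

Both inputs from the comments are covered: the long list is split into three strings, and the short list that never reaches the cap still comes out as one string.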

You can simplify the if checking:

new_len = total_len + len(i)
if new_len < max_length:
    ...
elif new_len == max_length:
    ...
else:
    ...

get rid of the if/else block starting:

if long_str[-1] == ",":   

(",".join(...) means that never happens)
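
A quick way to convince yourself of that: `str.join` only puts the separator between items, never after the last one. The snippet below is just a demonstration, not part of the fix.

```python
# str.join places the separator between items only
print(",".join(["ab", "cd", "ef"]))  # ab,cd,ef
print(",".join(["ab"]))              # ab
print(repr(",".join([])))            # ''
```

Note that the last case gives an empty string, so `long_str[-1]` would raise an `IndexError` if `short_list` were ever empty at that point.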

and neaten the last part of your code using enumerate (and I'd switch to str.format):

for index, item in enumerate(list_of_long_strs):
    print "Long string {0}:".format(index)
    print item

More broadly, here's what I'd do:

def process(unique_items_list, max_length=10):
    """Process the list into comma-separated strings with maximum length."""
    output = []
    working = []
    for item in unique_items_list:
        new_len = sum(map(len, working)) + len(working) + len(item)
                # ^ items                  ^ commas       ^ new item?
        if new_len <= max_length:
            working.append(item)
        else:
            output.append(working)
            working = [item]
    output.append(working)
    return [",".join(sublist) for sublist in output if sublist]

def print_out(str_list):
    """Print out a list of strings with their indices."""
    for index, item in enumerate(str_list):
        print("Long string {0}:".format(index))
        print(item)

Demo:

>>> print_out(process(["ab", "cd", "ef", "gh", "ij", "kl", "mn"]))
Long string 0:
ab,cd,ef
Long string 1:
gh,ij,kl
Long string 2:
mn
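
The version above re-sums the lengths in `working` for every item, which is fine for short lists. If that ever matters, a running total does the same job. The sketch below is one possible variant, not a replacement; the name `iter_chunks` and the generator style are my own choices for illustration.

```python
def iter_chunks(items, max_length=10):
    """Yield comma-separated strings no longer than max_length.

    Illustrative variant only: keeps a running length instead of
    re-summing the working list on every iteration. An item that is
    itself longer than max_length still gets a string of its own.
    """
    working = []
    joined_len = 0  # always equal to len(",".join(working))
    for item in items:
        # a comma is only needed if working already holds something
        new_len = joined_len + len(item) + (1 if working else 0)
        if working and new_len > max_length:
            yield ",".join(working)
            working = [item]
            joined_len = len(item)
        else:
            working.append(item)
            joined_len = new_len
    if working:
        yield ",".join(working)


print(list(iter_chunks(["ab", "cd", "ef", "gh", "ij", "kl", "mn"])))
# ['ab,cd,ef', 'gh,ij,kl', 'mn']
```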
jonrsharpe
  • thanks for the answer but it doesn't work for me... try to set max_length to 10 and run: ab cd ef gh ij kl mn op ==> kl,gh won't be concatenated. also, how would you address it within the loop? – Optimesh Apr 14 '14 at 10:08
  • @Optimesh I have added a more pythonic example implementation – jonrsharpe Apr 14 '14 at 10:27
  • @Optimesh oh, and note that your version doesn't do anything with `i` in the `else` case, so you end up missing out items. – jonrsharpe Apr 14 '14 at 10:30
  • thanks. I will look into it shortly and upvote/select as correct as needed. :) – Optimesh Apr 14 '14 at 15:24

OK, the solution to the problem described in my OP is actually quite simple, and consists of two modifications:

First one - the else clause:

    else:
        long_str = ",".join(short_list)
        list_of_long_strs.append(long_str)
        items_processed.extend(short_list)                                  # for sanity checking
        del short_list[:]                                                   # in order to empty the list.
        short_list.append(i)                                                # so we won't lose this particular item
        total_len = len(i) + 1                                              # +1 for the comma, as in the if branch

The main issue here was to append i after emptying short_list, so the item that sent the loop into the else clause doesn't get lost. Similarly, total_len is set to the length of this item plus one for its comma (the same bookkeeping as in the if branch), instead of 0 as before; without the +1, a string can end up one character over max_length.

As suggested by friendly commenters above, the if-else under else is redundant, so I took it out.

Second part:

residual_items_concatenated = ",".join(short_list)
list_of_long_strs.append(residual_items_concatenated)

This part makes sure that when the short_list doesn't "make it" to the else clause because the total_len < max_length, its items are still concatenated and added as another item to the list of long strings, like its friends before.

I feel these two small modifications are the best solution to my problem, as they keep the majority of the code and just change a couple of rows instead of rewriting from scratch.
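
As pointed out in the comments, the original sanity check only looks at what went into short_list, not at the finished strings. A check on the output itself could be sketched like this (`check_chunks` is a made-up name, and it assumes the items themselves contain no commas):

```python
def check_chunks(items, chunks, max_length=10):
    """Return a list of problems found; an empty list means all is well.

    Hypothetical helper: assumes the items contain no commas.
    """
    problems = []
    for chunk in chunks:
        if len(chunk) > max_length:
            problems.append("too long: %s" % chunk)
    # split the chunks back into items and compare with the input
    found = [item for chunk in chunks for item in chunk.split(",")]
    missing = set(items) - set(found)
    if missing:
        problems.append("missing: %s" % ",".join(sorted(missing)))
    return problems


items = "ab cd ef gh ij kl mn op".split()
print(check_chunks(items, ["ab,cd,ef", "gh,ij,kl", "mn,op"]))  # []
print(check_chunks(items, ["ab,cd,ef", "gh,ij,kl,mn"]))
# ['too long: gh,ij,kl,mn', 'missing: op']
```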

Optimesh