I am struggling to find the error in my flocculation function.

The goal of the function is to take a list and collapse each run of contiguous equal values into a single value. For example...

[1, 4, 4, 2, 0, 3, 3, 3] => [1, 4, 2, 0, 3]

The function as it stands now is...

def flocculate(array):
    for index1, val1 in enumerate(array):
        if val1 == 0 or not not val1:
            new_array = array[index1+1:]
            for index2, val2 in enumerate(new_array):
                if array[index1] == val2:
                    array[index1 + index2 + 1] = False
                else:
                    break
    return [value for value in array if type(value) is not bool]

However, it doesn't seem to handle zeros very well.

For example, the input shown below gets some of the zeros correct, but misses some others...

[2, 4, 4, 0, 3, 7, 0, 2, 2, 2, 8, 0, 0, 0] => [2, 4, 3, 7, 0, 2, 8, 0]

John Y
J. Munson
  • what's the correct output for an input that repeats a group of numbers later on? e.g. `[1, 4, 4, 2, 0, 3, 3, 3, 1, 1, 4, 2, 0, 0, 8, 3, 3, 0]` – airstrike Jun 16 '17 at 00:13
  • When you get to a resolution, please remember to up-vote useful things and accept your favourite answer (even if you have to write it yourself), so Stack Overflow can properly archive the question. – Prune Jun 16 '17 at 16:06
  • Yo @Prune you got my upvote dawg! I just don't have enough stackoverflow street cred yet for my votes to be displayed publicly. That's what stackoverflow tells me anyway. Actually every answer has been pretty dope in its own way. Yours because it was simple and elegant. d-gillis for pointing out flaws in my original code. And peter-de-rivaz, at four whole votes, for pointing out what seems to be the canonical python way of floccing some data. Thanks guys, you rock – J. Munson Jun 17 '17 at 07:00

3 Answers

I think you may be looking for itertools.groupby.

This function groups runs of adjacent items that are equal (or, if you pass the optional key function, that have equal keys).

For example:

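import itertools

def flocculate(A):
    # groupby yields one (key, group) pair per run of adjacent equal items;
    # keeping only the keys collapses each run to a single value.
    return [k for k, _ in itertools.groupby(A)]

print(flocculate([2, 4, 4, 0, 3, 7, 0, 2, 2, 2, 8, 0, 0, 0]))
print(flocculate([1, 4, 4, 2, 0, 3, 3, 3]))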

prints:

[2, 4, 0, 3, 7, 0, 2, 8, 0]
[1, 4, 2, 0, 3]
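To make the grouping behaviour concrete, here is a small sketch (Python 3) that prints each key alongside its run and then uses the optional `key` argument. The sample data and the `str.lower` key are only illustrations; the point is that `groupby` merges adjacent items only, which is exactly what flocculation needs.

```python
import itertools

data = [2, 4, 4, 0, 3, 7, 0, 2, 2, 2, 8, 0, 0, 0]

# Pair each key with the run of adjacent equal items it came from.
runs = [(k, list(g)) for k, g in itertools.groupby(data)]
print(runs)
# [(2, [2]), (4, [4, 4]), (0, [0]), (3, [3]), (7, [7]), (0, [0]), (2, [2, 2, 2]), (8, [8]), (0, [0, 0, 0])]

# The three separate runs of 0 are NOT merged, because they are not adjacent.
print([k for k, _ in runs])  # [2, 4, 0, 3, 7, 0, 2, 8, 0]

# With key=, items are grouped when their keys match (here: case-insensitively).
# k is then the key value, so next(g) is used to keep the first original item.
words = ["A", "a", "b", "B", "b", "a"]
print([k for k, _ in itertools.groupby(words, key=str.lower)])        # ['a', 'b', 'a']
print([next(g) for _, g in itertools.groupby(words, key=str.lower)])  # ['A', 'b', 'a']
```

With a key function, remember that the first element of each pair is the computed key, not the original item.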
Peter de Rivaz

I deleted my original answer; I finally understood "flocculate" in this context. Sorry ... I'm blinded by several years in ceramics.

You're doing too much work by tagging the items that do or don't match. Simply build a new list from the original, adding only the items that differ from the previous one.

test_list = [
    [1, 4, 4, 2, 0, 3, 3, 3],
    [2, 4, 4, 0, 3, 7, 0, 2, 2, 2, 8, 0, 0, 0],
    [-122, 4, 14, 0, 3, 7, 0, 2, 2, -2, 8, 0, 0, 0, 9999]
]

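def flocculate(array):
    result = []
    # A fresh sentinel object never equals any list item (None could be a real item).
    last = object()
    for i in array:
        # Keep an item only if it differs from the item kept just before it.
        if i != last:
            result.append(i)
            last = i
    return result

for array in test_list:
    print(array, "\n    =>", flocculate(array))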

Output:

[1, 4, 4, 2, 0, 3, 3, 3] 
    => [1, 4, 2, 0, 3]
[2, 4, 4, 0, 3, 7, 0, 2, 2, 2, 8, 0, 0, 0] 
    => [2, 4, 0, 3, 7, 0, 2, 8, 0]
[-122, 4, 14, 0, 3, 7, 0, 2, 2, -2, 8, 0, 0, 0, 9999] 
    => [-122, 4, 14, 0, 3, 7, 0, 2, -2, 8, 0, 9999]
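The same compare-with-the-previous-item idea can also be written as one comprehension that pairs each element with its predecessor via `zip`. This is just an alternative sketch; the name `flocculate_zip` is hypothetical, it assumes the input is a list (it relies on slicing), and it returns a new list rather than modifying the input.

```python
def flocculate_zip(array):
    """Keep the first item, then every item that differs from the one before it."""
    if not array:
        return []
    # zip(array, array[1:]) yields (previous, current) pairs.
    return [array[0]] + [cur for prev, cur in zip(array, array[1:]) if cur != prev]

print(flocculate_zip([-122, 4, 14, 0, 3, 7, 0, 2, 2, -2, 8, 0, 0, 0, 9999]))
# [-122, 4, 14, 0, 3, 7, 0, 2, -2, 8, 0, 9999]
```

Because the first item is always kept and every later comparison is between two real list items, no sentinel value is needed.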
Prune
  • Wow, so much simpler than the route I was going. If anyone knows a more proper term for this sort of algorithm, please let us all know, because googling "flocculation algorithm" was very unhelpful! – J. Munson Jun 16 '17 at 06:51
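This operation is commonly described as removing consecutive (adjacent) duplicates; it is what the Unix `uniq` command does to lines of text, and it keeps the values half of run-length encoding. A minimal run-length-encoding sketch, assuming `itertools.groupby` (the function name `run_length_encode` is hypothetical), shows that the flocculated list is just the first element of each pair:

```python
import itertools

def run_length_encode(array):
    """Return (value, run_length) pairs, one per run of adjacent equal values."""
    # sum(1 for _ in run) counts the items in each run without building a list.
    return [(value, sum(1 for _ in run)) for value, run in itertools.groupby(array)]

pairs = run_length_encode([1, 4, 4, 2, 0, 3, 3, 3])
print(pairs)                          # [(1, 1), (4, 2), (2, 1), (0, 1), (3, 3)]
print([value for value, _ in pairs])  # [1, 4, 2, 0, 3]
```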

Changing your first if-statement to `if val1 is not False:` fixes the problem. That said, I would strongly recommend following Prune's answer instead: comparing each element in the list to the previous element is much simpler, and it also has the virtue of not mutating the input list.
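Applying that one-line change to the question's function gives the sketch below (Python 3). Apart from the `is not False` test and the comments it is the original code, so it still overwrites entries of the list passed in.

```python
def flocculate(array):
    for index1, val1 in enumerate(array):
        # Skip only elements already marked as duplicates. An identity test is
        # needed because 0 == False is True, so `val1 == 0` would also match False.
        if val1 is not False:
            new_array = array[index1+1:]
            for index2, val2 in enumerate(new_array):
                if array[index1] == val2:
                    # Mark the repeated value so the filter below drops it.
                    array[index1 + index2 + 1] = False
                else:
                    break
    return [value for value in array if type(value) is not bool]

print(flocculate([2, 4, 4, 0, 3, 7, 0, 2, 2, 2, 8, 0, 0, 0]))
# [2, 4, 0, 3, 7, 0, 2, 8, 0]
```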


The bug in your code is caused by the fact that `False == 0` evaluates to `True` in Python. This causes two problems in your function. First, because the test `val1 == 0` is also true when `val1` is `False`, the code in the if-block runs for every element in your list, even ones you have already marked as `False`. This leads to the second problem: when the loop reaches a `False` element, any 0 values immediately after it are treated as contiguous equal values (since `False == 0`) and are marked for removal. So whenever a 0 directly follows a run of repeated elements, that 0 is changed to `False` and is missing from the output list.
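The root cause is that `bool` is a subclass of `int` in Python, so `False` compares equal to `0`. A quick check (Python 3), where `val` and `marked` are just illustrative names for an element holding `0` and an element already tagged `False`:

```python
# bool is a subclass of int, so False behaves like the integer 0 in comparisons.
print(issubclass(bool, int))  # True
print(False == 0)             # True

val = 0         # a genuine 0 in the list
marked = False  # an element already tagged as a duplicate

# The question's test is True for both, so tagged elements are not skipped...
print(val == 0 or not not val)        # True
print(marked == 0 or not not marked)  # True

# ...whereas an identity test tells them apart.
print(val is not False)     # True
print(marked is not False)  # False
```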

As a short illustration, here is what the list looks like at the beginning of each iteration of your function for the input [2, 4, 4, 0] (where ">" marks the current index).

Input: [2, 4, 4, 0]
[>2, 4, 4, 0]
[2, >4, 4, 0]
[2, 4, >False, 0]
[2, 4, False, >False]
Output: [2, 4]
D. Gillis