Group consecutive sublists together if the value is equal and then extract the values which have the attribute = 'R'

Question

Input data:

[[30.0, 'P'], [45.0, 'R'], [50.0, 'D']....]
[[10.0, 'R'], [20.0, 'D'], [60.0, 'R']...]
[[42.4, 'R'], [76.0, 'R'], [52.0, 'D']....]

The real input will be a huge list of lists, each sublist holding a float and a string, and I need to group the floats whose string value equals 'R'. The above lists of lists were generated by converting data frames to lists (just for reference).

So I have to find the float value wherever the attribute equals 'R' and put that value in a sublist. Values are grouped together only when the sublists carrying the 'R' attribute are consecutive. If not, each value should go in its own sublist.

Output data:

The 'R'-tagged values should be grouped together only if they are next to each other; otherwise each one should be in a separate sublist.

[[45.0], [10.0], [60.0], [42.4, 76.0]]

Welcome to SO! Just to clarify, you've posted three example inputs? — rcorty, Feb 03 '20 at 21:45

Joan Puigcerver · Answer 1 · 2020-02-04T14:13:34.010

def group_consecutive(lists, char):
    result = []
    # For each list
    for l in lists:
        local_result = []

        # For each element in list
        for n, c in l:
            # Check if char is the same
            if c == char:
                local_result.append(n)
            # Else, if local_result has any element
            elif local_result:
                result.append(local_result)
                local_result = []

        # Append the last run if it is not empty
        if local_result:
            result.append(local_result)

    return result


l1 = [[30.0, 'P'], [45.0, 'R'], [50.0, 'D']]
l2 = [[10.0, 'R'], [20.0, 'D'], [60.0, 'R']]
l3 = [[42.4, 'R'], [76.0, 'R'], [52.0, 'D']]

result = group_consecutive( [ l1, l2, l3 ], 'R' )
print( result )

The previous code gives this output:

[[45.0], [10.0], [60.0], [42.4, 76.0]]
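Since the question mentions a huge list, the same loop could also be written as a generator, so that runs are produced one at a time instead of being collected up front. This is only a sketch of that idea; the name `iter_groups` and its default argument are assumptions, not part of the answer above.

```python
def iter_groups(lists, char='R'):
    """Yield each run of consecutive `char`-tagged values, sublist by sublist."""
    for sublist in lists:
        run = []
        for value, tag in sublist:
            if tag == char:
                run.append(value)
            elif run:
                # A different tag ends the current run
                yield run
                run = []
        # Flush a run that reaches the end of the sublist
        if run:
            yield run


sample = [
    [[30.0, 'P'], [45.0, 'R'], [50.0, 'D']],
    [[10.0, 'R'], [20.0, 'D'], [60.0, 'R']],
    [[42.4, 'R'], [76.0, 'R'], [52.0, 'D']],
]

groups = list(iter_groups(sample))
print(groups)  # [[45.0], [10.0], [60.0], [42.4, 76.0]]
```

Because each run is flushed at the end of its sublist, `[60.0]` and `[42.4, 76.0]` stay separate even though one sublist ends with an 'R' and the next begins with one.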

This doesn't work if the last element is an `'R'`. Since it only appends new_list on elements that are not R, if the last element has an `'R'` there will be a `new_list` that is never appended. It doesn't even match the output in the example given. — ICW, Feb 04 '20 at 04:31

kederrac · Answer 2 · 2020-02-04T06:51:20.873

You can use a for loop:

input_data = [
    [[30.0, 'P'], [45.0, 'R'], [50.0, 'D']],
    [[10.0, 'R'], [20.0, 'D'], [60.0, 'R']],
    [[42.4, 'R'], [76.0, 'R'], [52.0, 'D']]]

final_list = []
new_list = []

for l in input_data:
    for value, tag in l:
        if tag == 'R':
            new_list.append(value)
        elif new_list:
            final_list.append(new_list)
            new_list = []
    # Flush a run that reaches the end of this sublist, so a trailing 'R'
    # is not lost and runs never span two sublists
    if new_list:
        final_list.append(new_list)
        new_list = []

print(final_list)

output:

[[45.0], [10.0], [60.0], [42.4, 76.0]]

edited Feb 04 '20 at 06:51

answered Feb 03 '20 at 21:54


This doesn't work if the last element is an `'R'`. There was one in the example that your code incorrectly didn't include. Since it only appends new_list on elements that are not R, if the last element has an `'R'` there will be a `new_list` that is never appended. It doesn't even match the output in the example given. – ICW Feb 04 '20 at 04:29
@YungGun check now my answer – kederrac Feb 04 '20 at 06:55

score 0 · Answer 3 · answered Feb 03 '20 at 23:12

If I understand correctly, you want to group every consecutive tuple in the input array that has 'R' as its second element. The output should then be an array of these groups: each run of values with consecutive R's appears as one array in the output, giving an array of arrays. This should work in Python:

def group(input_array):
    r = []
    i = 0
    while i < len(input_array):
        if input_array[i][1] == 'R':
            # Figure out how many consecutive R's we have; the bounds check
            # stops the scan when the run reaches the end of the list
            group_end_index = i + 1
            while (group_end_index < len(input_array)
                   and input_array[group_end_index][1] == 'R'):
                group_end_index += 1

            # Add the values (first elements) of the run to the return array
            r.append([x[0] for x in input_array[i:group_end_index]])
            # Continue from the first element after the run
            i = group_end_index
        else:
            # Not an 'R', ignore.
            i += 1
    return r


if __name__ == '__main__':
    print(group([[1, 'R'], [2, 'R'], [4, 'A'], [4, 'R']]))

This seems to do what you want for a single list whose elements are tuples, i.e. lists with two elements.

Jongware · Answer 4 · 2020-02-04T10:10:55.440

from itertools import groupby

input_data = [
    [[30.0, 'P'], [45.0, 'R'], [50.0, 'D']],
    [[10.0, 'R'], [20.0, 'D'], [60.0, 'R']],
    [[42.4, 'R'], [76.0, 'R'], [52.0, 'D']]]

print (sum([[list(j) for i,j in
    groupby([item[0] if item[1] == 'R' else None for item in sublist],lambda x:x is not None) if i]
    for sublist in input_data],[]))

Result:

[[45.0], [10.0], [60.0], [42.4, 76.0]]

Derivation

If you think of grouping something, you should take a look at what groupby can do for you. To keep it simple, let's first use only part of your longer list to work it out:

i = input_data[2]
print ([(key,*lst) for key,lst in groupby(i, lambda x: x[1]=='R')])

and show how groupby works for your input:

[(True, [42.4, 'R'], [76.0, 'R']), (False, [52.0, 'D'])]

because the two R values are in one grouped list and the other value is in the other. You are not interested in those False values so don't include them:

print ([list(lst) for key,lst in groupby(i, lambda x: x[1]=='R') if key])

and this will get you

[[[42.4, 'R'], [76.0, 'R']]]

Please, do check the results for the other sub-lists in your sample data as well!
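As a quick way to run that check, the same filtered groupby can be applied to each sub-list of the sample data in turn. This is a sketch; the name `per_sublist` is introduced here only for illustration.

```python
from itertools import groupby

input_data = [
    [[30.0, 'P'], [45.0, 'R'], [50.0, 'D']],
    [[10.0, 'R'], [20.0, 'D'], [60.0, 'R']],
    [[42.4, 'R'], [76.0, 'R'], [52.0, 'D']]]

# One list of R-groups per sub-list; groupby only merges adjacent items
per_sublist = [
    [list(lst) for key, lst in groupby(sublist, lambda x: x[1] == 'R') if key]
    for sublist in input_data
]

for groups in per_sublist:
    print(groups)
# [[[45.0, 'R']]]
# [[[10.0, 'R']], [[60.0, 'R']]]
# [[[42.4, 'R'], [76.0, 'R']]]
```

The second sub-list shows the point of groupby here: its two R items are separated by a D, so they end up in two groups rather than one.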

It is easy to not include the group key values True and False, but you still have the 'R' strings as well (which, incidentally, add yet another level of brackets). Now groupby can ultimately only decide whether or not to include an item into a group. So you cannot re-write it to 'return' just the number for R items. (I'll be happily corrected on this, by the way.)

But you are not interested in the values that aren't tagged R anyway; you only need to know there may be some value, and if there is, it's only to split runs of R on. You can safely replace them with None, while keeping the R values:

>>> print ([item[0] if item[1] == 'R' else None for item in i])
[42.4, 76.0, None]

which means that the earlier groupby should no longer check for the presence of R but for not None:

>>> j = [item[0] if item[1] == 'R' else None for item in i]
>>> print ([list(lst) for key,lst in groupby(j, lambda x: x is not None) if key])
[[42.4, 76.0]]

This is, as requested, a list containing lists of continuous items (only one list here, but each of your other input lines will show a different variation). Hold on, we're nearly there.
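A possible alternative to the None substitution, offered as a sketch rather than a correction: groupby itself still only decides group membership, but the numbers can be pulled out of each group afterwards with an inner comprehension. The names `runs`, `is_r` and `grp` are illustrative assumptions.

```python
from itertools import groupby

i = [[42.4, 'R'], [76.0, 'R'], [52.0, 'D']]

# Group on the tag, keep only the R groups, then keep only the values
runs = [[value for value, _tag in grp]
        for is_r, grp in groupby(i, key=lambda item: item[1] == 'R')
        if is_r]

print(runs)  # [[42.4, 76.0]]
```

This variant also keeps working if a value tagged R happens to be None, which the substitution approach would treat as a separator.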

This testing was done on a single item in your longer list, and so it's easy to loop over the original as well:

for i in input_data:
   ...

Printing out, for example, can be done with this loop. However, you want a list back again. You can use append, of course, but let's have some fun and add a list comprehension around the current groupby:

print ([
         [list(lst) for key,lst
          in groupby([item[0] if item[1] == 'R' else None for item in i],
          lambda x: x is not None) if key]
       for i in input_data])

Don't be alarmed by its length! It's our earlier groupby but instead of a variable i, it contains the list comprehension itself as its first argument. The outermost layer is new; it's only this standard wrapper

[ original list comprehension for i in input_data]

and it shows

[[[45.0]], [[10.0], [60.0]], [[42.4, 76.0]]]

Where do those extra brackets come from? We started out with single items (we changed the list [45.0, 'R'] into a single item 45.0), grouped them by occurrence, grouped that by sub-list, and the total is a list of those lists. You want the total listing, not a list of lists, so let's add them together by flattening the list. (Flattening lists is a well-researched question and you're free to pick any method, but I like sum best because it kept things in a single line...)

Only using the above result as input:

print (sum([[[45.0]], [[10.0], [60.0]], [[42.4, 76.0]]],[]))

neatly shows that the outer layer of extra brackets has disappeared:

[[45.0], [10.0], [60.0], [42.4, 76.0]]

which is precisely what you were after.
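One caveat for the "huge list" mentioned in the question: `sum(..., [])` builds a new list at every addition, so its cost grows quadratically with the number of sub-lists. A sketch of the same flattening step with `itertools.chain.from_iterable`, which walks the input once (the names `nested` and `flat` are illustrative):

```python
from itertools import chain

nested = [[[45.0]], [[10.0], [60.0]], [[42.4, 76.0]]]

# Remove exactly one level of nesting, like sum(nested, [])
flat = list(chain.from_iterable(nested))

print(flat)  # [[45.0], [10.0], [60.0], [42.4, 76.0]]
```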

Thank you so much! I tried your method and I got an error when I tried to do item[1] == 'R' saying invalid syntax — m_angelo, Feb 04 '20 at 13:51
@m_angelo: sorry, do you mean the code at the very top of my post does not work? I am 100% sure it does. — Jongware, Feb 04 '20 at 13:54
