I have a nested list that I need to chain, then run metrics, then "unchain" back into its original nested format. Here is example data to illustrate:

from itertools import chain

nested_list = [['x', 'xx', 'xxx'], ['yy', 'yyy', 'y', 'yyyy'], ['zz', 'z']]
chained_list = list(chain(*nested_list))
print("chained_list: \n", chained_list)
metrics_list = [str(item) + '_score' for item in chained_list]
print("metrics_list: \n", metrics_list) 
zipped_scores = list(zip(chained_list, metrics_list))
print("zipped_scores: \n", zipped_scores)

unchain_function = '????'

chained_list: 
 ['x', 'xx', 'xxx', 'yy', 'yyy', 'y', 'yyyy', 'zz', 'z']
metrics_list: 
 ['x_score', 'xx_score', 'xxx_score', 'yy_score', 'yyy_score', 'y_score', 'yyyy_score', 'zz_score', 'z_score']
zipped_scores: 
 [('x', 'x_score'), ('xx', 'xx_score'), ('xxx', 'xxx_score'), ('yy', 'yy_score'), ('yyy', 'yyy_score'), ('y', 'y_score'), ('yyyy', 'yyyy_score'), ('zz', 'zz_score'), ('z', 'z_score')]

Is there a Python function, or a Pythonic way to write an "unchain_function", that produces this desired output?

[
    [
        ('x', 'x_score'), 
        ('xx', 'xx_score'), 
        ('xxx', 'xxx_score')
    ],
    [
        ('yy', 'yy_score'), 
        ('yyy', 'yyy_score'), 
        ('y', 'y_score'),
        ('yyyy', 'yyyy_score')
    ],
    [
        ('zz', 'zz_score'), 
        ('z', 'z_score')
    ]
]

(background: this is for running metrics on lists having lengths greater than 100,000)

pylang
  • Does this answer your question? https://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks – JL0PD Mar 17 '21 at 04:04
  • How could it possibly know how you wanted the list split? If there is a known criterion, then it's easy. – Tim Roberts Mar 17 '21 at 04:05
  • The problem is you are mutating the original list, you need to start looking into the functional programming aspects. You can do this by using map() so that original is not modified, and with some itertools functions you can write sweet generators. You don't have to worry about unchaining as the original would not have been mutated. – the23Effect Mar 17 '21 at 04:13
  • @JL0PD and Tim Roberts: these lists are not in evenly sized chunks; the objective is to preserve the original nesting, that is, the original length and order of each nested list, in the desired output. Also, assume that the itertools.chain step is required for performance, as the actual metric_function would take too much time otherwise. BTW, thanks for looking at this question. – Jarom Feriante Mar 17 '21 at 04:14
  • I will try to write working code for this in a while. – the23Effect Mar 17 '21 at 04:15
  • Not sure why you are using `chain`. Why not just make the result you want directly with something like: `[[(x, x + '_score') for x in l] for l in nested_list]` – Mark Mar 17 '21 at 04:17
  • @MarkM Exactly what I wanted to say: why chain them in the first place? – the23Effect Mar 17 '21 at 04:18
  • P.S. If memory use is an issue, that list comprehension could just as easily be a generator expression. – Mark Mar 17 '21 at 04:20
  • It would make sense to chain them if you had complex logic that uses data from across the inner lists (for a simple example, computing x + z, where the logic for deciding that x should be paired with z is complex). But you are only using individual elements, so any form of iterator usage would be fine in my opinion. My answer will also be similar to what @MarkM suggested. – the23Effect Mar 17 '21 at 04:26
  • @JL0PD There was a typo in my desired output. It's fixed now, showing the desired output sizes as 3, 4, 2. As for the other comments, please imagine that the chained_list and metrics_list steps are necessary, as the real-world metrics function must receive a single list as input for performance purposes. – Jarom Feriante Mar 17 '21 at 04:27
  • Okay then I will think from the angle that chaining is necessary. – the23Effect Mar 17 '21 at 04:29

4 Answers

1

I dunno about how pythonic this is, but this should work. Long story short, we're using a Wrapper class to turn an immutable primitive (which is impossible to change without replacing) into a mutable variable (so we can have multiple references to the same variable, each organized differently).

We create an identical nested list, except that each value is a Wrapper of the corresponding value from the original list. Then we apply the same chain transformation to flatten the wrapper list, so each Wrapper is referenced from both the nested list and the chained list. We copy the changes from the processed chained list onto the chained wrappers, then read those changes back through the nested wrapper list and unwrap them.

I think that using an explicit and simple class called Wrapper is easier to understand, but you could do essentially the same thing by using a singleton list to contain the variable instead of an instance of Wrapper.

from itertools import chain

nested_list = [['x', 'xx', 'xxx'], ['yy', 'yyy', 'y', 'yyyy'], ['zz', 'z']]
chained_list = list(chain(*nested_list))

metrics_list = [str(item) + '_score' for item in chained_list]
zipped_scores = list(zip(chained_list, metrics_list))

# create a simple Wrapper class, so we can essentially have a mutable primitive.
# We can put the Wrapper into two different lists, and modify its value without
# overwriting it.
class Wrapper:
    def __init__(self, value):
        self.value = value

# create a 'duplicate list' of the nested and chained lists, respectively, 
# such that each element of these lists is a Wrapper of the corresponding
# element in the above lists
nested_wrappers = [[Wrapper(elem) for elem in sublist] for sublist in nested_list]
chained_wrappers = list(chain(*nested_wrappers))

# now we have two references to the same MUTABLE Wrapper for each element of 
# the original lists - one nested, and one chained. If we change a property
# of the chained Wrapper, the change will reflect on the corresponding nested
# Wrapper. Copy the changes from the zipped scores onto the chained wrappers
for score, wrapper in zip(zipped_scores, chained_wrappers):
    wrapper.value = score

# then extract the values in the unchained list of the same wrappers, thus
# preserving both the changes and the original nested organization
unchained_list = [[wrapper.value for wrapper in sublist] for sublist in nested_wrappers]

This ends with unchained_list equal to the following:

[[('x', 'x_score'), ('xx', 'xx_score'), ('xxx', 'xxx_score')], [('yy', 'yy_score'), ('yyy', 'yyy_score'), ('y', 'y_score'), ('yyyy', 'yyyy_score')], [('zz', 'zz_score'), ('z', 'z_score')]]
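
The singleton-list variant mentioned above might look something like the sketch below. It is not part of the original code: the name `box` and the inline stand-in for the metrics step are assumptions made for illustration. A one-item list plays the role of Wrapper, since it is mutable and can be referenced from both the nested and the chained structure.

```python
from itertools import chain

nested_list = [['x', 'xx', 'xxx'], ['yy', 'yyy', 'y', 'yyyy'], ['zz', 'z']]

# Box each element in a one-item list. The box is mutable, so the nested
# view and the chained view can share the very same object.
nested_boxes = [[[elem] for elem in sublist] for sublist in nested_list]
chained_boxes = list(chain.from_iterable(nested_boxes))

# Stand-in for the real metrics step (assumption: it takes one flat list).
chained_list = [box[0] for box in chained_boxes]
zipped_scores = [(item, item + '_score') for item in chained_list]

# Write each result into its box; the nested view sees the change too.
for score, box in zip(zipped_scores, chained_boxes):
    box[0] = score

# Unbox through the nested structure to recover the original grouping.
unchained_list = [[box[0] for box in sublist] for sublist in nested_boxes]
print(unchained_list)
```

The trade-off is readability: `box[0]` says less about intent than `wrapper.value`, but no class definition is needed.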
Green Cloak Guy
  • Green Cloak Guy: this solution worked, thank you! – Jarom Feriante Mar 17 '21 at 14:56
  • It seems like the last line alone is solving most of the problem, i.e. recovering the structure of the original nested list. I wrote an answer offering a simplified approach. – Lack Apr 17 '22 at 10:08
0

I think you just want to group your data according to some condition, here the first letter of the first element in each tuple.

Given

Your flattened, zipped data:

data = [
    ('x', 'x_score'), ('xx', 'xx_score'), ('xxx', 'xxx_score'),
    ('yy', 'yy_score'), ('yyy', 'yyy_score'), ('y', 'y_score'), ('yyyy', 'yyyy_score'),
    ('zz', 'zz_score'), ('z', 'z_score')
]

Code

import itertools

# group consecutive tuples whose first element starts with the same letter
[list(g) for _, g in itertools.groupby(data, key=lambda x: x[0][0])]

Output

[[('x', 'x_score'), ('xx', 'xx_score'), ('xxx', 'xxx_score')],
 [('yy', 'yy_score'),
  ('yyy', 'yyy_score'),
  ('y', 'y_score'),
  ('yyyy', 'yyyy_score')],
 [('zz', 'zz_score'), ('z', 'z_score')]]
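
Grouping by first letter works for the sample data because every sublist happens to share one distinct letter. If the real data has no such property, one possible variation (a sketch, not something from the question) is to carry the sublist index along while flattening and group on that instead. The names `tagged`, `scored`, and `regrouped` are made up for illustration, and the scoring line is a stand-in for the real metrics step.

```python
from itertools import groupby

nested_list = [['x', 'xx', 'xxx'], ['yy', 'yyy', 'y', 'yyyy'], ['zz', 'z']]

# Flatten while remembering which sublist each item came from.
tagged = [(idx, item) for idx, sublist in enumerate(nested_list) for item in sublist]

# Stand-in for the real metrics step, applied to the flat sequence.
scored = [(idx, (item, item + '_score')) for idx, item in tagged]

# Regroup consecutive entries that share a sublist index, dropping the index.
regrouped = [
    [pair for _, pair in group]
    for _, group in groupby(scored, key=lambda entry: entry[0])
]
print(regrouped)
```

Note that an empty sublist contributes no tagged items, so it would not reappear in `regrouped`.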

See Also

  • This post on how this tool works
pylang
0

You have made the algorithm more complex than it needs to be. You can do it with the simple steps shown below:

  • First, create an empty nested list of the desired size

    formatted_list = [[] for _ in nested_list]

  • Just loop over the list and format accordingly

    for k in range(len(nested_list)):
        for item in nested_list[k]:
            formatted_list[k].append(item + '_score')

    # print once, after every sublist has been filled
    print(formatted_list)
    
venkatesh
0

Here's a simple way to get the desired output.

nested_list = [['x', 'xx', 'xxx'], ['yy', 'yyy', 'y', 'yyyy'], ['zz', 'z']]
zipped_scores = [
    ('x', 'x_score'), ('xx', 'xx_score'), ('xxx', 'xxx_score'),
    ('yy', 'yy_score'), ('yyy', 'yyy_score'), ('y', 'y_score'), ('yyyy', 'yyyy_score'),
    ('zz', 'zz_score'), ('z', 'z_score'),
]


zipped_scores_iter = iter(zipped_scores)
unchained_list = [[next(zipped_scores_iter) for x in sublist] for sublist in nested_list]

Notice: with the following list comprehension, we could replicate nested_list exactly:

[[x for x in sublist] for sublist in nested_list]

We have the structure. All we want to do is swap the original x for the new value:

[[corresponding_value_for(x) for x in sublist] for sublist in nested_list]

I think the accepted answer takes the same approach, but uses a more complicated method of getting the corresponding value.

There's already a one-to-one correspondence between the input (nested_list) and desired values (zipped_scores), given by their order. Therefore, we can replace x with the corresponding element from zipped_scores by pulling the next item from an iterator.

[[next(zipped_scores_iter) for x in sublist] for sublist in nested_list]

By the way, while in this case it doesn't seem like flattening the list is needed to get the desired output, I've encountered a similar problem where flattening and then re-grouping was useful (sending a batch of inputs to an external process). This was my approach.
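
For the flatten, process as a batch, then re-group pattern, the same iterator idea can be packaged as a small helper. The sketch below is one possible shape, not a standard library function: the name `unchain` and its parameters are made up for illustration, and the scoring line stands in for whatever batch process consumes the flat list. It uses `itertools.islice` to take exactly as many items as each original sublist holds.

```python
from itertools import chain, islice

def unchain(flat_items, template):
    """Regroup flat_items into sublists with the same lengths as template's sublists."""
    iterator = iter(flat_items)
    # islice pulls the next len(sublist) items from the shared iterator.
    return [list(islice(iterator, len(sublist))) for sublist in template]

nested_list = [['x', 'xx', 'xxx'], ['yy', 'yyy', 'y', 'yyyy'], ['zz', 'z']]
chained_list = list(chain.from_iterable(nested_list))

# Stand-in for the batch step that needs a single flat list as input.
zipped_scores = [(item, item + '_score') for item in chained_list]

unchained_list = unchain(zipped_scores, nested_list)
print(unchained_list)
```

Unlike calling `next()` directly, `islice` does not raise if the flat list runs short; the trailing sublists simply come back shorter, so a length check may be worth adding when the batch step can drop items.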

Lack