2

Summary

I have used a semi-complex regex to retrieve data from a website. The issue I have is that I have to do some post-processing of the matched dataset.

I have gotten the data processed to probably 95+% of where I want it; however, I am getting a simple error message that I cannot reason about. It's strange.

I can bypass it, but that is beside the point. I am trying to figure out whether this is a bug or something I am fundamentally overlooking about tuple unpacking.

Background Info

One thing I have to overcome is that I get 4 matches for every "true match". That means that my data for 1 single item is spread out over 4 matches.

In simple graphical form (slightly oversimplified):

index |  a    b    c    d    e    f    g    h    i    j 
--------------------------------------------------------
   1: | ( ), ( ), ( ), ( ), ( ), (█), ( ), ( ), ( ), ( )
   2: | (█), (█), (█), (█), ( ), ( ), ( ), ( ), ( ), ( )
   3: | ( ), ( ), ( ), ( ), (█), ( ), ( ), ( ), ( ), ( )
   4: | ( ), ( ), ( ), ( ), ( ), ( ), (█), (█), (█), (█)

   5: | ( ), ( ), ( ), ( ), ( ), (▒), ( ), ( ), ( ), ( )
   6: | (▒), (▒), (▒), (▒), ( ), ( ), ( ), ( ), ( ), ( )
   7: | ( ), ( ), ( ), ( ), (▒), ( ), ( ), ( ), ( ), ( )
   8: | ( ), ( ), ( ), ( ), ( ), ( ), (▒), (▒), (▒), (▒)

   9: | ...
        ...
 615: | ...

I can get all the data, but I want to compact it, like so...

index |  a    b    c    d    e    f    g    h    i    j 
--------------------------------------------------------
   1: | (█), (█), (█), (█), (█), (█), (█), (█), (█), (█)
   2: | (▒), (▒), (▒), (▒), (▒), (▒), (▒), (▒), (▒), (▒)

   3: | ...
        ...
 154: | ...

Code

Works

Take note of the variables abcd, e, f, and ghij and how I have to unpack them in the for-loop at the bottom.

matches = [('', '', '', '', '', '', '', '', '', ''), ('Android Studio 3.6 Beta 1', '3.6', 'Beta', '1', '', '', '', '', '', ''), ('', '', '', '', 'October 10, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.13/android-studio-ide-192.5916306-linux.tar.gz', '3.6.0', '13', '192'), ('', '', '', '', '', 'stable', '', '', '', ''), ('Android Studio 3.5.1', '3.5.1', '', '', '', '', '', '', '', ''), ('', '', '', '', 'October 2, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.5.1.0/android-studio-ide-191.5900203-linux.tar.gz', '3.5.1', '0', '191'), ('', '', '', '', '', '', '', '', '', ''), ('Android Studio 3.6 Canary 12', '3.6', 'Canary', '12', '', '', '', '', '', ''), ('', '', '', '', 'September 18, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.12/android-studio-ide-192.5871855-linux.tar.gz', '3.6.0', '12', '192')]

f = [
    f
    for index, (_, _, _, _, _, f, *_)
    in enumerate(matches)
    if index % 4 == 0
]
abcd = [
    (a, b, c, d)
    for index, (a, b, c, d, *_)
    in enumerate(matches)
    if index % 4 == 1
]
e = [
    e
    for index, (_, _, _, _, e, *_)
    in enumerate(matches)
    if index % 4 == 2
]
ghij = [
    (g, h, i, j)
    for index, (*_, g, h, i, j)
    in enumerate(matches)
    if index % 4 == 3
]

abcdefghij = zip(abcd, e, f, ghij)

for (a, b, c, d), e, f, (g, h, i, j) in abcdefghij:
    print("a", a, "\nb", b, "\nc", c, "\nd", d, "\ne", e, "\nf", f, "\ng", g, "\nh", h, "\ni", i, "\nj", j, "\n", "-" * 100)

#

Fails

Take note that I am trying to unpack the same tuples right away into the variables a, b, c, d, e, f, g, h, i, and j.

matches = [('', '', '', '', '', '', '', '', '', ''), ('Android Studio 3.6 Beta 1', '3.6', 'Beta', '1', '', '', '', '', '', ''), ('', '', '', '', 'October 10, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.13/android-studio-ide-192.5916306-linux.tar.gz', '3.6.0', '13', '192'), ('', '', '', '', '', 'stable', '', '', '', ''), ('Android Studio 3.5.1', '3.5.1', '', '', '', '', '', '', '', ''), ('', '', '', '', 'October 2, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.5.1.0/android-studio-ide-191.5900203-linux.tar.gz', '3.5.1', '0', '191'), ('', '', '', '', '', '', '', '', '', ''), ('Android Studio 3.6 Canary 12', '3.6', 'Canary', '12', '', '', '', '', '', ''), ('', '', '', '', 'September 18, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.12/android-studio-ide-192.5871855-linux.tar.gz', '3.6.0', '12', '192')]

f = [
    f
    if f == "stable" else "preview"
    for index, (_, _, _, _, _, f, *_)
    in enumerate(matches)
    if index % 4 == 0
]
a, b, c, d = [
    (a, b, c, d)
    for index, (a, b, c, d, *_)
    in enumerate(matches)
    if index % 4 == 1
]
e = [
    e
    for index, (_, _, _, _, e, *_)
    in enumerate(matches)
    if index % 4 == 2
]
g, h, i, j = [
    (g, h, i, j)
    for index, (*_, g, h, i, j)
    in enumerate(matches)
    if index % 4 == 3]

abcdefghij = zip(a, b, c, d, e, f, g, h, i, j)

for a, b, c, d, e, f, g, h, i, j in abcdefghij:
    print("a", a, "\nb", b, "\nc", c, "\nd", d, "\ne", e, "\nf", f, "\ng", g, "\nh", h, "\ni", i, "\nj", j, "\n", "-" * 100)

#

With this code, I get the following error message...

... a, b, c, d = [(a, b, c, d) for index, (a, b, c, d, *_) in enumerate(matches) if index % 4 == 1]
ValueError: too many values to unpack (expected 4)

Expectations

I would have expected these two methods to perform exactly the same logic and to produce exactly the same end results.

They are not! Why?

Christopher Rucinski
  • Something similar happens with lambdas: https://stackoverflow.com/questions/21892989/what-is-the-good-python3-equivalent-for-auto-tuple-unpacking-in-lambda – gstukelj Oct 15 '19 at 19:14
  • How many items does the list comprehension produce? – wwii Oct 15 '19 at 19:16
  • I think you are missing a zip. – Paul Panzer Oct 15 '19 at 19:16
  • ... `zip(*...)` – wwii Oct 15 '19 at 19:17
  • @PaulPanzer line 3 from the bottom has a `zip(...)` – Christopher Rucinski Oct 15 '19 at 19:21
  • Yes, you need that as well, but first you probably should do `a, b, c, d = zip(*[(a, b, c, d) for index, (a, b, c, d, *_) in enumerate(matches) if index % 4 == 1])` and presumably the same for `g,h,i,j` – Paul Panzer Oct 15 '19 at 19:24
  • @PaulPanzer That appears to work. I will have to verify that everything lines up correctly. **But why do I need that?** I never thought I would need that, given the other one doesn't need the extra `zip`. Can you point me to some resources, if nothing else? – Christopher Rucinski Oct 15 '19 at 19:31
  • Cannot reproduce with your input data and the comprehension you mentioned as failing. Maybe a [mcve] would help? – Jean-François Fabre Oct 15 '19 at 19:34
  • Just go through your list comp and track what it produces: A list of four-tuples `[(a0,b0,c0,d0),(a1,b1,c1,d1),(a2,b2,c2,d2),...]` Before assigning that to `a,b,c,d` you have to regroup. Otherwise `(a0,b0,c0,d0)` would go to `a`, `(a1,b1,c1,d1)` to `b` etc. --- if it worked at all – Paul Panzer Oct 15 '19 at 19:36
  • @Jean-FrançoisFabre I edited to include the data in each sample. Just copy, paste, and run each example. I have tried to simplify the example heavily. – Christopher Rucinski Oct 15 '19 at 19:46

3 Answers

2

@PaulPanzer That appears to work. I will have to verify that everything lines up correctly. But why do I need that?

Say q is an iterable for which your comprehension produces a list of 26 tuples, each with 4 items.

z = [(a,b,c,d) for i, (a,b,c,d,*e) in enumerate(q)]


In [6]: len(z)
Out[6]: 26

In [7]: len(z[0])
Out[7]: 4

In [17]: z[:3]
Out[17]: [('a', 'a', 'a', 'a'), ('b', 'b', 'b', 'b'), ('c', 'c', 'c', 'c')]

When you try to unpack, you are trying to stuff 26 items into four names/variables:

In [8]: a,b,c,d = z
Traceback (most recent call last):

  File "<ipython-input-8-64277b78f273>", line 1, in <module>
    a,b,c,d = z

ValueError: too many values to unpack (expected 4)

zip(*list_of_4_item_tuples) will transpose the list_of_4_item_tuples into 4 tuples with 26 items each:

In [9]: a,b,c,d = zip(*z)    # z is the result of the list comprehension shown above

In [11]: len(a),len(b),len(c),len(d)
Out[11]: (26, 26, 26, 26)
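As a small aside, the transposition can be checked on a tiny list. This is only a sketch with made-up values (`rows` and `columns` are names chosen for illustration, not part of the question's code): it shows that `zip(*rows)` regroups rows into columns, and that applying it a second time gives the original rows back.

```python
# Made-up sample: three rows, four fields per row.
rows = [
    ("a1", "b1", "c1", "d1"),
    ("a2", "b2", "c2", "d2"),
    ("a3", "b3", "c3", "d3"),
]

# Transpose: one tuple per field instead of one tuple per row.
columns = list(zip(*rows))
print(columns)
# [('a1', 'a2', 'a3'), ('b1', 'b2', 'b3'), ('c1', 'c2', 'c3'), ('d1', 'd2', 'd3')]

# Transposing again restores the original rows.
round_trip = list(zip(*columns))
print(round_trip == rows)  # True
```

Because there are exactly four columns, `a, b, c, d = zip(*rows)` works no matter how many rows there are (as long as there is at least one).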

Test stuff

import string
a = string.ascii_lowercase
b = string.ascii_lowercase
c = string.ascii_lowercase
d = string.ascii_lowercase
e = string.ascii_lowercase
f = string.ascii_lowercase
q = zip(a, b, c, d, e, f)  # one-shot iterator: rebuild it before iterating a second time
wwii
  • **For clarity:** The fix was `a, b, c, d = zip(*abcd_list_comprehension)`, and the same has to be done for `g, h, i, j = zip(*ghij_list_comprehension)`. That is because each of these creates a `list` of `tuple`s that has to be regrouped into one sequence per variable. `e` and `f` are already flat `list`s and don't need that – Christopher Rucinski Oct 15 '19 at 20:05
  • @ChristopherRucinski - is that what you had in mind or do you think I should make it mimic your code more (i.e. using your variable names)? – wwii Oct 15 '19 at 20:07
0

Your list [(a, b, c, d) for index, (a, b, c, d, *_) in enumerate(matches) if index % 4 == 1] doesn't have exactly 4 elements, so trying to unpack it into only four variables fails.
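To see this concretely, here is a minimal sketch. The `matches` list below is a hypothetical 12-row stand-in (the same length as the sample posted in the question), not the real scraped data. With 12 rows the filter keeps only 3 tuples, so the unpacking fails with "not enough values" rather than "too many values"; with the full 615 rows it keeps about 154 tuples, which gives the error from the question.

```python
# Hypothetical stand-in for `matches`: 12 rows, so indexes 1, 5, and 9 pass the filter.
matches = [(f"a{n}", f"b{n}", f"c{n}", f"d{n}", "", "") for n in range(12)]

abcd = [
    (a, b, c, d)
    for index, (a, b, c, d, *_)
    in enumerate(matches)
    if index % 4 == 1
]
print(len(abcd))  # 3: the number of names on the left must match this, not the tuple width

try:
    a, b, c, d = abcd  # 4 names, 3 items
except ValueError as err:
    message = str(err)
    print(message)  # a ValueError mentioning "not enough values to unpack"

# Transposing first yields exactly 4 sequences, whatever the number of rows (at least one).
a, b, c, d = zip(*abcd)
print(a)  # ('a1', 'a5', 'a9')
```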

Corentin Pane
0

Solution

When a list comprehension creates a list of tuples, and you want to unpack those tuples column by column, you need to transpose the list with zip(*...):

x, y, z = zip(*list_comprehension)

# To be more clear
x, y, z = zip(*[(i, j, k) for (i, j, k) in tuple_list])
# For my code, this change must be made to this code
a, b, c, d = zip(*[
    (a, b, c, d)
    for index, (a, b, c, d, *_)
    in enumerate(matches)
    if index % 4 == 1
])

...

# And this code
g, h, i, j = zip(*[
    (g, h, i, j)
    for index, (*_, g, h, i, j)
    in enumerate(matches)
    if index % 4 == 3
])

Why

Let's take a look at the following code.

matches = [
    ("a1", "b1", "c1", "d1", "e1"),
    ("a2", "b2", "c2", "d2", "e2"),
    ("a3", "b3", "c3", "d3", "e3"),
    ("a4", "b4", "c4", "d4", "e4"),
    ("a5", "b5", "c5", "d5", "e5")
]

# I want a tuple of a's, b's, and c's
abc = [
    (a, b, c)
    for (a, b, c, *_)  # Ignore elements `d` and `e`
    in matches
]

print("abc =", abc)
# abc = [('a1', 'b1', 'c1'), ('a2', 'b2', 'c2'), ('a3', 'b3', 'c3'), ('a4', 'b4', 'c4'), ('a5', 'b5', 'c5')]
# NOTE: This is a list of tuples of ones, twos, threes, fours, and fives
#       Not a's, b's, and c's!!

# I want a list of e's
e = [
    e
    for (*_, e) 
    in matches
]

print("e =", e)
# e = ['e1', 'e2', 'e3', 'e4', 'e5']
# NOTE: This is a list of e's

The point is that with abc I get a list of ones, twos, threes, fours, and fives, and not a's, b's, and c's.

Deep Dive

The error message ValueError: too many values to unpack appears because the list holds more tuples than there are names on the left-hand side. (With fewer tuples than names, Python raises ValueError: not enough values to unpack instead.)

Remember, you have a list of ones, twos, threes, fours, and fives (5 tuples of 3 elements each) and not a's, b's, and c's (3 tuples of 5 elements each).

So this will always fail

a, b, c = [
    (a, b, c)
    for (a, b, c, *_) 
    in matches
]

# ERROR
#    Traceback (most recent call last):
#      File "...*.py", line 11, in <module>
#        for (a, b, c, *_) in matches
#    ValueError: too many values to unpack (expected 3)

You are trying to unpack these 5 tuples, [('a1', 'b1', 'c1'), ('a2', 'b2', 'c2'), ('a3', 'b3', 'c3'), ('a4', 'b4', 'c4'), ('a5', 'b5', 'c5')], into 3 names. You can't! The number of names on the left has to match the number of tuples in the list, which is 5.

But this will succeed. It will be wrong. But it won't cause an error.

# This will assign 5 variables with the tuples (a, b, c) from the original tuples (a, b, c, d, e)
ones, twos, threes, fours, fives = [
    (a, b, c)
    for (a, b, c, *_) in matches
]

print("ones =", ones)
print("twos =", twos)
print("threes =", threes)
print("fours =", fours)
print("fives =", fives)

# Output
# ones = ('a1', 'b1', 'c1')
# twos = ('a2', 'b2', 'c2')
# threes = ('a3', 'b3', 'c3')
# fours = ('a4', 'b4', 'c4')
# fives = ('a5', 'b5', 'c5')

Remember that we want something like ('a1', 'a2', 'a3', 'a4', 'a5'), not ('a1', 'b1', 'c1').

And if the list held 20 tuples, then you would need to have ...sixes, sevens, .... , nineteens, twenties = [ ... ]

First Try

Well, we want all the 1st elements from each tuple to go together. Same for the 2nd and 3rd. So zip(...) seems like a good candidate. Let's look at the results.

result = list(zip(abc))
print(result)

# list(zip(abc)) = [(('a1', 'b1', 'c1'),), (('a2', 'b2', 'c2'),), (('a3', 'b3', 'c3'),), (('a4', 'b4', 'c4'),), (('a5', 'b5', 'c5'),)]

# Let's look at what one element looks like
print(result[0])
# result[0] = (('a1', 'b1', 'c1'),)

This is wrong!

As you can see, there are a few things wrong.

  1. Weird tuple structure! Tuples inside of tuples. This is what you get when you zip a single list of tuples.
  2. Wrong elements in each tuple! We got a list of ones, not a list of a's.

Second Try

Well, zip doesn't work on a list of tuples as is. We have to do something to the list of tuples first.

Let's look at this...

abc = [(a, b, c) for (a, b, c, *_) in matches]

print(abc)
# abc = [('a1', 'b1', 'c1'), ('a2', 'b2', 'c2'), ('a3', 'b3', 'c3'), ('a4', 'b4', 'c4'), ('a5', 'b5', 'c5')]
# Again, we cannot zip these

print(*abc)
# *abc = ('a1', 'b1', 'c1') ('a2', 'b2', 'c2') ('a3', 'b3', 'c3') ('a4', 'b4', 'c4') ('a5', 'b5', 'c5')
# Wait, here we have a sequence of tuples. Not a list of tuples. Just tuple after tuple after tuple.

# What happens when we zip this "sequence" of tuples?
print(list(zip(*abc)))
# list(zip(*abc)) = [('a1', 'a2', 'a3', 'a4', 'a5'), ('b1', 'b2', 'b3', 'b4', 'b5'), ('c1', 'c2', 'c3', 'c4', 'c5')]

# Great, so let's try this
a, b, c = zip(*abc)

That's what we want!!
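Put differently, the star just spreads the list into separate arguments. The following sketch (a shortened three-row `abc`, with `one_iterable` and `three_iterables` as illustrative names) compares the two calls side by side.

```python
abc = [("a1", "b1", "c1"), ("a2", "b2", "c2"), ("a3", "b3", "c3")]

# One argument: zip has a single iterable to draw from, so every result is a 1-tuple.
one_iterable = list(zip(abc))
print(one_iterable[0])  # (('a1', 'b1', 'c1'),)

# Three arguments: zip takes one item from each tuple per step.
three_iterables = list(zip(abc[0], abc[1], abc[2]))
print(three_iterables[0])  # ('a1', 'a2', 'a3')

# zip(*abc) is shorthand for the three-argument call above.
print(three_iterables == list(zip(*abc)))  # True
```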

Therefore

Since we can do the following.

a, b, c = zip(*abc)

print("a =", a)
print("b =", b)
print("c =", c)

# Output
# a = ('a1', 'a2', 'a3', 'a4', 'a5')
# b = ('b1', 'b2', 'b3', 'b4', 'b5')
# c = ('c1', 'c2', 'c3', 'c4', 'c5')

That means we can do this...

a, b, c, d = zip(*[
    (a, b, c, d)
    for index, (a, b, c, d, *_)
    in enumerate(matches)
])
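For completeness, here is a sketch of the whole pipeline on a shortened sample. The data is a hypothetical stand-in for the question's `matches` (the names and the example.invalid URLs are made up), and slices such as `matches[1::4]` are used in place of the `index % 4 == 1` filter; they select the same rows.

```python
# Hypothetical, shortened stand-in for `matches`: 10 columns, 4 rows per item.
E = ""
matches = [
    (E, E, E, E, E, "stable", E, E, E, E),
    ("Studio 3.5.1", "3.5.1", E, E, E, E, E, E, E, E),
    (E, E, E, E, "October 2, 2019", E, E, E, E, E),
    (E, E, E, E, E, E, "https://example.invalid/a.tar.gz", "3.5.1", "0", "191"),
    (E, E, E, E, E, E, E, E, E, E),
    ("Studio 3.6 Beta 1", "3.6", "Beta", "1", E, E, E, E, E, E),
    (E, E, E, E, "October 10, 2019", E, E, E, E, E),
    (E, E, E, E, E, E, "https://example.invalid/b.tar.gz", "3.6.0", "13", "192"),
]

# Single-value columns are already flat lists.
f = [row[5] if row[5] == "stable" else "preview" for row in matches[0::4]]
e = [row[4] for row in matches[2::4]]

# Multi-value columns are lists of tuples, so they are transposed with zip(*...).
a, b, c, d = zip(*[row[:4] for row in matches[1::4]])
g, h, i, j = zip(*[row[6:] for row in matches[3::4]])

# One 10-field record per item.
records = list(zip(a, b, c, d, e, f, g, h, i, j))
for record in records:
    print(record)
```

Note that `zip(*...)` on an empty list produces nothing to unpack, so this assumes at least one item was matched.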
Christopher Rucinski