Summary
I have used a semi-complex regex to retrieve data from a website. The issue is that I have to do some post-processing of the matched dataset.
I have gotten the data processed to probably 95+% of where I want it; however, I am getting a simple error message that I cannot make sense of.
I can work around it, but that is beside the point. I am trying to figure out whether this is a bug or whether I am overlooking something fundamental about my tuple unpacking.
Background Info
One thing I have to overcome is that I get 4 matches for every "true match". That means that my data for 1 single item is spread out over 4 matches.
In simple graphical form (slightly oversimplified):
index |  a    b    c    d    e    f    g    h    i    j
--------------------------------------------------------
   1: | ( ), ( ), ( ), ( ), ( ), (█), ( ), ( ), ( ), ( )
   2: | (█), (█), (█), (█), ( ), ( ), ( ), ( ), ( ), ( )
   3: | ( ), ( ), ( ), ( ), (█), ( ), ( ), ( ), ( ), ( )
   4: | ( ), ( ), ( ), ( ), ( ), ( ), (█), (█), (█), (█)
   5: | ( ), ( ), ( ), ( ), ( ), (▒), ( ), ( ), ( ), ( )
   6: | (▒), (▒), (▒), (▒), ( ), ( ), ( ), ( ), ( ), ( )
   7: | ( ), ( ), ( ), ( ), (▒), ( ), ( ), ( ), ( ), ( )
   8: | ( ), ( ), ( ), ( ), ( ), ( ), (▒), (▒), (▒), (▒)
   9: | ...
 ...
 615: | ...
I can get all the data, but I want to compact it, like so...
index |  a    b    c    d    e    f    g    h    i    j
--------------------------------------------------------
   1: | (█), (█), (█), (█), (█), (█), (█), (█), (█), (█)
   2: | (▒), (▒), (▒), (▒), (▒), (▒), (▒), (▒), (▒), (▒)
   3: | ...
 ...
 154: | ...
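As a minimal sketch of the compaction pictured above: the snippet below merges every run of four sparse rows into one dense row by keeping, for each column, the first non-empty value in the group. The helper name compact, its group_size parameter, and the toy data are hypothetical and only serve the illustration; this is not the approach taken in the code further down, which picks specific columns from specific rows.

```python
def compact(rows, group_size=4):
    """Merge each run of `group_size` sparse rows into one dense row.

    Hypothetical helper for illustration only.
    """
    merged = []
    for start in range(0, len(rows), group_size):
        group = rows[start:start + group_size]
        # zip(*group) walks the group column by column; for each column keep
        # the first non-empty string, or '' if every row is empty there.
        merged.append(tuple(
            next((value for value in column if value), "")
            for column in zip(*group)
        ))
    return merged


# Toy data: two items, each spread over four rows of three columns.
sparse = [
    ("", "f1", ""),
    ("a1", "", ""),
    ("", "", ""),
    ("", "", "j1"),
    ("", "f2", ""),
    ("a2", "", ""),
    ("", "", ""),
    ("", "", "j2"),
]

dense = compact(sparse)
print(dense)  # [('a1', 'f1', 'j1'), ('a2', 'f2', 'j2')]
```

This sketch assumes every group is complete and that an empty string always means "no data"; it would silently hide a column that is filled in more than one row of the same group.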
Code
Works
Take note of the variables abcd, e, f, and ghij, and how I have to unpack them in the for-loop at the bottom.
matches = [('', '', '', '', '', '', '', '', '', ''), ('Android Studio 3.6 Beta 1', '3.6', 'Beta', '1', '', '', '', '', '', ''), ('', '', '', '', 'October 10, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.13/android-studio-ide-192.5916306-linux.tar.gz', '3.6.0', '13', '192'), ('', '', '', '', '', 'stable', '', '', '', ''), ('Android Studio 3.5.1', '3.5.1', '', '', '', '', '', '', '', ''), ('', '', '', '', 'October 2, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.5.1.0/android-studio-ide-191.5900203-linux.tar.gz', '3.5.1', '0', '191'), ('', '', '', '', '', '', '', '', '', ''), ('Android Studio 3.6 Canary 12', '3.6', 'Canary', '12', '', '', '', '', '', ''), ('', '', '', '', 'September 18, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.12/android-studio-ide-192.5871855-linux.tar.gz', '3.6.0', '12', '192')]
f = [
    f
    for index, (_, _, _, _, _, f, *_)
    in enumerate(matches)
    if index % 4 == 0
]
abcd = [
    (a, b, c, d)
    for index, (a, b, c, d, *_)
    in enumerate(matches)
    if index % 4 == 1
]
e = [
    e
    for index, (_, _, _, _, e, *_)
    in enumerate(matches)
    if index % 4 == 2
]
ghij = [
    (g, h, i, j)
    for index, (*_, g, h, i, j)
    in enumerate(matches)
    if index % 4 == 3
]
abcdefghij = zip(abcd, e, f, ghij)
for (a, b, c, d), e, f, (g, h, i, j) in abcdefghij:
    print("a", a, "\nb", b, "\nc", c, "\nd", d, "\ne", e, "\nf", f, "\ng", g, "\nh", h, "\ni", i, "\nj", j, "\n", "-" * 100)
Fails
Take note that I am trying to unpack the same tuples right away into the variables a, b, c, d, e, f, g, h, i, and j.
matches = [('', '', '', '', '', '', '', '', '', ''), ('Android Studio 3.6 Beta 1', '3.6', 'Beta', '1', '', '', '', '', '', ''), ('', '', '', '', 'October 10, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.13/android-studio-ide-192.5916306-linux.tar.gz', '3.6.0', '13', '192'), ('', '', '', '', '', 'stable', '', '', '', ''), ('Android Studio 3.5.1', '3.5.1', '', '', '', '', '', '', '', ''), ('', '', '', '', 'October 2, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.5.1.0/android-studio-ide-191.5900203-linux.tar.gz', '3.5.1', '0', '191'), ('', '', '', '', '', '', '', '', '', ''), ('Android Studio 3.6 Canary 12', '3.6', 'Canary', '12', '', '', '', '', '', ''), ('', '', '', '', 'September 18, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.12/android-studio-ide-192.5871855-linux.tar.gz', '3.6.0', '12', '192')]
f = [
    f
    if f == "stable" else "preview"
    for index, (_, _, _, _, _, f, *_)
    in enumerate(matches)
    if index % 4 == 0
]
a, b, c, d = [
    (a, b, c, d)
    for index, (a, b, c, d, *_)
    in enumerate(matches)
    if index % 4 == 1
]
e = [
    e
    for index, (_, _, _, _, e, *_)
    in enumerate(matches)
    if index % 4 == 2
]
g, h, i, j = [
    (g, h, i, j)
    for index, (*_, g, h, i, j)
    in enumerate(matches)
    if index % 4 == 3]
abcdefghij = zip(a, b, c, d, e, f, g, h, i, j)
for a, b, c, d, e, f, g, h, i, j in abcdefghij:
    print("a", a, "\nb", b, "\nc", c, "\nd", d, "\ne", e, "\nf", f, "\ng", g, "\nh", h, "\ni", i, "\nj", j, "\n", "-" * 100)
With this code, I get the following error message...
... a, b, c, d = [(a, b, c, d) for index, (a, b, c, d, *_) in enumerate(matches) if index % 4 == 1]
ValueError: too many values to unpack (expected 4)
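For reference, the error can be reproduced in isolation. The sketch below uses a hypothetical five-row list named rows as a stand-in for what the comprehension returns on the full dataset (roughly 154 rows, going by the diagram above). It suggests that the four targets on the left are matched against the rows of the list, not against the four fields inside each row. The transposition with zip(*rows) at the end is one possible way to get one sequence per column; it is an assumption about what the failing code intends, not something confirmed by the post.

```python
# Hypothetical stand-in for the list of 4-tuples built by the comprehension.
rows = [
    ("a1", "b1", "c1", "d1"),
    ("a2", "b2", "c2", "d2"),
    ("a3", "b3", "c3", "d3"),
    ("a4", "b4", "c4", "d4"),
    ("a5", "b5", "c5", "d5"),
]

# Unpacking binds one target per element of the list, i.e. per row.
# Four targets cannot receive five rows, so this raises ValueError.
try:
    a, b, c, d = rows
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)

# Transposing first gives one tuple per column, which four targets can take.
a, b, c, d = zip(*rows)
print(a)  # ('a1', 'a2', 'a3', 'a4', 'a5')
```

With exactly four rows the first assignment would succeed, but a, b, c, and d would each hold a whole row rather than a column, which is easy to miss on small test data.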
Expectations
I would have expected these two methods to perform exactly the same logic and to produce exactly the same end results.
They are not! Why?