ValueError: too many values to unpack (expected 2), when I try to extract only 2 substrings from a regex pattern

Question

This is the code. The error occurs in the part that extracts the substrings, after the input has been matched against the regex pattern.

def name_and_img_identificator(input_text, text):
    input_text = re.sub(r"([^n\u0300-\u036f]|n(?!\u0303(?![\u0300-\u036f])))[\u0300-\u036f]+", r"\1", normalize("NFD", input_text), 0, re.I)
    input_text = normalize( 'NFC', input_text) # -> NFC
    input_text_to_check = input_text.lower() # Convert everything to lowercase

    
    #regex_patron_01 = r"\s*\¿?(?:dime los|dime las|dime unos|dime unas|dime|di|cuales son los|cuales son las|cuales son|cuales|que animes|que|top)\s*((?:\w+\s*)+)\s*(?:de series anime|de anime series|de animes|de anime|animes|anime)\s*(?:similares al|similares a|similar al|similar a|parecidos al|parecidos a|parecido al|parecido a)\s*(?:la serie de anime|series de anime|la serie anime|la serie|anime|)\s*(llamada|conocida como|cuyo nombre es|la cual se llama|)\s*((?:\w+\s*)+)\s*\??"

    #Regex in english
    regex_patron_01 = r "\ s * \ ¿? (?: tell me the | tell me some| tell me | say | which are the | which are the | which are | which | which animes | which | top) \ s * ((?: \ w + \ s *) +) \ s * (?: anime series | anime series | anime | anime | anime | anime) \ s * (?: similar to | similar to | similar to | similar to | similar to | similar to | similar to | similar to) \ s * (?: the anime series | anime series | the anime series | the series | anime |) \ s * (called | known like | whose name is | which is called |) \ s * ((?: \ w + \ s *) +) \ s * \ ?? "

    m = re.search(regex_patron_01, input_text_to_check, re.IGNORECASE) # Check whether the regex matches, to decide whether to enter the block below

    if m:
        num, anime_name = m.groups()[2]

        num = num.strip()
        anime_name = anime_name.strip()
        print(num)
        print(anime_name)

    return text

input_text_str = input("ingrese: ")
text = ""

print(name_and_img_identificator(input_text_str, text))

It gives me the error below, and I don't know how to structure this regex pattern so that it extracts only those 2 values (substrings) from the input.

Traceback (most recent call last):
  File "serie_recommendarion_for_chatbot.py", line 154, in <module>
    print(serie_and_img_identificator(input_text_str, text))
  File "anime_recommendarion_for_chatbot.py", line 142, in name_and_img_identificator
    num, anime_name = m.groups()
ValueError: too many values to unpack (expected 2)

If I enter an input like 'Dame el top 8 de animes parecidos a Gundam' (in English: 'Give me the top 8 anime like Gundam'),

I need to extract:

num = '8'
anime_name = 'Gundam'

How should I fix my regex pattern in that case?

This is missing imports: `import re` and `from unicodedata import normalize` — rv.kvetch, Sep 21 '21 at 04:42

Troll · Answer 1 · 2021-09-21T05:08:13.677

You can try extracting the first 2 values with a slice; you are probably missing a colon in the index.

num, anime_name = m.groups()[:2]

That would explain the "too many values to unpack" error you are seeing.

Use two separate patterns for the number and the name. For simplicity, I only included a few examples.

For the number:

(?<=(which are the|which|top)\s)[0-9]+(?=\s(anime series|anime))

For the name:

(?<=(like|called|which is called)\s)[A-Za-z]+

Implementing the patterns in Spanish is left to you.
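One caveat if these patterns are used from Python: its re module only accepts fixed-width lookbehind, so alternatives of different lengths such as which are the|which|top are rejected there. Below is a minimal sketch of the same two-pattern idea using capture groups instead of lookbehind. The helper name extract_num_and_name and the keyword lists are only illustrative assumptions, not part of the original code.

```python
import re

# Assumed keyword lists, kept short for the demo; extend them as needed.
NUM_RE = re.compile(
    r"(?:which are the|which|top)\s+([0-9]+)\s+(?:anime series|animes?)",
    re.IGNORECASE,
)
NAME_RE = re.compile(
    r"(?:which is called|called|like)\s+([A-Za-z]+)",
    re.IGNORECASE,
)


def extract_num_and_name(sentence):
    """Return (num, name) as strings, or None for a part that is not found."""
    num_match = NUM_RE.search(sentence)
    name_match = NAME_RE.search(sentence)
    num = num_match.group(1) if num_match else None
    name = name_match.group(1) if name_match else None
    return num, name


print(extract_num_and_name("Give me the top 8 anime like Gundam"))
```

Because each value comes from its own group(1), there is nothing to unpack incorrectly, and a missing part shows up as None instead of raising.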

edited Sep 21 '21 at 05:08

answered Sep 21 '21 at 03:58


I was trying that but I have num = '8' and anime_name = ''. But I need num = '8' and anime_name = 'Gundam' – Sep 21 '21 at 04:03
@ElectrisikVocal Ok, now the error is solved. Let's have a look at your regex. – Troll Sep 21 '21 at 04:05
@ElectrisikVocal Sorry, I don't know Spanish very well but I suggest you not to use regular expressions to solve linguistic problems because there are a lot of cases and irregularities in languages. You can use other hints like capital letters, for example, capital G in Gundam. Or a better way is to use NLP, but that is a bit too complicated for your use case. – Troll Sep 21 '21 at 04:16
In this case I only want to extract the strings that fall in the places where I put ((?: \ w + \ s *) +) – Sep 21 '21 at 04:19
In the regex pattern you will see that this sequence appears only 2 times, which are the 2 places where it is supposed to extract the substrings – Sep 21 '21 at 04:20
@ElectrisikVocal Try using two separate regex patterns for the number and the name. Again, I can't help you on this because I don't know Spanish very well, but [lookahead and lookbehind](https://stackoverflow.com/q/2973436/16936415) are what you need for this. – Troll Sep 21 '21 at 04:32
I have already edited my question and created the regex equivalent in English, it still crashes anyway, maybe you can think of something to fix it. – Sep 21 '21 at 04:46

rv.kvetch · Answer 2 · 2021-09-21T05:11:37.000

Try this out in a regex playground.

So nothing much is changed, the first capture group is still the quantifier for the number of animes, and the 2nd group is the name of the anime itself. I just simplified the regex a bit (got rid of some unnecessary bits for demo purposes). Most of it is unchanged from your version, which was actually pretty solid regex.

Regex: \b(\d+).*(?:called|that are like|known like|whose name is|which is called)\s*((?:\w+\s*)+)\s*\??

Test with your original question - which I translated roughly to English :-)

import re
from unicodedata import normalize


def name_and_img_identificator(input_text, text):
    input_text = re.sub(r"([^n\u0300-\u036f]|n(?!\u0303(?![\u0300-\u036f])))[\u0300-\u036f]+", r"\1",
                        normalize("NFD", input_text), 0, re.I)
    input_text = normalize('NFC', input_text)  # -> NFC
    input_text_to_check = input_text.lower()  # Convert everything to lowercase


    # Regex in english

    # original
    #   note: you have extra spaces here, which regex might not like.
    #   you can get rid of spaces and then it should hopefully be fine.
    # regex_patron_01 = r "\ s * \ ¿? (?: tell me the | tell me some| tell me | say | which are the | which are the | which are | which | which animes | which | top) \ s * ((?: \ w + \ s *) +) \ s * (?: anime series | anime series | anime | anime | anime | anime) \ s * (?: similar to | similar to | similar to | similar to | similar to | similar to | similar to | similar to) \ s * (?: the anime series | anime series | the anime series | the series | anime |) \ s * (called | known like | whose name is | which is called |) \ s * ((?: \ w + \ s *) +) \ s * \ ?? "

    # simplified
    regex_patron_01 = r'\b(\d+).*(?:called|that are like|known like|whose name is|which is called)\s*((?:\w+\s*)+)\s*\??'

    m = re.search(regex_patron_01, input_text_to_check,
                  re.IGNORECASE)  # Check whether the regex matches, to decide whether to enter the block below

    if m:
        num, anime_name = m.groups()[:2]

        num = num.strip()
        anime_name = anime_name.strip()
        print(num)
        print(anime_name)

    return text


#input_text_str = input("ingrese: ")
input_text_str = 'Tell me the top 8 animes that are like Gundam?'
text = ""

print(name_and_img_identificator(input_text_str, text))

no problem! hope a part of it was helpful at least :-) – rv.kvetch Sep 21 '21 at 12:59

Niel Godfrey Pablo Ponciano · Accepted Answer · 2021-09-21T05:27:19.163

Errors in the regex pattern

You forgot to add ?: to make this group non-capturing. Change:

regex_patron_01 = r"...(llamada|conocida como|cuyo nombre es|la cual se llama|)..."

To:

regex_patron_01 = r"...(?:llamada|conocida como|cuyo nombre es|la cual se llama|)..."
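A quick way to check this is to count the capture groups. The sketch below uses a shortened, hypothetical pattern (an assumption that only mimics the structure of your full regex) to show that plain parentheses add a third group, while ?: keeps the count at two:

```python
import re

# Shortened stand-in patterns, not the full regex from the question.
with_extra_group = re.compile(r"(\d+) (called|known as|) ?(\w+)")
without_extra_group = re.compile(r"(\d+) (?:called|known as|) ?(\w+)")

print(with_extra_group.groups)     # number of capture groups: 3
print(without_extra_group.groups)  # number of capture groups: 2

m = without_extra_group.search("8 called gundam")
num, anime_name = m.groups()       # exactly two values, so unpacking works
print(num, anime_name)
```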

To avoid capturing additional spaces or words, the group that captures num should be non-greedy, so that it doesn't swallow words like "de" and instead lets the succeeding patterns match them. Change:

regex_patron_01 = r"...((?:\w+\s*)+)..."

To:

regex_patron_01 = r"...((?:\w+?\s*?)+)..."
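To illustrate the difference, here is a small sketch with a cut-down pattern (an assumption for the demo; only the capture group is the same as in your regex). The greedy group takes "de" along with the number, while the non-greedy one stops at the number:

```python
import re

text = "top 8 de animes parecidos"

# Same surrounding pattern, only the capture group differs.
greedy = re.search(r"top\s*((?:\w+\s*)+)\s*(?:de animes|animes)", text)
lazy = re.search(r"top\s*((?:\w+?\s*?)+)\s*(?:de animes|animes)", text)

greedy_num = greedy.group(1).strip()
lazy_num = lazy.group(1).strip()

print(repr(greedy_num))  # '8 de'
print(repr(lazy_num))    # '8'
```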

.groups() already returns a tuple with the matched strings, so indexing it gives you a single string, and Python then tries to unpack that string character by character. That is the root cause of your error. Change:

num, anime_name = m.groups()[2]

To:

num, anime_name = m.groups()
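A minimal sketch of what goes wrong, using a short made-up pattern with three groups rather than your full regex:

```python
import re

# Hypothetical three-group pattern, standing in for the original one.
m = re.search(r"top (\d+) (anime) like (\w+)", "top 8 anime like gundam")

groups = m.groups()
print(groups)        # a tuple of three strings

third = groups[2]    # indexing returns ONE string: 'gundam'
try:
    # Unpacking a string iterates over its characters: 'g', 'u', 'n', ...
    num, anime_name = third
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

The string 'gundam' has six characters, so it cannot be unpacked into two names, which is where the "too many values to unpack" message comes from.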

With those changes above, it would be successful:

8
gundam

Improvement

Your regex is too complicated and contains a lot of hard-coded words that differ by language. My suggestion is to standardize the format of the string it accepts to:

Any text here (num) any text here (anime_name)

Which is already the format of your input:

Dame el top 8 de animes parecidos a Gundam

Thus you can remove that long regex and replace it with this one; the output would be the same:

regex_patron_01 = r"^.*?(\d+).*\s(.+)$"

Note that this requires the (anime_name) to be a single word. To support multiple words, we have to choose a special character that marks the start of the anime name, such as a colon (:):

Dame el top 8 de animes parecidos a: Gundam X

Then the regex would be:

regex_patron_01 = r"^.*?(\d+).*:\s(.+)$"

Output

8
gundam x
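Both simplified patterns can be tried side by side with a sketch like the one below. The extract helper is just an illustrative name, and the lowercasing mirrors what your function does before matching:

```python
import re

SINGLE_WORD = re.compile(r"^.*?(\d+).*\s(.+)$")
WITH_COLON = re.compile(r"^.*?(\d+).*:\s(.+)$")


def extract(pattern, sentence):
    """Return (num, anime_name) from the lowercased sentence, or None if no match."""
    m = pattern.search(sentence.lower())
    return m.groups() if m else None


# Single-word name: works without a marker.
print(extract(SINGLE_WORD, "Dame el top 8 de animes parecidos a Gundam"))
# Multi-word name without a marker: only the last word is captured.
print(extract(SINGLE_WORD, "Dame el top 8 de animes parecidos a Gundam X"))
# Multi-word name after a colon: the whole name is captured.
print(extract(WITH_COLON, "Dame el top 8 de animes parecidos a: Gundam X"))
```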
