Python: how to categorize messages in a list of lists

Question

I would like to subdivide my CSV file of messages according to their language. I have created a list of tuples, one per language, each holding the language code and an empty list. But I get a strange error in the output: too many values to unpack. What is my mistake? How can I resolve it?

csv1 = open('../archiviato.csv', 'r')
tabula=csv.reader(csv1)

lingue = [('en' , []), ('fr' , []), ('id' , []), ('es' , []), ('pt' , []), ('nl', []), ('de',[]), ('ja',[]), ('it',[]), ('ca',[]), ('tr',[]), ('ko',[]), ('en-gb', []), ('zh-cn', []), ('th', []), ('pl', [])]
altre = set()


for line in tabula:
    text=line[5]
    language=line[7]
    for (lan, tweet) in lingue:
        if language == lan:
            lingue.append(text)
            print language
        else:
            if language in altre:
                continue
            else:
                altre.add(language)

a list like: lingue = [('en'), [message_text_1, message_text_2,.... message_text_n], ('fr'), [message_text_1, message_text_2,.... message_text_n] ] — Lupanoide, Oct 14 '15 at 08:34
Do you mean a list of tuples where the first value is the language and the second value the `list` of tweets belonging to that language ? — Anand S Kumar, Oct 14 '15 at 08:36

mhawke · Answer 1 · 2015-10-14T09:03:58.897

The problem is being caused by this line:

lingue.append(text)

which appends the string text to the lingue list, not to the list in the tuple for the particular language. The error is manifested when

for (lan, tweet) in lingue:

is executed on a subsequent iteration because trying to unpack a single string of length > 2 into lan and tweet will fail, e.g.

>>> a, b = ('a', 'b')    # works
>>> a, b = 'ab'          # works
>>> a, b = 'abc'         # fails: three characters, two names
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: too many values to unpack

To fix this particular error you should append text to the tweet list instead of the lingue list.

for (lan, tweet) in lingue:
    if language == lan:
        tweet.append(text)   # N.B. tweet, not lingue
        print language
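As a self-contained illustration of the corrected loop, here is a minimal sketch. The rows are hypothetical sample data standing in for the CSV file (text in column 5, language code in column 7, as in the question), and the language list is shortened. It also uses a for/else so that a language is recorded in altre only when no tuple matched, which is an assumption about what the question intended; in the original loop the else branch runs for every non-matching tuple, so known languages end up in altre as well.

```python
# Hypothetical sample rows standing in for the CSV file:
# column 5 holds the message text, column 7 the language code.
rows = [
    ['', '', '', '', '', 'hello', '', 'en'],
    ['', '', '', '', '', 'bonjour', '', 'fr'],
    ['', '', '', '', '', 'privet', '', 'ru'],
]

lingue = [('en', []), ('fr', [])]  # shortened list, for illustration only
altre = set()

for line in rows:
    text = line[5]
    language = line[7]
    for lan, tweet in lingue:
        if language == lan:
            tweet.append(text)  # append to the list inside the tuple
            break               # language found, stop searching
    else:
        # Runs only if the inner loop finished without a break,
        # i.e. no tuple matched this language.
        altre.add(language)
```

After the loop, each tuple's list holds the messages for its language, and altre holds only the codes that are missing from lingue.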

For convenience and efficiency it would be preferable to use a dictionary instead of a list for lingue. Use the language identifier as a key, and a list of strings for the values which would contain each tweet.

import csv

csv1 = open('../archiviato.csv', 'r')
tabula = csv.reader(csv1)

lingue = dict([('en' , []), ('fr' , []), ('id' , []), ('es' , []), ('pt' , []), ('nl', []), ('de',[]), ('ja',[]), ('it',[]), ('ca',[]), ('tr',[]), ('ko',[]), ('en-gb', []), ('zh-cn', []), ('th', []), ('pl', [])])
altre = set()

for line in tabula:
    text = line[5]
    language = line[7]
    try:
        lingue[language].append(text)
        print language
    except KeyError:
        altre.add(language)

If you then need lingue as a list of tuples, you can call lingue.items() (in Python 3, wrap it as list(lingue.items())). Also, sets automatically eliminate duplicates, so there is no need to check whether the language is already in the set before adding it: just add it and you're done.
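Both points can be checked with a small sketch. The dictionary contents and the language codes below are made-up sample values, not data from the question.

```python
# Made-up sample data, for illustration only.
lingue = {'en': ['hello'], 'fr': ['bonjour']}

# items() yields (key, value) pairs; sorted() returns them as a list
# of tuples in a predictable order on both Python 2 and Python 3.
pairs = sorted(lingue.items())

# Adding the same value to a set twice leaves a single copy,
# so no membership check is needed before add().
altre = set()
for language in ['ru', 'ru', 'sv']:
    altre.add(language)
```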

@Lupanoide: no problem. This answer also caters for collection of unexpected languages in `altre` as per your question. I think this requirement has been overlooked in your acceptance of another answer. — mhawke, Oct 14 '15 at 09:02

Anand S Kumar · Accepted Answer · 2015-10-14T09:16:07.083

The issue is caused by the line:

lingue.append(text)

You are appending text, which is most probably a string, to the lingue list itself. Once it has been added, the iteration eventually reaches this newly added string, unpacks it into its individual characters, and tries to assign them to two names. That fails unless the string contains exactly two characters.

This is what causes the too many values to unpack error.
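The failure can be reproduced in a few lines. This is a minimal sketch with a shortened list and a made-up message; the exact wording of the exception differs slightly between Python versions, so only the common part of the message is relied on.

```python
lingue = [('en', []), ('fr', [])]  # shortened list, for illustration only
error = None

try:
    for lan, tweet in lingue:
        if lan == 'en':
            # Same mistake as in the question: the string lands in
            # the outer list instead of the list inside the tuple.
            lingue.append('hello world')
except ValueError as exc:
    # Raised when the loop reaches 'hello world' and tries to
    # unpack its 11 characters into the two names lan and tweet.
    error = str(exc)
```

Because the list grows while it is being iterated over, the loop reaches the appended string during the same pass and fails there.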

A suggestion: a list of tuples is the wrong data structure for lingue. Use a dictionary with the language as the key and the list of tweets in that language as the value.

You can also use collections.defaultdict here to create the lists on the fly. Example:

from collections import defaultdict
lingue = defaultdict(list)
for line in tabula:
    text=line[5]
    language=line[7]
    lingue[language].append(text)

If you really want a list of tuples after this, you can get it with lingue.items(). Example:

print(lingue.items())
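To try the defaultdict approach without the original file, here is a minimal sketch written for Python 3. The in-memory CSV text is invented sample data that only mimics the column layout from the question (text in column 5, language in column 7).

```python
import csv
import io
from collections import defaultdict

# Invented sample data; the real script reads '../archiviato.csv'.
sample = io.StringIO(
    "0,1,2,3,4,hello,6,en\n"
    "0,1,2,3,4,bonjour,6,fr\n"
    "0,1,2,3,4,bye,6,en\n"
)
tabula = csv.reader(sample)

lingue = defaultdict(list)
for line in tabula:
    # A missing key gets a fresh empty list automatically.
    lingue[line[7]].append(line[5])

result = dict(lingue)  # plain dict, e.g. for printing or comparison
```

Note that with this approach every language seen in the file becomes a key, so nothing is filtered out as unexpected.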

As indicated in the comments, if you want to collect unexpected languages in a separate set, you can use a dictionary with the valid languages as keys and already initialized lists as values. Then check whether the language is in the dictionary: if so, append to its list; otherwise add it to the set of unexpected languages. Example:

lingue = {'id': [], 'fr': [], 'th': [], 'nl': [], 'ja': [], 'pl': [], 'it': [], 'ca': [], 'de': [], 'ko': [], 'pt': [], 'es': [], 'en-gb': [], 'en': [], 'tr': [], 'zh-cn': []}
altre= set()

for line in tabula:
    text=line[5]
    language=line[7]
    if language in lingue:
        lingue[language].append(text)
        print language
    else:
        altre.add(language)

By itself a `defaultdict` is not so helpful here because the OP wants to collect other unexpected languages in the `altre` set. A standard dictionary primed with valid keys is better for this. — mhawke, Oct 14 '15 at 08:59
OK, if that is a requirement, I have added an example for that as well. Sadly that part looks similar to your solution. I couldn't come up with anything simpler/better :-) — Anand S Kumar, Oct 14 '15 at 09:09
Naturally the answers will converge on the correct solution. I have changed mine to use try/except rather than a dict lookup, but not much difference at all. — mhawke, Oct 14 '15 at 09:14

Prashant Shukla · Answer 3 (score -2) · answered Oct 14 '15 at 08:33, edited May 23 '17 at 12:14

Look at the question "'too many values to unpack', iterating over a dict. key=>string, value=>list"; it addresses this error. The answer at https://stackoverflow.com/a/3294899/1489538 also solves your problem.
