Map two lists generated in NLTK to a dictionary

Question

I am working with two lists generated by NLTK's PlaintextCorpusReader and I would like to combine them into a single dictionary.

The keys for the dictionary should be the sentences in the corpus, which I've extracted using PlaintextCorpusReader's .sents(). The values should be the fileids of where each sentence is located in the corpus, which I've extracted using .fileids().

The .fileids() come back as a list of strings, e.g.

['R_v_Cole_2007.txt', 'R_v_Sellick_2005.txt']

The .sents() come back as list(list(str)), e.g.

[[u'1', u'.'], [u'The', u'Registrar', u'has', u'referred', u'to', u'this', u'Court', u'two', u'applications', u'for', u'permission', u'to', u'appeal', u'against', u'conviction', u'to', u'be', u'heard', u'together', u'.'], ...]

I've tried a range of things, mainly from this question on a similar issue, but everything I try results in the following error:

TypeError: unhashable type: 'list'
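
The error can be reproduced without NLTK. The following is a minimal sketch using one sample sentence and one fileid taken from the examples above; the `lookup` and `message` names are only illustrative. A dictionary key must be hashable, and a list of tokens is not, while the same tokens as a tuple are.

```python
# A tokenized sentence in the shape sents() returns it: a list of strings.
sentence = ['1', '.']

try:
    # Using the list itself as a dictionary key fails.
    lookup = {sentence: 'R_v_Cole_2007.txt'}
except TypeError as err:
    message = str(err)
    print(message)

# The same tokens as a tuple are hashable, so this works.
lookup = {tuple(sentence): 'R_v_Cole_2007.txt'}
print(lookup[('1', '.')])
```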

Where am I going wrong?

The code I'm using to get the keys and values for the dictionary is as follows:

from nltk.corpus import PlaintextCorpusReader

corpus_root = '/Users/danielhoadley/Documents/Python/NLTK/text/'
wordlists = PlaintextCorpusReader(corpus_root, '.*')

dictionary = {}

values = wordlists.fileids()
keys = wordlists.sents()

## How do I get the keys and values into a dictionary from here?

You can't use lists as keys for a dictionary. Loop through the result from `sents()` and turn each sentence into a tuple. — Nick is tired, Mar 12 '17 at 21:35
@NickA thanks, Nick. It may be the lateness of the hour, but I'm struggling to construct this loop. I'm trying: for item in sents(): key = tupple(tupple(item)). Are you able to steer me further in the right direction? — DanielH, Mar 12 '17 at 22:35
You didn't make it very clear what you want mapped to what. Do you want to look up a word to see which file it was in? What if both files have u'this'? Does the first list have N filenames, and the second list has N lists of tokens which appear in that file? Maybe you should post the complete dict you want from a small sample input. (also post code for your attempt to do it) — Kenny Ostrom, Mar 13 '17 at 00:59
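
One possible reading of the question, sketched under stated assumptions: each sentence becomes a tuple key, and the value is the list of fileids in which that sentence occurs, since a sentence such as `1 .` can appear in more than one file. The sketch relies on `sents()` accepting a `fileids` argument, as `PlaintextCorpusReader.sents(fileids=None)` does. `build_sentence_index` and `FakeReader` are hypothetical names, and the sample sentences are made up so that the example runs without a corpus on disk.

```python
from collections import defaultdict


def build_sentence_index(reader):
    """Map each sentence (as a tuple of tokens) to the fileids containing it.

    `reader` is assumed to behave like NLTK's PlaintextCorpusReader:
    fileids() returns the file names and sents(fileids=name) returns
    the tokenized sentences of that one file.
    """
    index = defaultdict(list)
    for fileid in reader.fileids():
        for sent in reader.sents(fileids=fileid):
            key = tuple(sent)  # lists are unhashable, tuples are not
            if fileid not in index[key]:
                index[key].append(fileid)
    return dict(index)


class FakeReader:
    """Hypothetical stand-in for a corpus reader, used only for this demo."""

    def __init__(self, files):
        self._files = files

    def fileids(self):
        return list(self._files)

    def sents(self, fileids=None):
        return self._files[fileids]


# Made-up sample data in the shape described in the question.
reader = FakeReader({
    'R_v_Cole_2007.txt': [['1', '.'], ['The', 'Registrar', 'has', 'referred', '.']],
    'R_v_Sellick_2005.txt': [['1', '.'], ['The', 'appeal', 'is', 'dismissed', '.']],
})

index = build_sentence_index(reader)
print(index[('1', '.')])
```

With the real corpus, passing `wordlists` from the question in place of `reader` should work the same way, assuming the tokenizer data NLTK needs for `sents()` is installed. If each sentence should map to a single fileid instead, the list could be replaced by a plain assignment, at the cost of later files overwriting earlier ones for repeated sentences.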
