I am working with two lists generated by NLTK's PlaintextCorpusReader and I would like to combine them into a single dictionary.
The keys for the dictionary should be the sentences in the corpus, which I've extracted using PlaintextCorpusReader's .sents(). The values should be the fileids of the files where each sentence is located in the corpus, which I've extracted using .fileids().
The .fileids() come back as a list of strings, e.g.
['R_v_Cole_2007.txt', 'R_v_Sellick_2005.txt']
The .sents() come back as list(list(str)), e.g.
[[u'1', u'.'], [u'The', u'Registrar', u'has', u'referred', u'to', u'this', u'Court', u'two', u'applications', u'for', u'permission', u'to', u'appeal', u'against', u'conviction', u'to', u'be', u'heard', u'together', u'.'], ...]
I've tried a range of things, mainly from this question on a similar issue, but everything I try results in the following error:
TypeError: unhashable type: 'list'
Where am I going wrong?
The code I'm using to get the keys and values for the dictionary is as follows:
from nltk.corpus import PlaintextCorpusReader

corpus_root = '/Users/danielhoadley/Documents/Python/NLTK/text/'
wordlists = PlaintextCorpusReader(corpus_root, '.*')
dictionary = {}
values = wordlists.fileids()
keys = wordlists.sents()
## How do I get the keys and values into a dictionary from here?
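One direction that might work, as a minimal sketch rather than a definitive answer: the TypeError most likely arises because each sentence from .sents() is a list, and lists cannot be used as dictionary keys. Converting each sentence to a tuple makes it hashable, and calling .sents(fileids=...) one file at a time keeps track of which file each sentence came from. The helper name build_sentence_index and the FakeReader class are hypothetical, introduced only so the example runs without a corpus on disk; with NLTK, wordlists would be passed in instead.

```python
def build_sentence_index(reader):
    """Map each sentence (as a tuple of tokens) to the fileid it appears in.

    `reader` is assumed to expose fileids() and sents(fileids=...),
    as NLTK's PlaintextCorpusReader does.
    """
    index = {}
    for fileid in reader.fileids():
        for sent in reader.sents(fileids=fileid):
            # Lists are unhashable; a tuple of strings can be a dict key.
            index[tuple(sent)] = fileid
    return index


class FakeReader:
    """Hypothetical stand-in for a corpus reader, for illustration only."""

    def __init__(self, data):
        self._data = data  # {fileid: [[token, ...], ...]}

    def fileids(self):
        return list(self._data)

    def sents(self, fileids=None):
        # Simplified: only handles a single fileid string.
        return self._data[fileids]


reader = FakeReader({
    'R_v_Cole_2007.txt': [['1', '.'], ['The', 'Registrar', 'has', 'referred', '.']],
    'R_v_Sellick_2005.txt': [['2', '.']],
})
index = build_sentence_index(reader)
print(index[('1', '.')])  # R_v_Cole_2007.txt
```

Note that if the same sentence occurs in more than one file, this sketch keeps only the last fileid seen; mapping each key to a list of fileids would avoid that.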