This is related to "How to append to the end of an empty list?", but I don't have enough reputation yet to comment there, so I am posting a new question here.
I need to append terms onto an empty list of lists. I start with:
Talks[eachFilename][TermVectors]=
[['paragraph','1','text'],
['paragraph','2','text'],
['paragraph','3','text']]
I want to end with
Talks[eachFilename][SomeTermsRemoved]=
[['paragraph','text'],
['paragraph','2'],
['paragraph']]
Talks[eachFilename][SomeTermsRemoved] starts empty. I can't specify that I want:
Talks[eachFilename][SomeTermsRemoved][0][0]='paragraph'
Talks[eachFilename][SomeTermsRemoved][0][1]='text'
Talks[eachFilename][SomeTermsRemoved][1][0]='paragraph'
etc., because that raises IndexError: list index out of range. If I force-populate the entry with a string and then try to change it, I get an error saying that strings are immutable.
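Both errors can be reproduced in a few lines. This is only a minimal sketch: `some_terms_removed` is a hypothetical stand-in for Talks[eachFilename][SomeTermsRemoved], and the empty-string placeholder is an assumption about what "force populating" looked like.

```python
# Hypothetical stand-in for Talks[eachFilename][SomeTermsRemoved]
some_terms_removed = []

# Indexing into an empty list fails: there is no element 0 to look up yet.
try:
    some_terms_removed[0][0] = 'paragraph'
except IndexError as err:
    index_error = str(err)

# Pre-filling with a string (assumed placeholder) does not help:
# a string cannot be changed in place by index.
some_terms_removed = ['']
try:
    some_terms_removed[0][0] = 'paragraph'
except TypeError as err:
    type_error = str(err)
```

In both cases the assignment needs an existing, mutable inner list at position 0 before anything can be stored inside it.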
So, how do I specify that I want Talks[eachFilename][SomeTermsRemoved][0] to be ['paragraph','text'], and Talks[eachFilename][SomeTermsRemoved][1] to be ['paragraph','2'], etc.?
.append works, but only generates a single long column, not a set of lists.
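The difference between the two outcomes comes down to what gets appended to. A rough sketch, assuming a hypothetical `unwanted` set and sample `paragraphs` (neither is from my real data): appending each term to the outer list gives one long column, while appending an empty inner list first and then appending terms to that inner list keeps the grouping.

```python
# Hypothetical sample data and filter, for illustration only
paragraphs = [['paragraph', '1', 'text'],
              ['paragraph', '2', 'text']]
unwanted = {'1', '2'}

flat = []
nested = []
for paragraph in paragraphs:
    nested.append([])                # start a new inner list for this paragraph
    for term in paragraph:
        if term not in unwanted:
            flat.append(term)        # every term lands in one long list
            nested[-1].append(term)  # term goes into the current paragraph's list
```

Here `flat` ends up as a single list of four terms, while `nested` holds two inner lists of two terms each.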
To be more specific, I have a number of lists that are initialized inside a dict
Talks = {}
Talks[eachFilename]= {}
Talks[eachFilename]['StartingText']=[]
Talks[eachFilename]['TermVectors']=[]
Talks[eachFilename]['TermVectorsNoStops']=[]
eachFilename gets populated from a list of text files, e.g.:
Talks[eachFilename]=['filename1','filename2']
StartingText has several long lines of text (individual paragraphs):
Talks[filename1][StartingText]=['This is paragraph one','paragraph two']
TermVectors are populated by the NLTK package with a list of terms, still grouped in the original paragraphs:
Talks[filename1][TermVectors]=
[['This','is','paragraph','one'],
['paragraph','two']]
I want to further manipulate the TermVectors, but keep the original paragraph list structure. The loop below instead creates a list with one term per line:
for eachFilename in Talks:
    for eachTerm in range( 0, len( Talks[eachFilename]['TermVectors'] ) ):
        for term in Talks[eachFilename]['TermVectors'][ eachTerm ]:
            if unicode(term) not in stop_words:
                Talks[eachFilename]['TermVectorsNoStops'].append( term )
Result (I lose my paragraph structure):
Talks[filename1][TermVectorsNoStops]=
[['This'],
['is'],
['paragraph'],
['one'],
['paragraph'],
['two']]
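For comparison, one possible way to keep the paragraph grouping is to build one filtered inner list per paragraph instead of appending every term to the same outer list. This is a sketch under assumptions, not a tested fix for the real data: the `stop_words` set and the contents of `Talks` below are made-up samples, and the `unicode(...)` call is dropped on the assumption that the terms are already plain strings.

```python
# Hypothetical sample data in the shape described above
stop_words = {'This', 'is'}  # assumed stop-word set; the real one is not shown
Talks = {
    'filename1': {
        'TermVectors': [['This', 'is', 'paragraph', 'one'],
                        ['paragraph', 'two']],
        'TermVectorsNoStops': [],
    }
}

for eachFilename in Talks:
    entry = Talks[eachFilename]
    # One filtered inner list per paragraph, so the grouping survives.
    entry['TermVectorsNoStops'] = [
        [term for term in paragraph if term not in stop_words]
        for paragraph in entry['TermVectors']
    ]
```

With this sample input, Talks['filename1']['TermVectorsNoStops'] keeps two inner lists, one per original paragraph, with the stop words removed from each.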