Given an ASCII file, I'd like a Python one-liner to create a list of the words in the file.
Let tfile contain the following 2 lines:
abc xyz abc mno
tuv xyz qrs abc
There are 8 words in the file and 5 unique words.
If I assign
file='tfile'
the following one-liner creates a set of the 5 unique words in tfile
s=set(open(file).read().split())
where the output is {'abc', 'mno', 'qrs', 'tuv', 'xyz'}
However, if I try something similar to get a list of all the words in the file, namely
l=list(open(file).read().split(" "))
I get the following
['abc', 'xyz', 'abc', 'mno\ntuv', 'xyz', 'qrs', 'abc\n']
which doesn't quite work because the last word of each line has a newline appended to it.
If I add strip() to the statement as in
l=list(open(file).read().strip().split(" "))
I get the following, which is better, but the newline between the two lines is still there, joining the last word of the first line to the first word of the second.
['abc', 'xyz', 'abc', 'mno\ntuv', 'xyz', 'qrs', 'abc']
So, 2 questions: (1) Is there a one-liner that does what I want? (2) Why does the set of unique words work so nicely, without picking up any newline characters?
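For reference, here is a minimal sketch of the difference between the two calls. Called with no argument, str.split() splits on any run of whitespace (spaces, tabs, newlines) and discards empty strings, whereas split(" ") splits only on single space characters. That appears to be why the set version comes out clean. The string `text` below is an assumed stand-in for the contents of tfile, so the snippet runs without the file on disk.

```python
# Stand-in for open('tfile').read(); mirrors the two lines of tfile shown above.
text = "abc xyz abc mno\ntuv xyz qrs abc\n"

# split(" ") splits only on the space character, so newlines stay inside words.
by_space = text.split(" ")

# split() with no argument splits on any run of whitespace, including newlines.
words = text.split()

# The set one-liner uses the same no-argument split, hence no stray newlines.
unique = set(words)

print(by_space)
print(words)
print(len(words), len(unique))
```

Applied to the file itself, the equivalent one-liner would be l=open(file).read().split(). The list() wrapper is not needed, since split() already returns a list.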