
Given an ASCII file, I'd like a Python one-liner that creates a list of the words in the file.

Let tfile contain the following 2 lines

abc xyz abc mno
tuv xyz qrs abc

There are 8 words in the file and 5 unique words.

If I assign

file='tfile'

the following one-liner will create a set with the 5 unique words in tfile

s=set(open(file).read().split())

where the output is {'abc', 'mno', 'qrs', 'tuv', 'xyz'}

However if I try something similar to get a list of all words in the file, namely

l=list(open(file).read().split(" "))

I get the following

['abc', 'xyz', 'abc', 'mno\ntuv', 'xyz', 'qrs', 'abc\n']

which doesn't quite work because the last word of each line has a newline appended to it.

If I add strip() to the statement as in

l=list(open(file).read().strip().split(" "))

I get the following, which is better, but a newline is still attached to the front of the first word of the second line in the file.

['abc', 'xyz', 'abc', 'mno', '\ntuv', 'xyz', 'qrs', 'abc']

So 2 questions: (1) is there a one-liner which does what I want? and (2) why does the set of unique words work so nicely, without getting any newline characters?

2 Answers


You added " " as an argument to split in the second example. In the first example, you have

s=set(open(file).read().split())

But then, you do

l=list(open(file).read().split(" "))

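The key is the split(" "). With no argument, split() splits on runs of any whitespace (spaces, tabs, newlines) and never returns empty strings, which is why the set came out clean. With " " as the argument, it splits only on space characters, so the newlines stay inside the words.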

So all you need is

l=list(open(file).read().split())
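
To see the difference without touching the file, here is a minimal sketch that uses a string with the same content as tfile. The variable names text, words and space_only are just for illustration.

```python
# Same content as the two lines of tfile in the question.
text = "abc xyz abc mno\ntuv xyz qrs abc\n"

# No argument: split on runs of any whitespace (spaces, tabs, newlines).
words = text.split()

# A single space as separator: newlines are left inside the words.
space_only = text.split(" ")

print(words)       # ['abc', 'xyz', 'abc', 'mno', 'tuv', 'xyz', 'qrs', 'abc']
print(space_only)  # ['abc', 'xyz', 'abc', 'mno\ntuv', 'xyz', 'qrs', 'abc\n']
```

Since split() already returns a list, the list(...) wrapper in the one-liner is harmless but not needed.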
Luke B

If you want a list of unique words, you can first create a set and then convert it to a list. Note that a set is unordered, so the order of the words in the resulting list is arbitrary.

l=list(set(open(file).read().split()))
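
If the order of the unique words matters, two possible variants are sketched below, again on a string with the same content as tfile. This assumes Python 3.7 or later, where dict keys keep insertion order. The names unique_in_order and unique_sorted are just for illustration.

```python
# Same content as the two lines of tfile in the question.
text = "abc xyz abc mno\ntuv xyz qrs abc\n"
words = text.split()

# dict keys keep insertion order (Python 3.7+), so duplicates are
# dropped while the first-seen order of the words is preserved.
unique_in_order = list(dict.fromkeys(words))

# Alternatively, sort the set to get a deterministic alphabetical order.
unique_sorted = sorted(set(words))

print(unique_in_order)  # ['abc', 'xyz', 'mno', 'tuv', 'qrs']
print(unique_sorted)    # ['abc', 'mno', 'qrs', 'tuv', 'xyz']
```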
Mitchell Olislagers