Let's go through your code:
Consider a file words.txt that consists of the following text:
hello, I am Solomon
Nice to meet you Solomon
So, you first open this file with fhand = open("words.txt"), then you initialize a list called words:
fhand = open("words.txt")
words = list()
Suggestion: Here it's advisable to use the with context manager to open the file. That way, you don't have to close the file explicitly later. If you just use open() as above, you have to close the file at the end with fhand.close().
with open("words.txt", 'r') as fhand:
    # <-- code -->
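As a self-contained sketch of the difference, the snippet below first writes the sample text to a temporary file (the temporary path stands in for words.txt and is only an assumption for the demo), then reads it once with an explicit close() and once with a with block. In both cases fhand.closed ends up True, but only the with version guarantees it without you having to remember the close() call, even if an error occurs inside the block.

```python
import os
import tempfile

# Write the sample text to a temporary file that stands in for words.txt.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "words.txt")
with open(path, "w") as out:
    out.write("hello, I am Solomon\nNice to meet you Solomon\n")

# Without a context manager you must remember to close the file yourself.
fhand = open(path)
first_line = fhand.readline()
fhand.close()
print(fhand.closed)  # True, but only because close() was called explicitly

# With a context manager the file is closed automatically when the block exits.
with open(path, "r") as fhand:
    content = fhand.read()
print(fhand.closed)  # True
```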
In the next line, you iterate over each line in fhand. Let's print line, which simply shows each line of the text:
for line in fhand:
    print(line, end='')  # each line already ends with a newline
#Output:
hello, I am Solomon
Nice to meet you Solomon
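To see why a plain print(line) would leave blank lines between the outputs, here is a small sketch that uses io.StringIO in place of the real file (an assumption so the example runs without words.txt). Iterating over a text file yields each line with its trailing newline still attached:

```python
import io

# io.StringIO behaves like an open text file, so it stands in for words.txt here.
fhand = io.StringIO("hello, I am Solomon\nNice to meet you Solomon\n")

lines = []
for line in fhand:
    lines.append(line)

print(lines)
# ['hello, I am Solomon\n', 'Nice to meet you Solomon\n']

# print() adds its own newline after the one already in the line,
# so end="" is used to print each line exactly as read.
for line in lines:
    print(line, end="")
```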
Then you iterate over line.split(), which splits each line of text into a list of words. If we print line.split():
for line in fhand:
    print(line.split())
#Output:
['hello,', 'I', 'am', 'Solomon']
['Nice', 'to', 'meet', 'you', 'Solomon']
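A quick sketch on a single string (the sample line is assumed, including the trailing newline it has when read from the file) shows what split() with no argument does, and how it differs from split(' '):

```python
line = "hello, I am Solomon\n"

# With no argument, split() breaks on any run of whitespace and
# discards leading/trailing whitespace, including the "\n".
no_arg = line.split()
print(no_arg)
# ['hello,', 'I', 'am', 'Solomon']

# With an explicit ' ' separator only single spaces count,
# so the trailing newline stays attached to the last word.
space_arg = line.split(' ')
print(space_arg)
# ['hello,', 'I', 'am', 'Solomon\n']

# Punctuation is not whitespace, so 'hello,' keeps its comma in both cases.
```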
Suggestion: You could also use splitlines() to break text at line boundaries into a list of lines. This differs from split() in that it does not break a line into words, and it drops the line-break characters themselves. It does preserve any other whitespace, though, so if your text has spaces at the beginning or end of a line, strip each resulting line with strip(' '). Since each line read from fhand holds exactly one line, splitlines() returns a one-element list here, and you can still split those lines into words afterwards:
for line_str in fhand:
    print([s.strip(' ') for s in line_str.splitlines()])
#Output:
['hello, I am Solomon']
['Nice to meet you Solomon']
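To split those lines into words, nest a second loop inside the first one:
for line_str in fhand:
    for line in line_str.splitlines():  # watch the indentation
        print(line.split())  # split() already drops surrounding spaces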
#Output:
['hello,', 'I', 'am', 'Solomon']
['Nice', 'to', 'meet', 'you', 'Solomon']
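The difference between the two methods is easiest to see on one multi-line string. In this sketch the sample text, with extra spaces added at the start and end of its lines, is an assumption for illustration:

```python
text = "  hello, I am Solomon\nNice to meet you Solomon  \n"

# splitlines() breaks only at line boundaries and drops the line-break characters.
lines = text.splitlines()
print(lines)
# ['  hello, I am Solomon', 'Nice to meet you Solomon  ']

# split() breaks at every run of whitespace, so it yields words instead.
tokens = text.split()
print(tokens)
# ['hello,', 'I', 'am', 'Solomon', 'Nice', 'to', 'meet', 'you', 'Solomon']

# splitlines() keeps leading/trailing spaces, so strip each line afterwards.
stripped = [line.strip(' ') for line in lines]
print(stripped)
# ['hello, I am Solomon', 'Nice to meet you Solomon']
```

Stripping after splitlines() matters: calling strip(' ') on a line that still ends in "\n" would not remove spaces sitting just before the newline.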
In the next piece of code you iterate over each word (or rather, each letter?) in line.split() (which, as we saw, returns a list of words) and add it to words with +=. Because word is a string, and += on a list adds every item of the iterable on its right, each character of the word is added separately. So you end up with a list of individual letters rather than words:
for word in line.split():
    words += word
#Output:
['h', 'e', 'l', 'l', 'o', ',', 'I', 'a', 'm', 'S', 'o', 'l', 'o', 'm', 'o', 'n', 'N', 'i', 'c', 'e', 't', 'o', 'm', 'e', 'e', 't', 'y', 'o', 'u', 'S', 'o', 'l', 'o', 'm', 'o', 'n']
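Here is a minimal sketch of that behaviour on a single word (the word "hello," is taken from the sample text). Adding the string directly spreads its characters into the list, while wrapping it in a one-element list adds the whole word as one item:

```python
word = "hello,"

chars = []
chars += word      # += on a list behaves like extend(): every character is added
print(chars)
# ['h', 'e', 'l', 'l', 'o', ',']

whole = []
whole += [word]    # the one-element list contributes a single item: the whole word
print(whole)
# ['hello,']
```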
But most likely you expect all the words in a single list, words. You can achieve this with the append() method, which takes each word in line.split() and appends it (adds it to the end of the list) to words:
for word in line.split():
    words.append(word)
#Output:
['hello,', 'I', 'am', 'Solomon', 'Nice', 'to', 'meet', 'you', 'Solomon']
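Putting the pieces together, a runnable sketch of the corrected loop looks like this. It again uses io.StringIO as an assumed stand-in for words.txt so it runs without the file:

```python
import io

# io.StringIO stands in for the open words.txt file.
fhand = io.StringIO("hello, I am Solomon\nNice to meet you Solomon\n")

words = []
for line in fhand:
    for word in line.split():
        # append() adds its argument as one item, so each word is kept whole.
        words.append(word)

print(words)
# ['hello,', 'I', 'am', 'Solomon', 'Nice', 'to', 'meet', 'you', 'Solomon']
```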
Now let's look at the other variation, words += [word]:
for word in line.split():
    words += [word]
print(words)
#Output:
['hello,', 'I', 'am', 'Solomon', 'Nice', 'to', 'meet', 'you', 'Solomon']
This has the same effect as append(). Why? Let's print [word], which is simply a one-element list containing the current word. Because you take each word from line.split(), wrap it in a list, and concatenate that list to words, each word is added as a whole:
print([word])
#Output:
['hello,']
['I']
['am']
['Solomon']
['Nice']
['to']
['meet']
['you']
['Solomon']
words += [word] produces the same list as words = words + [word] (strictly speaking, += extends the existing list in place while + builds a new list, but the contents end up the same). To see how this concatenation works, consider the following example:
words = list()
word = ["Hello"]
concat_words = words + word
print(concat_words)
#['Hello']
another_word = ["World"]
concat_some_more_words = words + another_word
print(concat_some_more_words)
#['World']
final_concatenation = concat_words + concat_some_more_words
print(final_concatenation)
#Output:
['Hello', 'World']
Let's try append() on this example:
words1 = list()
words_splitted = ["Hello", "World"]
for word in words_splitted:
    words1.append(word)
print(words1)
#['Hello', 'World']
Comparing the two results shows that concatenation and appending give the same list here, though append() is the recommended, more idiomatic way to add a single item to a list:
print(words1==final_concatenation)
#True
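Strictly speaking, += and + are not identical for lists. This small sketch (the names words and alias are just for illustration) shows that += modifies the existing list in place, while words = words + [...] creates a new list and rebinds the name:

```python
words = []
alias = words  # a second name for the same list object

# += extends the existing list in place, so the alias sees the change.
words += ["Hello"]
print(alias)           # ['Hello']
print(words is alias)  # True

# words = words + [...] builds a brand-new list and rebinds the name,
# so the alias still refers to the old list.
words = words + ["World"]
print(alias)           # ['Hello']
print(words)           # ['Hello', 'World']
print(words is alias)  # False

# + also requires both operands to be lists, while += accepts any iterable.
try:
    [] + "abc"
except TypeError as exc:
    print("TypeError:", exc)
```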
Returning to the original question, let's make the whole code more compact using a list comprehension:
with open("words.txt", 'r') as fhand:
    words = [word for line in fhand for word in line.split()]
print(words)
#Output:
['hello,', 'I', 'am', 'Solomon', 'Nice', 'to', 'meet', 'you', 'Solomon']
You will notice I've used the with context manager, so Python closes the file automatically once the job is done (when the block exits). Next, I've built the list words with the same two loops inside a single expression. This is called a list comprehension, and it is one of Python's most useful features. It makes the code more compact and easier to read, and it is usually somewhat faster than calling append() in a loop.
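To check that the comprehension really is the same nested loop, this sketch (again using io.StringIO as an assumed stand-in for words.txt) builds the list both ways and compares them. Note that the for clauses appear in the comprehension in the same order as the nested loops:

```python
import io

text = "hello, I am Solomon\nNice to meet you Solomon\n"

# Nested loops with append() ...
words_loop = []
for line in io.StringIO(text):
    for word in line.split():
        words_loop.append(word)

# ... and the equivalent list comprehension, outer loop first.
words_comp = [word for line in io.StringIO(text) for word in line.split()]

print(words_loop == words_comp)  # True
```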
Finally, initializing with words = [] is cleaner than words = list(). It is also slightly faster, since the literal does not have to look up and call the name list.
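If you want to check the speed claim on your own machine, here is a rough sketch with the standard timeit module. Timings vary by machine and Python version, so no numbers are shown; on CPython the literal is typically faster because [] compiles to a single list-building instruction, while list() has to look up the name and call it.

```python
import timeit

# Time one million evaluations of each expression.
literal_time = timeit.timeit("[]", number=1_000_000)
call_time = timeit.timeit("list()", number=1_000_000)

print(f"[]     : {literal_time:.3f} s")
print(f"list() : {call_time:.3f} s")

# Both expressions produce an equal, empty list.
print([] == list())  # True
```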