Reading text files into lists in Python

Question

Instead of defining `documents` like this...

documents = ["the mayor of new york was there", "machine learning can be useful sometimes","new york mayor was present"]

... I want to read the same three sentences from two different .txt files, with the first sentence in the first file and sentences 2 and 3 in the second file.

I have come up with this code:

import glob
import os

# read txt documents
os.chdir('text_data')
documents = []
for file in glob.glob("*.txt"):  # read all txt files in working directory
    with open(file, "r") as file_content:  # the with-block closes the file
        lines = file_content.read().splitlines()
    for line in lines:
        documents.append(line)

But the documents resulting from the two strategies seem to be in a different format. I want the second strategy to produce the same output as the first.

... what is wrong? Please try to be specific with your problem statements. — juanpa.arrivillaga, Mar 25 '17 at 23:38
My point was that instead of writing "the `documents` resulting from the two strategies seem to be in different format", you should instead *show the output* — juanpa.arrivillaga, Mar 25 '17 at 23:45
Also, doing this: `lines = file_content.read().splitlines()` is not necessary. You can iterate directly over the file object, which yields its lines one at a time. So just `for line in file_content:` would be sufficient (although you'll get the trailing newlines). Likely, you just want `documents.append(file_content.read())`, and then you don't have to iterate over the file at all... — juanpa.arrivillaga, Mar 25 '17 at 23:48
Possible duplicate of [combine multiple text files into one text file using python](http://stackoverflow.com/questions/17749058/combine-multiple-text-files-into-one-text-file-using-python) — OneCricketeer, Mar 26 '17 at 00:35
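To make juanpa.arrivillaga's point about iterating over the file concrete, here is a minimal sketch. It assumes a throwaway file (the name `doc2.txt` and its temporary location are made up) holding sentences 2 and 3, and compares three ways of reading it: iterating over the file object, iterating and stripping the newline, and calling `.read()`.

```python
import os
import tempfile

# Hypothetical sample file holding sentences 2 and 3 from the question.
path = os.path.join(tempfile.mkdtemp(), "doc2.txt")
with open(path, "w") as f:
    f.write("machine learning can be useful sometimes\nnew york mayor was present\n")

# Iterating over the file object yields one line at a time, newline included.
with open(path) as file_content:
    raw_lines = [line for line in file_content]

# Stripping the newline gives the same strings as .read().splitlines().
with open(path) as file_content:
    stripped = [line.rstrip("\n") for line in file_content]

# .read() returns the whole file as one string.
with open(path) as file_content:
    whole = file_content.read()

print(raw_lines)  # ['machine learning can be useful sometimes\n', 'new york mayor was present\n']
print(stripped)   # ['machine learning can be useful sometimes', 'new york mayor was present']
```

The trailing `"\n"` in `raw_lines` is the most likely source of the "different format" the question describes.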

score 1 · Answer 1 · answered Mar 26 '17 at 00:32

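If I understand your code correctly, this is equivalent and more efficient, because it iterates over each file line by line instead of reading the entire file into a string and then splitting it into a list. (Iterating over a file yields each line with its trailing newline, so strip it if you want the same strings as `.splitlines()`.)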

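import glob
import os
os.chdir('text_data')
documents = []
for file in glob.glob("*.txt"):  # read all txt files in working directory
    with open(file) as f:
        # strip each line's trailing newline to match .read().splitlines()
        documents.extend(line.rstrip("\n") for line in f)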

Or even as a one-liner:

# the outer loop (over files) must come first in the comprehension
documents = [line for file in glob.glob("*.txt") for line in open(file)]

answered by OneCricketeer

You need to reverse the order of the `for` clauses in the list comprehension. – C S, Mar 26 '17 at 00:37
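The comment is about how Python orders the `for` clauses: they nest left to right, exactly like the equivalent nested loops. Here is a minimal sketch under a few assumptions: a temporary folder with two made-up files, `a.txt` and `b.txt`, stands in for `text_data`, and `read_lines` is a hypothetical helper that closes each file (the one-liner above leaves that to the garbage collector). `sorted()` is added because `glob.glob` does not guarantee any particular order.

```python
import glob
import os
import tempfile


def read_lines(path):
    """Hypothetical helper: return a file's lines without newlines and close it."""
    with open(path) as f:
        return f.read().splitlines()


# Stand-in for the text_data directory: a temporary folder with two sample files.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "a.txt"), "w") as f:
    f.write("the mayor of new york was there\n")
with open(os.path.join(tmpdir, "b.txt"), "w") as f:
    f.write("machine learning can be useful sometimes\nnew york mayor was present\n")

# glob.glob() does not promise any order, so sort to keep a.txt before b.txt.
paths = sorted(glob.glob(os.path.join(tmpdir, "*.txt")))

# Outer loop (files) first, inner loop (lines) second, read left to right.
documents = [line for path in paths for line in read_lines(path)]

# The same comprehension written as explicit nested loops.
documents_loop = []
for path in paths:
    for line in read_lines(path):
        documents_loop.append(line)

print(documents)
```

If the file order matters (sentence 1 before sentences 2 and 3), sorting the `glob` results is the simplest way to make it predictable.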

score 0 · Answer 2 · edited Mar 26 '17 at 00:26

Instead of `.read().splitlines()`, you can use `.readlines()`, which returns the lines of a file as a list. Note that, unlike `.splitlines()`, it keeps each line's trailing newline.

edited by Darkstarone; answered Mar 25 '17 at 23:41 by K.Land_bioinfo

I am new to Stack Overflow, @juanpa.arrivillaga. What I meant was that the contents of the list that `.readlines()` creates could then be added to `documents`, but I see that your most recent comment answers what I was trying to explain. Thank you. – K.Land_bioinfo, Mar 26 '17 at 00:00
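One detail matters when adding the `.readlines()` result to `documents`: `append` and `extend` give different shapes. The sketch below assumes two made-up sample files in a temporary folder. It shows that `append` nests one list per file, while `extend` produces the flat list of lines the question wants; stripping the newlines then matches the hand-written list.

```python
import os
import tempfile

# Hypothetical setup: the two sample files from the question in a temp folder.
tmpdir = tempfile.mkdtemp()
samples = [
    ("a.txt", "the mayor of new york was there\n"),
    ("b.txt", "machine learning can be useful sometimes\nnew york mayor was present\n"),
]
paths = []
for name, text in samples:
    path = os.path.join(tmpdir, name)
    with open(path, "w") as f:
        f.write(text)
    paths.append(path)

appended = []
extended = []
for path in paths:
    with open(path) as f:
        lines = f.readlines()  # every element keeps its trailing "\n"
    appended.append(lines)     # the whole list becomes ONE element: nested lists
    extended.extend(lines)     # each line becomes its own element: a flat list

# Strip the newlines to match the hand-written documents list exactly.
documents = [line.rstrip("\n") for line in extended]
print(appended)
print(documents)
```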

score 0 · Answer 3 · answered Mar 26 '17 at 00:42

... I want to read the same three sentences from two different .txt files, with the first sentence in the first file and sentences 2 and 3 in the second file.

Translating the requirements directly gives:

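with open('somefile1.txt') as f1:
    lines_file1 = f1.read().splitlines()  # strings without trailing newlines
with open('somefile2.txt') as f2:
    lines_file2 = f2.read().splitlines()
# sentence 1 is line 1 of file 1; sentences 2 and 3 are lines 1 and 2 of file 2
documents = lines_file1[0:1] + lines_file2[0:2]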

FWIW, given the kind of work you're doing, the `fileinput` module from the standard library may be helpful.
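As a rough illustration of why `fileinput` fits here: `fileinput.input()` chains several files into one stream of lines and can report which file each line came from. This is only a sketch, assuming two made-up files `a.txt` and `b.txt` in a temporary folder in place of the real data.

```python
import fileinput
import os
import tempfile

# Hypothetical setup: the two sample files from the question.
tmpdir = tempfile.mkdtemp()
paths = [os.path.join(tmpdir, "a.txt"), os.path.join(tmpdir, "b.txt")]
texts = ["the mayor of new york was there\n",
         "machine learning can be useful sometimes\nnew york mayor was present\n"]
for path, text in zip(paths, texts):
    with open(path, "w") as f:
        f.write(text)

documents = []
origins = []
# fileinput.input() reads the files one after another as a single stream.
with fileinput.input(files=paths) as stream:
    for line in stream:
        documents.append(line.rstrip("\n"))
        # filename() and filelineno() describe where the current line came from.
        origins.append((os.path.basename(stream.filename()), stream.filelineno()))

print(documents)
print(origins)
```

The result is the same flat list of sentences, with no manual loop over files.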

Hope this gets you back in business :-)
