0

I have a question about reading in a .txt file and taking the strings inside it to be used later on in the code.

If I have a file called 'file0.txt' and it contains:

file1.txt
file2.txt

The rest of the files either contain more string file names or are empty.

How can I save both of these strings for later use? What I attempted to do was:

infile = open(file, 'r')
line = infile.readline()
line.split('\n')

But that returned the following:

['file1.txt', '']

I understand that readline only reads one line, but I thought that by splitting it on the newline character it would also grab the next file string.

I am attempting to simulate a file tree or to show which files are connected together, but as it stands now it is only going through the first file string in each .txt file.

Currently my output is:

File 1 crawled.
File 3 crawled.
Dead end reached.

My hope was that instead of just recursively crawling the first file it would go through the entire web, but that goes back to my issue of not giving the program the second file name in the first place.

I'm not asking for a specific answer, just a push in the right direction on how to better handle the strings from the files and store both of them instead of just one.

My current code is pretty ugly, but hopefully it gets the idea across. I will post it for reference to show what I'm trying to accomplish.

def crawl(file):

    infile = open(file, 'r')
    line = infile.readline()
    print(line.split('\n'))

    if 'file1.txt' in line:
        print('File 1 crawled.')
        return crawl('file1.txt')

    if 'file2.txt' in line:
        print('File 2 crawled.')
        return crawl('file2.txt')

    if 'file3.txt' in line:
        print('File 3 crawled.')
        return crawl('file3.txt')

    if 'file4.txt' in line:
        print('File 4 crawled.')
        return crawl('file4.txt')

    if 'file5.txt' in line:
        print('File 5 crawled.')
        return crawl('file5.txt')

    # etc...etc...

    else:
        print('Dead end reached.')

Outside the function:

file = 'file0.txt'
crawl(file)
user2909869
  • You seem to be looking for [this](http://stackoverflow.com/questions/3925614/how-do-you-read-a-file-into-a-list-in-python). – devnull Feb 25 '14 at 09:08

4 Answers

1

Using read() or readlines() will help, e.g.:

infile = open(file, 'r')
lines = infile.readlines()  # already a list, one item per line
print(lines)

gives

['file1.txt\n', 'file2.txt\n']

or

infile = open(file, 'r')
lines = infile.read()  # the whole file as a single string
print(lines.split('\n'))

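gives (when the file has no trailing newline; if it ends with one, the list also ends with an empty string '')

['file1.txt', 'file2.txt']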
Steve Rossiter
  • I posted this above, but I am not getting the same output as you. I am receiving ['file1.txt\n', 'file2.txt\n']. – user2909869 Feb 25 '14 at 09:17
  • That shouldn't happen with the second method. See [this answer](http://stackoverflow.com/questions/12330522/reading-a-file-without-newlines) for how to stop it happening with readlines. – Steve Rossiter Feb 25 '14 at 09:26
0

readline() only gets one line from the file, so the string it returns ends with a newline and contains nothing from the second line. What you want is file.read(), which gives you the whole file as a single string. Split that on the newline character and you should have what you need. Also remember that split() returns a new list rather than changing the string, so you need to assign the result of line.split('\n') to a variable. Alternatively, just use readlines(), which returns a list of the lines in the file.
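A minimal sketch of that suggestion, assuming file0.txt holds the two names from the question. The helper name read_names and the temporary demo file are made up for illustration. It uses splitlines() rather than split('\n') so that a trailing newline does not leave an empty string at the end of the list.

```python
import os
import tempfile


def read_names(path):
    """Return the lines of the file at path as a list, without newlines."""
    with open(path, 'r') as infile:
        # read() returns the whole file as one string
        contents = infile.read()
    # String methods return a new object, so the result has to be kept
    return contents.splitlines()


# Throwaway copy of file0.txt (contents assumed from the question)
demo_path = os.path.join(tempfile.mkdtemp(), 'file0.txt')
with open(demo_path, 'w') as f:
    f.write('file1.txt\nfile2.txt\n')

names = read_names(demo_path)
print(names)
```

Both names are now in one list that can be looped over later, instead of only the first line being available.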

stmfunk
0

Change readline to readlines. There is no need to split('\n'), since the result is already a list.

Here is a tutorial you should read.
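One caveat with readlines() is that each item keeps its trailing '\n'. The sketch below strips it in a list comprehension. The function name names_from_readlines is illustrative only, and the sample file is assumed to match the question's file0.txt.

```python
import os
import tempfile


def names_from_readlines(path):
    """Read every line of the file and return the names without newlines."""
    with open(path, 'r') as infile:
        lines = infile.readlines()  # each item still ends with '\n'
    # Strip surrounding whitespace and skip lines that are blank
    return [line.strip() for line in lines if line.strip()]


# Sample file with the contents described in the question
sample_path = os.path.join(tempfile.mkdtemp(), 'file0.txt')
with open(sample_path, 'w') as f:
    f.write('file1.txt\nfile2.txt\n')

result = names_from_readlines(sample_path)
print(result)
```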

WeaselFox
0

I prepared file0.txt listing two files, file1.txt listing one file, and file2.txt and file3.txt, which contained no data. Note that this won't add names that are already in the list.

def get_files(current_file, files=None):
    # Initialize file list with previous values, or initial value
    # (None instead of [] avoids a shared mutable default argument)
    if not files:
        new_files = [current_file]
    else:
        new_files = files
    # Add names not already in the list, to the list
    with open(current_file, "r") as f_in:
        for new_file in f_in.read().splitlines():
            # Strip before comparing so stray whitespace can't cause duplicates
            new_file = new_file.strip()
            if new_file and new_file not in new_files:
                new_files.append(new_file)
    # Do we need to recurse?
    cur_file_index = new_files.index(current_file)
    if cur_file_index < len(new_files) - 1:
        next_file = new_files[cur_file_index + 1]
        # Recurse; the shared list is extended in place
        get_files(next_file, new_files)
    # We're done
    return new_files
        

initial_file = "file0.txt"
files = get_files(initial_file)
print(files)
Returns: ['file0.txt', 'file1.txt', 'file2.txt', 'file3.txt']

Contents of file0.txt:

file1.txt
file2.txt

Contents of file1.txt:

file3.txt

file2.txt and file3.txt were blank.

Edits: Added .strip() for safety, and added the contents of the data files so this can be replicated.
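For comparison, here is a rough sketch of the same crawl done with a queue instead of recursion (a breadth-first walk, which is a different technique from the function above). The name crawl_all, the directory parameter and the temporary test files are assumptions made for this example. The files mirror the layout described in this answer.

```python
import os
import tempfile
from collections import deque


def crawl_all(start, directory):
    """Visit every file reachable from start; return names in visit order."""
    seen = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        with open(os.path.join(directory, current), 'r') as f_in:
            for name in f_in.read().splitlines():
                name = name.strip()
                # Skip blank lines and anything already scheduled
                if name and name not in seen:
                    seen.append(name)
                    queue.append(name)
    return seen


# Recreate the four test files in a temporary directory
demo_dir = tempfile.mkdtemp()
layout = {
    'file0.txt': 'file1.txt\nfile2.txt\n',
    'file1.txt': 'file3.txt\n',
    'file2.txt': '',
    'file3.txt': '',
}
for file_name, body in layout.items():
    with open(os.path.join(demo_dir, file_name), 'w') as f:
        f.write(body)

crawled = crawl_all('file0.txt', demo_dir)
print(crawled)
```

Because every name is recorded before it is queued, files that refer back to each other are opened only once. A listed file that does not exist would still raise FileNotFoundError in this sketch.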

Nick