0

I have a file called list.txt which looks like so:

input1
input2
input3

I am certain there is no blank line after the last line (input3). I then have a python script which will read this file line by line and write the text into some more text to create 3 files (one for each line):

import os
os.chdir("/Users/user/Desktop/Folder")

with open('list.txt','r') as f:
    lines = f.read().split('\n')

    for l in lines:
        header = "#!/bin/bash \n#BSUB -J %s.sh \n#BSUB -o /scratch/DBC/user/%s.sh.out \n#BSUB -e /scratch/DBC/user/%s.sh.err \n#BSUB -n 1 \n#BSUB -q normal \n#BSUB -P DBCDOBZAK \n#BSUB -W 168:00\n"%(l,l,l)
        script = "cd /scratch/DBC/user\n"
        script2 = 'grep "input" %s > result.%s.txt\n'%(l,l)
        all= "\n".join([header,script,script2])

        with open('script_{}.sh'.format(l), 'w') as output:
            output.write(all)

My problem is that this creates 4 files, not 3: script_input1.sh, script_input2.sh, script_input3.sh and script_.sh. This last file has no text where the others would have input1, input2 or input3.

It seems that Python reads my list.txt line by line, but when it reaches "input3", it somehow continues? How can I tell Python to read my file line by line, separated by "\n" but stop after the last text?

chepner
  • 497,756
  • 71
  • 530
  • 681
mf94
  • 439
  • 4
  • 19
  • Possible duplicate of [Remove the newline character in a list read from a file](https://stackoverflow.com/questions/4319236/remove-the-newline-character-in-a-list-read-from-a-file) – Mort Oct 11 '17 at 14:48
  • 1
    I'll say this [again](https://stackoverflow.com/questions/46685755/python-script-to-make-multiple-bash-scripts#comment80321657_46685755): you probably should rethink your approach. – tripleee Oct 11 '17 at 15:05

5 Answers

3

First, don't read the whole file into memory when you don't have to. Files are iterable, so the proper way to read a file line by line is:

with open("/path/to/file.ext") as f:
    for line in f:
        do_something_with(line)

Now in your for loop, you just have to strip the line and, if it's empty, ignore it:

with open("/path/to/file.ext") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        do_something_with(line)

Slightly unrelated but Python has multiline strings, so you don't need concatenation either:

# not sure I got it right actually ;)
# the backslash after the opening quotes keeps "#!/bin/bash" on the first line of the output
script_tpl = """\
#!/bin/bash 
#BSUB -J {line}.sh 
#BSUB -o /scratch/DBC/user/{line}.sh.out 
#BSUB -e /scratch/DBC/user/{line}.sh.err 
#BSUB -n 1 
#BSUB -q normal 
#BSUB -P DBCDOBZAK 
#BSUB -W 168:00
cd /scratch/DBC/user
grep "input" {line} > result.{line}.txt
"""

with open("/path/to/file.ext") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        script = script_tpl.format(line=line)
        with open('script_{}.sh'.format(line), 'w') as output:
            output.write(script)

As a last note: avoid changing dir in your script, use os.path.join() instead to work with absolute paths.
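A minimal sketch of that last point. The helper name `write_script` and its `base_dir` parameter are hypothetical, introduced only for illustration; the idea is simply to build the full path with os.path.join() instead of calling os.chdir() first:

```python
import os

def write_script(base_dir, name, body):
    """Write body to script_<name>.sh inside base_dir and return the full path."""
    # Build the target path explicitly; the current working directory is never changed.
    path = os.path.join(base_dir, "script_{}.sh".format(name))
    with open(path, "w") as output:
        output.write(body)
    return path
```

In the question's setup this might be called as write_script("/Users/user/Desktop/Folder", line, script), with list.txt opened through a joined path in the same way.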

bruno desthuilliers
  • 75,974
  • 6
  • 88
  • 118
  • Thanks @bruno desthuilliers. Question about your last comment: in the following line: "with open('script_{}.sh'.format(l), 'w') as output:", I should replace "l" by "line" right? Because l is no longer defined in this script – mf94 Oct 11 '17 at 15:35
  • And last question, the part that says: "line = line.strip(); if not line:continue": is it saying: strip the line of a blank or newline? And if there is no such blank or newline continue? Sorry, I'm very new to Python so its not quite clear to me – mf94 Oct 11 '17 at 15:47
  • `str.strip()` removes all leading and trailing whitespace (including newlines), so if the line contains only whitespace it returns an empty string. Empty strings (as well as empty sequences, dicts and sets, numerical zeros and None) are false in a boolean context. So we strip the whitespace and, if the result is an empty string, we continue with the next line (or fall out of the loop if that was the last line). – bruno desthuilliers Oct 11 '17 at 17:05
  • Thanks so much for this explanation! – mf94 Oct 12 '17 at 09:15
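
To make the strip-then-test idiom discussed in these comments concrete, here is a small sketch. `is_blank` is a hypothetical helper name, not something from the answer:

```python
def is_blank(line):
    """Return True when the line holds nothing but whitespace."""
    # strip() returns '' for whitespace-only input, and '' is false in a boolean context.
    return not line.strip()

for sample in ["input3\n", "\n", "   \n", ""]:
    print(repr(sample), is_blank(sample))
```

Only the first sample is reported as non-blank; the other three are the lines that `if not line: continue` would skip.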
1

Using your current approach, you'll want to:

  • Check if the last element in lines is empty (lines[-1] == '')
  • If so, discard it (lines = lines[:-1]).
with open('list.txt','r') as f:
    lines = f.read().split('\n')

if lines[-1] == '':
    lines = lines[:-1]

for line in lines:    
    print(line)

Don't forget that it's legal for a file not to end in a newline. The check above handles both cases, because the last element is only discarded when it is actually empty.
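
A quick way to check both cases without touching a real file is to wrap the same logic in a function and feed it strings. The name `split_lines` is made up for this sketch:

```python
def split_lines(text):
    """Split text on newlines and drop a single empty trailing element, if any."""
    lines = text.split('\n')
    if lines[-1] == '':
        lines = lines[:-1]
    return lines

print(split_lines('input1\ninput2\ninput3\n'))  # ['input1', 'input2', 'input3']
print(split_lines('input1\ninput2\ninput3'))    # ['input1', 'input2', 'input3']
```

Note that this only removes one trailing empty element, so a file ending in several blank lines would still leave empty strings in the list.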


Alternatively, as @setsquare pointed out, you might want to try using readlines():

with open('list.txt', 'r') as f:
    lines = [ line.rstrip('\n') for line in f.readlines() ]

for line in lines:
    print(line)
Attie
  • 6,690
  • 2
  • 24
  • 34
  • What if there are multiple blank lines in the end? – randomir Oct 11 '17 at 14:49
  • If handling blank lines is of concern, then we have a different question... this will just take care of handling the common "_empty last line_" – Attie Oct 11 '17 at 14:49
1

Have you considered using readlines() instead of read()? That lets Python deal with the question of whether the last line ends with a \n.

Bear in mind that if the input file does have a \n on the final line, then using read() and splitting by '\n' will create an extra value. For example:

my_string = 'one\ntwo\nthree\n'
my_list = my_string.split('\n')
print(my_list)
# >> ['one', 'two', 'three', '']

Potential solution:

with open('list.txt') as f:
    lines = f.readlines()
# remove newlines
lines = [line.strip() for line in lines]
# remove any empty values, just in case (list() is needed on Python 3, where filter is lazy)
lines = list(filter(bool, lines))

For a simple example, see here: How do I read a file line-by-line into a list?
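
If the whole file has already been read into one string, `str.splitlines()` may be another option. It is not mentioned in the answer above, so treat this as a side-by-side sketch rather than the recommended fix:

```python
text = 'one\ntwo\nthree\n'

# split('\n') treats the final newline as a separator and yields a trailing ''.
print(text.split('\n'))    # ['one', 'two', 'three', '']

# splitlines() treats newlines as line terminators, so no trailing '' appears.
print(text.splitlines())   # ['one', 'two', 'three']
```

One caveat: splitlines() also breaks on other line boundaries such as '\r' and '\r\n', and it does not remove blank lines in the middle of the text.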

setsquare
  • 121
  • 3
  • Why use `readlines()` at all ? `lines = [line.strip() for line in f]` does the same thing. But this won't solve the OP problem - you still need to filter out empty lines. – bruno desthuilliers Oct 11 '17 at 15:02
1

f.read() returns a string that ends with a newline, which split dutifully treats as separating the last line from an empty string. It's not clear why you are reading the entire file into memory explicitly; just iterate over the file object and let it deal with line-splitting.

with open('list.txt','r') as f:
    for l in f:
        l = l.rstrip('\n')  # each line still carries its trailing newline
        # ...
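
A self-contained sketch of this approach, assuming a throwaway copy of list.txt written to a temporary directory so it can be run anywhere. The function name `read_names` is hypothetical:

```python
import os
import tempfile

def read_names(path):
    """Return the non-empty lines of the file at path, without trailing newlines."""
    names = []
    with open(path) as f:
        for l in f:              # the file object yields one line at a time
            l = l.rstrip('\n')   # drop the newline that each line keeps
            if l:                # skip lines that are empty after stripping
                names.append(l)
    return names

# Demonstration with a file whose last line ends in a newline.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'list.txt')
    with open(path, 'w') as f:
        f.write('input1\ninput2\ninput3\n')
    print(read_names(path))      # ['input1', 'input2', 'input3']
```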
chepner
  • 497,756
  • 71
  • 530
  • 681
0

I think you are using split wrong.

If you have the following:

text = 'xxx yyy'
text.split(' ') # or simply text.split()

The result will be

['xxx', 'yyy']

Now if you have:

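text = 'xxx yyy ' # extra space at the end
text.split(' ')

The result will be

['xxx', 'yyy', '']

because split(' ') returns what is before and after each ' ' (space), and in this case there is an empty string after the last space. Note that text.split() with no argument behaves differently: it splits on runs of whitespace and discards empty strings at the start and end, so it would return ['xxx', 'yyy'] here.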

Some functions you might use:

strip([chars]) # This removes every leading and trailing character that appears in chars (whitespace by default)

Example:

text = '___text_about_something___'
text.strip('_')

The result will be:

'text_about_something'

In your particular question, you can simply:

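lines = f.readlines() # read all lines of the file; each one keeps its trailing '\n'
for l in lines:
    l = l.strip() # strip() returns a new string without leading/trailing whitespace, including '\n'
    if not l:
        continue # skip lines that are empty after stripping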
klaus
  • 1,187
  • 2
  • 9
  • 19