So I want to run a program that will read in a file line by line, and will then print out either Valid or Invalid based on what each line contains.

For this example, each line of the input file may contain only the characters A, B, C, a, b, c, or a space. If a line contains only these characters, “Valid” should be printed. If it is just whitespace, or contains any other characters, it should print “Invalid”.

This is what I have come up with, but I can’t seem to get it to ever print out “Valid”. Can you tell why? Thanks!

import sys

input = sys.argv[1]
input = open(input,"r")
correctInput = 'ABCabc '

line1 = input.readline()

while line1 != "":
    if all(char in correctInput for char in line1):
        print("Valid")
        line2 = input.readline()
    else:
        print("Invalid")
        line2 = input.readline()
    line1 = line2
Tsang

2 Answers

If you print out the value of line1 before your if/else statement, you'll see it has a newline character (\n) at the end. This is the character that gets added to the end of each line whenever you hit the Enter key, and readline() does not remove it. You need to either discard the newline characters or include them as valid input.
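A quick way to see this for yourself is sketched below. It uses io.StringIO as a stand-in for the opened file, and the sample text is made up purely for illustration. Printing repr(line) rather than the line itself makes the trailing \n visible.

```python
import io

# io.StringIO stands in for the real file here (illustrative sample text)
fake_file = io.StringIO("ABC abc\nAB\n")

lines_seen = []
line = fake_file.readline()
while line != "":
    lines_seen.append(line)
    print(repr(line))  # repr() shows the trailing newline explicitly
    line = fake_file.readline()
```

Each printed value ends in \n, which is not in 'ABCabc ', so the all(...) check fails on every line.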

To include it as valid input, change

correctInput = 'ABCabc '

to

correctInput = 'ABCabc \n'


Or, to discard the newline characters, change

if all(char in correctInput for char in line1):

to

if all(char in correctInput for char in line1.replace('\n', '')):


Either method will work.
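One caveat worth checking against the original requirement: with either change, a blank line or a line made only of spaces still counts as valid, because all() returns True for an empty sequence and a space is one of the allowed characters. Below is a minimal sketch of a check that also rejects those lines. The helper name is_valid_line and the sample lines are hypothetical, chosen only for illustration.

```python
CORRECT_INPUT = 'ABCabc '

def is_valid_line(line):
    """Return True if line has only allowed characters and is not blank."""
    text = line.rstrip('\n')      # drop the newline left by readline()
    if text.strip() == '':        # empty or spaces-only lines are invalid
        return False
    return all(char in CORRECT_INPUT for char in text)

# Made-up sample lines: normal, spaces only, empty, and one bad character
for sample in ['ABC abc\n', '   \n', '\n', 'ABCd\n']:
    print('Valid' if is_valid_line(sample) else 'Invalid')
```

Only the first sample line should be reported as Valid here.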

By the way, input is a built-in function in Python. Although you're allowed to use it as a variable name, doing so shadows the built-in, so you can no longer call the input function in that part of your program. Because of this, it is considered bad practice to use any of the built-in function names as variable names.
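As a small illustration of what goes wrong, the sketch below rebinds input to a string (the file name is a made-up placeholder) and then tries to call it. The call fails because the name now refers to a string, not the built-in function.

```python
input = "some_file.txt"   # hypothetical file name; shadows the built-in input()

try:
    input("Enter a value: ")   # this now tries to call a string
except TypeError as error:
    shadow_error = str(error)
    print(shadow_error)

del input   # removing our own name makes the built-in visible again
```

Picking a different name, such as file_name, avoids the problem entirely.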


RegEx Solution

Just for fun, I came up with the following solution which solves your problem using regular expressions.

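import re
import sys

with open(sys.argv[1]) as fh:
    # re.MULTILINE makes ^ match at the start of every line, not only at the start of the file
    valid_lines = re.findall(r'^[ABCabc ]+\n', fh.read(), re.MULTILINE)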

This finds any valid lines using the pattern '^[ABCabc ]+\n'. What does this regular expression pattern do?

  • The ^ symbol signifies the start of a line. This requires the re.MULTILINE flag; without it, ^ matches only at the very start of the string.
  • Then comes the [ABCabc ]. Brackets define a character class, which matches any single character listed inside them.
  • The + after the brackets means that the characters in the brackets must be found 1 or more times.
  • And lastly we make sure the valid characters we found are followed by a newline character (\n). This ensures we checked the complete line for valid characters. Note that a final line with no trailing newline will not match.
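The pattern described above can be tried on an in-memory string, as in the sketch below. The sample text is made up and stands in for fh.read(). The second half shows a per-line alternative using re.fullmatch, which is a swapped-in variant rather than the approach above. It needs neither ^ nor \n because it requires the whole string to match.

```python
import re

# Illustrative sample standing in for fh.read(); the text is made up
sample = "ABC abc\nhello\nCab\n"

# re.MULTILINE lets ^ match at the start of each line
valid_lines = re.findall(r'^[ABCabc ]+\n', sample, re.MULTILINE)
print(valid_lines)

# Per-line alternative: fullmatch succeeds only if the entire line matches
results = []
for line in sample.splitlines():
    verdict = 'Valid' if re.fullmatch(r'[ABCabc ]+', line) else 'Invalid'
    results.append(verdict)
    print(verdict)
```

The per-line form also makes it easy to print a verdict for every line, which is what the question asked for.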
hostingutilities.com
    Awesome, this fixed it, thank you! Also good to know about input being a function. I am new to python (obviously) so that is good to know. Thanks again. – Tsang Nov 14 '18 at 06:11

It's because readline doesn't remove '\n' from the end of the line. You can avoid that problem by splitting the whole file content into lines with splitlines(), which drops the line endings, and then validating them one by one.

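import sys

file_name = sys.argv[1]
correctInput = 'ABCabc '

# The with block closes the file automatically;
# splitlines() returns the lines without their trailing newlines
with open(file_name, "r") as fh:
    lines = fh.read().splitlines()

for line1 in lines:
    if all(char in correctInput for char in line1):
        print('Valid')
    else:
        print('Invalid')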
Filip Młynarski