I'm a Perl programmer attempting to learn Python by taking some work I've done before and converting it to Python. This is NOT a line-by-line translation; I want to learn the Python technique for this type of task.
I'm parsing a Windows INI file. Section names are in the format:
[<type> <description>]
The <type> is a single-word field and is not case sensitive. The <description> can be multiple words.
After a section heading there are a number of parameters and values, in the form:
<parameter> = <value>
Parameter names contain no spaces and can only contain underscores, letters, and digits (case insensitive), so the first = is the divider between a parameter and its value. There may be whitespace on either side of the equals sign, and there may be extra whitespace at the beginning or end of the line.
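A minimal sketch of how these two formats might be expressed with Python's re module, assuming a section heading must end with ] and that the type and parameter names should be lowercased because they are case insensitive. SECTION_RE, PARAM_RE, parse_section and parse_parameter are hypothetical names; re.VERBOSE, named groups and Pattern.fullmatch are standard re features:

```python
import re

# Hypothetical patterns for the two line formats described above.
# re.VERBOSE lets a pattern span lines and carry comments; named groups
# make the captured pieces self-documenting.
SECTION_RE = re.compile(r"""
    \s* \[ \s*
    (?P<type> \w+ )            # single-word type
    \s+
    (?P<description> .*? )     # may contain spaces; lazy, so trailing blanks are left out
    \s* \] \s*
""", re.VERBOSE)

PARAM_RE = re.compile(r"""
    \s*
    (?P<name> [A-Za-z0-9_]+ )  # letters, digits, underscores only, so no equals sign
    \s* = \s*                  # therefore this is the first equals sign on the line
    (?P<value> .*? )
    \s*
""", re.VERBOSE)


def parse_section(line):
    """Return (type, description) for a section heading, else None."""
    m = SECTION_RE.fullmatch(line)
    return (m.group("type").lower(), m.group("description")) if m else None


def parse_parameter(line):
    """Return (name, value) for a parameter line, else None."""
    m = PARAM_RE.fullmatch(line)
    return (m.group("name").lower(), m.group("value")) if m else None


print(parse_section("  [Server  Main web host ] "))
print(parse_parameter("Max_Retries =  5  "))
print(parse_parameter("url=http://host/?a=b"))
```

Because the parameter name cannot contain =, the match splits at the first = even when the value contains more of them.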
In Perl, I used regular expressions for parsing:
```perl
use feature 'say';    # say() needs this feature enabled

while (my $line = <CONTROL_FILE>) {
    chomp($line);
    next if ($line =~ /^\s*[#;']/);    # Comments start with "#", ";", or "'"
    next if ($line =~ /^\s*$/);        # Ignore blank lines
    if ($line =~ /^\s*\[\s*(\w+)\s+(.*?)\s*\]/) {    # Section
        say "This is a '$1' section called '$2'";
    }
    elsif ($line =~ /^\s*(\w+)\s*=\s*(.*)/) {        # Parameter
        say "Parameter is '$1' with a value of '$2'";
    }
    else {                                           # Not comment, section, or parameter
        say "Invalid line";
    }
}
```
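For comparison, a fairly direct Python rendering of the Perl loop, using re.match and match.groups(). This is a sketch: describe_lines is a hypothetical name, it accepts any iterable of lines (an open file object works), and it yields the messages instead of printing them so the results are easy to check. The section pattern here matches the closing ] explicitly, so the bracket is not captured as part of the description.

```python
import io
import re

COMMENT_RE = re.compile(r"^\s*[#;']")                   # "#", ";" or "'" starts a comment
BLANK_RE = re.compile(r"^\s*$")                          # empty or whitespace-only line
SECTION_RE = re.compile(r"^\s*\[\s*(\w+)\s+(.*?)\s*\]")  # [<type> <description>]
PARAM_RE = re.compile(r"^\s*(\w+)\s*=\s*(.*)")           # <parameter> = <value>


def describe_lines(lines):
    """Yield one message per non-comment, non-blank line, like the Perl loop."""
    for line in lines:
        line = line.rstrip("\n")                         # the equivalent of chomp
        if COMMENT_RE.match(line) or BLANK_RE.match(line):
            continue
        match = SECTION_RE.match(line)
        if match:
            yield "This is a '%s' section called '%s'" % match.groups()
            continue
        match = PARAM_RE.match(line)
        if match:
            yield "Parameter is '%s' with a value of '%s'" % match.groups()
        else:
            yield "Invalid line"


sample = io.StringIO("; comment\n\n[Server main host]\nport = 8080\n???\n")
for message in describe_lines(sample):
    print(message)
```

The shape is the same as the Perl version; the main difference is that a successful match returns a match object instead of setting $1 and $2.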
The problem is that I've been corrupted by Perl: my first instinct is that the easiest way to do anything is with a regular expression. Here's the Python code I have so far:
```python
# file_handle: an already-open text file, e.g. from open(path)
for line in file_handle:
    line = line.strip()
    # Comment lines and blank lines
    if not line or line.startswith(("#", ";", "'")):
        continue
    # Found a section heading
    if line.startswith("["):
        print("I want to use a regular expression here")
        print("to split the section up into two pieces")
    elif "=" in line:
        print("I want to use a regular expression here")
        print("to split the parameter into key and value")
    else:
        print("Invalid line")
```
There are several things that irritate me here:
- There are two places where a regular expression just seems to be calling out to be used. What is the Python way of doing this splitting?
- I strip whitespace from both ends of the string once and reassign it, so I don't have to do the stripping multiple times. However, that rewrites the string, which I understand is a very inefficient operation in Python. What is the Python way to handle this?
- In the end, my algorithm looks pretty much like my Perl algorithm, and that seems to say that I am letting my Perl thinking get in the way. How should my code be structured in Python?
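Regular expressions are perfectly acceptable in Python, but for splits this simple the built-in string methods are often enough: str.partition splits on the first occurrence of a separator, and str.split(None, 1) splits on the first run of whitespace. On stripping: Python strings are immutable, so line.strip() always builds a new string and line = line.strip() just rebinds the name to it; doing that once per line is ordinary Python. A sketch without re, assuming the line formats described above; parse_line and its tuple return values are hypothetical choices:

```python
def parse_line(raw_line):
    """Classify one line as ('section', type, description) or
    ('param', name, value); return None for comments and blank lines.
    Raises ValueError for anything else."""
    line = raw_line.strip()              # a new string; the name is simply rebound
    if not line or line.startswith(("#", ";", "'")):
        return None
    if line.startswith("[") and line.endswith("]"):
        # Drop the brackets, then split on the first run of whitespace.
        parts = line[1:-1].split(None, 1)
        if len(parts) != 2:
            raise ValueError("bad section heading: %r" % raw_line)
        section_type, description = parts
        return ("section", section_type.lower(), description.strip())
    name, sep, value = line.partition("=")   # split on the FIRST "=" only
    name = name.strip()
    if sep and name.replace("_", "").isalnum():
        return ("param", name.lower(), value.strip())
    raise ValueError("invalid line: %r" % raw_line)


for text in ["[Server  Main web host]", "  Max_Retries = 5 ", "# note", "url=a=b"]:
    print(parse_line(text))
```

Raising ValueError instead of printing "Invalid line" lets the caller decide whether a bad line should stop the parse or just be reported.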
I've been going through various online tutorials, and they've helped me understand the syntax, but not much with how to use the language itself, especially for someone who tends to think in another language.
My questions:
- Should I use regular expressions? Or, is there another and better way to handle this?
- Is my coding logic correct? How should I be thinking about parsing this file?
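Another option worth weighing is the standard library's configparser module, which already handles INI-style [section] headings, "name = value" pairs, surrounding whitespace, and full-line comments. A sketch, assuming the file fits configparser's rules: by default keys are lowercased, duplicate sections or keys raise an error, and an indented line right after a parameter is read as a continuation of that value rather than a new parameter. The "<type> <description>" section name still has to be split by hand; the sample text is made up for illustration:

```python
import configparser

SAMPLE = """\
; comment lines may start with ";", "#" or "'"
[Server main web host]
Port = 8080
url = http://example.com/?a=b
"""

parser = configparser.ConfigParser(
    comment_prefixes=("#", ";", "'"),  # full-line comment markers
    delimiters=("=",),                 # only "=" separates name from value
    interpolation=None,                # don't treat "%" in values specially
)
parser.read_string(SAMPLE)             # parser.read(path) for a real file

for section_name in parser.sections():
    # configparser keeps "Server main web host" as one name; split it ourselves.
    section_type, description = section_name.split(None, 1)
    print("Section type %r, description %r" % (section_type.lower(), description))
    for name, value in parser[section_name].items():
        print("  %s = %s" % (name, value))
```

Whether this fits depends on how strictly the real files follow INI conventions; the hand-written loop gives full control over what counts as an invalid line.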