So we have a source file in that format.
We want a list of tokens for each line in the file.
The tokens are what you get by chopping off everything from the first semicolon onward, and splitting the rest on either commas or whitespace. We can do that by replacing commas with spaces, and then just splitting on whitespace.
So we turn to the standard library. The split method of strings splits on whitespace when you don't give it something to split on. The replace method lets us replace one substring with another (for example, ',' with ' '). To remove everything after a semicolon, we can partition the line on it and take the first part (element 0 of the result).* The processing for an individual line thus looks like
line.partition(';')[0].replace(',', ' ').split()
and then we simply do this for each line of the file. To get a list of results of applying some function to elements of a source, we can ask for it directly, using a list comprehension (where basically we describe what the resulting list should look like). A file object in Python is a valid source of lines; you can iterate over it (this concept is probably more familiar to C++ programmers) and the elements are lines of the file.
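To make the per-line expression concrete, here is a small sketch that runs each stage separately. The sample lines are made up for illustration; they are not taken from any particular file.

```python
# Hypothetical sample line; the mnemonic and operands are invented.
line = "mov r0, r1 ; copy r1 into r0\n"

code = line.partition(';')[0]    # keep only the text before the first ';'
spaced = code.replace(',', ' ')  # commas become spaces
tokens = spaced.split()          # split on runs of whitespace

print(tokens)  # ['mov', 'r0', 'r1']

# A line without a semicolon passes through partition unchanged,
# and a comment-only line yields an empty token list.
no_comment = "add r2,r3\n".partition(';')[0].replace(',', ' ').split()
only_comment = "; just a comment\n".partition(';')[0].replace(',', ' ').split()
print(no_comment)    # ['add', 'r2', 'r3']
print(only_comment)  # []
```

Note that split() with no argument also discards the trailing newline, so there is no need to strip the line first.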
So all we need to do is open the file (we'll idiomatically use a with block to manage the file) and produce the list:
with open('asm.s') as source:
    parsed = [
        line.partition(';')[0].replace(',', ' ').split()
        for line in source
    ]
Done.
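If you want to try this end to end without a real source file, the following sketch writes a few invented lines to a temporary asm.s and parses them the same way. The sample contents and the final filtering step are my own assumptions; whether you want to drop empty lines depends on what you do with the result.

```python
import os
import tempfile

# Hypothetical input, standing in for the real file.
sample = (
    "mov r0, r1 ; copy\n"
    "\n"
    "; comment only\n"
    "add r0,r1,r2\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'asm.s')
    with open(path, 'w') as f:
        f.write(sample)

    with open(path) as source:
        parsed = [
            line.partition(';')[0].replace(',', ' ').split()
            for line in source
        ]

# Blank and comment-only lines come back as empty lists.
# Optional: filter them out if they are not useful downstream.
non_empty = [tokens for tokens in parsed if tokens]

print(parsed)     # [['mov', 'r0', 'r1'], [], [], ['add', 'r0', 'r1', 'r2']]
print(non_empty)  # [['mov', 'r0', 'r1'], ['add', 'r0', 'r1', 'r2']]
```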
* or use split again, but I find this is less clear when it's not actually your goal to produce a list of elements.
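For comparison, a quick sketch of the split-based variant next to partition. Passing a maxsplit of 1 is my addition, to avoid cutting at every semicolon; the sample lines are invented.

```python
line = "mov r0, r1 ; copy ; twice\n"

via_partition = line.partition(';')[0]
via_split = line.split(';', 1)[0]  # maxsplit=1: stop after the first ';'

print(via_partition == via_split)  # True

# Both also agree when there is no semicolon at all.
plain = "add r2, r3\n"
print(plain.partition(';')[0] == plain.split(';', 1)[0])  # True
```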