Parse text file from web site to .csv file

Question

I need to convert a .txt file to a .csv file. The data consists of the following three-line pattern, repeated until the end of the file.

oklahoma-07  (rt66)
1 12345k 9876542, 4234234.5345345 -.000001234 0000.0 14135.4 0 9992
2 12345 101.8464 192.3456 00116622 202.9136 512.3361 12.543645782334

texas-15 (hwy35)
1 12345k 9876542, 4234234.5345345 -.000001234 0000.0 14135.4 0 9992
2 12345 101.8464 192.3456 00116622 202.9136 512.3361 12.543645782334

The fields above are delimited by spaces.

Also, the source file comes from a web site: it is a .txt file that is currently displayed in the browser, at a URL such as "http://www.example.com/listing.txt".

There could be as few as 3 lines, or 90, or 144 lines of data, but each data set always takes three lines and is then followed by the next data set. The script simply needs to parse to the end of the file.

There are always two key characters:

"1" at the start of the second line, and "2" at the start of the third line of each data set.

The output needs to look like this:

oklahoma-07, (rt66), 1, 12345k, 9876542, 4234234.5345345, -.000001234, 0000.0, 14135.4, 0, 9992, 2, 12345, 101.8464, 192.3456, 00116622, 202.9136, 512.3361, 12.543645782334

texas-15, (hwy35), 1, 12345k, 9876542, 4234234.5345345, -.000001234, 0000.0, 14135.4, 0, 9992, 2, 12345, 101.8464, 192.3456, 00116622, 202.9136, 512.3361, 12.543645782334

The delimiter should be a comma so that I can view the file in Excel. For simplicity, I used the same numbers for each data set.

Lastly, I need to save the result as filename.csv at a particular location, e.g. C:/documents/stuff/.

I am completely new to Python. I have seen a lot of different code samples and it has me confused.

Are there blank lines between each line, or do they all run together? — selllikesybok, Aug 07 '15 at 03:23

selllikesybok · Answer 1 · 2015-08-07T04:19:25.147

If you're certain that the data will always be in this format, a simple approach would be something like:

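comma_sep = []
this_line = []

# my_file is assumed to be an already-open file object.
# Blank separator lines are dropped so every record is exactly three lines.
lines = [line.strip() for line in my_file.readlines() if line.strip()]

for i in range(len(lines)):
    this_line.append(lines[i])
    if i % 3 == 2:  # third line of the record: it is now complete
        # Remove the comma already present in the data, then rejoin the fields with commas
        fields = " ".join(this_line).replace(",", "").split()
        comma_sep.append(",".join(fields))
        this_line = []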

I'm sure there's a cleaner way to do it.

Also, I suggest reading the Python docs for the basics, such as how to use urllib and how to handle files.
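As a starting point for the urllib and file-handling part, here is a minimal Python 3 sketch (in Python 3, urlopen lives in urllib.request). The helper names fetch_lines and write_rows are made up for illustration, and UTF-8 is an assumed encoding for the text file; adjust it if the site serves something else.

```python
import csv
import urllib.request


def fetch_lines(url):
    """Download a text file and return its non-blank lines, stripped."""
    with urllib.request.urlopen(url) as response:
        # Assumption: the file is UTF-8 encoded
        text = response.read().decode("utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def write_rows(path, rows):
    """Write an iterable of field lists to a CSV file at the given path."""
    # newline="" lets the csv module control line endings,
    # which avoids blank rows appearing on Windows
    with open(path, "w", newline="") as outfile:
        csv.writer(outfile).writerows(rows)
```

With rows built from the downloaded lines, something like write_rows("C:/documents/stuff/filename.csv", rows) would save the result at the location mentioned in the question, provided that folder already exists.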

Duplexia · Answer 2 · 2015-08-07T05:43:53.193

This is one way to do it, including how to download the txt file and write the csv file. The chunks generator code is from this answer.

import urllib.request  # Python 3 replacement for urllib2

inputfile = urllib.request.urlopen('http://127.0.0.1:8000/data.txt')
# urlopen returns bytes in Python 3; UTF-8 is an assumed encoding
lines = inputfile.read().decode('utf-8').splitlines()

def chunks(l, n):
  """Yield successive n-sized chunks from l."""
  for i in range(0, len(l), n):
    yield l[i:i+n]

out = []
for record in chunks(lines, 4): # 4 = three data lines plus the blank line after each record
  s = ' '.join(record).replace(',','') # Join the record (a list) into one string and remove the comma in the data
  out.append(','.join(s.split())) # Split on whitespace and rejoin with commas

with open('data.csv', 'w') as outfile:
  for record in out:
    outfile.write("%s\n" % record)
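If the blank lines between records cannot be relied on, a variant that ignores them and checks the "1" and "2" key characters from the question might look like the sketch below. records_to_rows is a hypothetical helper name, and raising ValueError on malformed input is my own assumption, not something the question asks for.

```python
def records_to_rows(lines):
    """Group non-blank lines in threes and return one list of fields per record."""
    cleaned = [line.strip() for line in lines if line.strip()]
    if len(cleaned) % 3:
        raise ValueError("expected a multiple of three non-blank lines")

    rows = []
    for i in range(0, len(cleaned), 3):
        name, first, second = cleaned[i:i + 3]
        # The question says the data lines carry the key characters "1" and "2"
        if first.split()[0] != "1" or second.split()[0] != "2":
            raise ValueError("unexpected key character in record %r" % name)
        # Remove the comma already present in the data, then split on whitespace
        fields = " ".join((name, first, second)).replace(",", "").split()
        rows.append(fields)
    return rows
```

Each row is a plain list of strings, so it can be joined with commas as above or passed to csv.writer(outfile).writerows(rows).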

Added the code for downloading the txt file instead of opening it from disk (very similar) — Duplexia, Aug 07 '15 at 05:45
