
I've been given a load of data files (.txt) containing the output of one instrument from an experiment. Here is an example:

141716:test: 1 width: 10distance: 13 time: 1690 x:2036.1222 y:696.022 target:1925-2175
141718:test: 2 width: 10distance: 29 time: 624 x:1646.027 y:814.01953 target:1525-1775
141719:test: 3 width: 10distance: 15 time: 688 x:504.4982 y:846.8401 target:375-375
141721:test: 4 width: 10distance: 22 time: 620 x:696.42004 y:922.6398 target:550-550
141722:test: 5 width: 10distance: 10 time: 709 x:366.33945 y:950.7717 target:250-250
141724:test: 6 width: 10distance: 7 time: 602 x:2181.1575 y:641.32117 target:2075-2325
141725:test: 7 width: 10distance: 8 time: 568 x:2207.414 y:741.3456 target:2050-2300
141726:test: 8 width: 10distance: 28 time: 490 x:1629.773 y:691.3334 target:1550-1800
141727:test: 9 width: 10distance: 23 time: 479 x:1811.6924 y:651.8706 target:1675-1925
141728:test: 10 width: 10distance: 26 time: 491 x:776.4396 y:851.138 target:650-650

As all the other data files are CSV, I've transformed these into CSV files as per Convert tab-delimited txt file into a csv file using Python. How would I turn the above CSV files into a format where the first line holds the name of each field and the subsequent lines hold the values? I have about a hundred of these, so I don't want to do it manually.

Tom Kealy
  • This is not a CSV file. We can convert it, but the format is terrible. – Martijn Pieters Dec 09 '12 at 16:24
  • What is the desired output? – unutbu Dec 09 '12 at 16:40
  • Desired output: timestamp, test, width, distance, time, x, y, target on the top line and then the values on every line below. I have ~100 of these to do before I can match up this instrument's data with data from other sources. – Tom Kealy Dec 09 '12 at 16:47

1 Answer


This is not CSV. The format is terrible. There is no delimiter between the width and distance fields, for example, and some fields have a space after the `:` colon while others don't.
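A quick way to see why a plain split won't work is to try it on the first sample line from the question. This is only an illustration; `sample` is simply that line copied in. Whitespace splitting glues the width value onto the next field name, and some values end up in their own token while others stay attached to their label.

```python
# First data line from the question, copied verbatim.
sample = ("141716:test: 1 width: 10distance: 13 time: 1690 "
          "x:2036.1222 y:696.022 target:1925-2175")

# Splitting on whitespace: no consistent "name, value" structure survives.
tokens = sample.split()
print(tokens[2:5])  # ['width:', '10distance:', '13']
print(tokens[7])    # x:2036.1222  (label and value stuck together)
```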

You'll have to process this using custom code, then write it out to a CSV file:

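import re
import csv

# Hypothetical file names; replace them with your own paths.
inputfile = 'input.txt'
outputfile = 'output.csv'

lineformat = re.compile(
    r'^(?P<count>\d+)[\s:]*'
    r'test[\s:]*(?P<test>\d+)[\s:]*'
    r'width[\s:]*(?P<width>\d+)[\s:]*'
    r'distance[\s:]*(?P<distance>\d+)[\s:]*'
    r'time[\s:]*(?P<time>\d+)[\s:]*'
    r'x[\s:]*(?P<x>\d+\.\d+)[\s:]*'
    r'y[\s:]*(?P<y>\d+\.\d+)[\s:]*'
    r'target[\s:]*(?P<target>\d+-\d+)[\s:]*'
)
fields = ('count', 'test', 'width', 'distance', 'time', 'x', 'y', 'target')

# Open the output file for writing; newline='' stops the csv module
# from producing blank lines between rows on Windows.
with open(inputfile) as finput, open(outputfile, 'w', newline='') as foutput:
    csvout = csv.DictWriter(foutput, fieldnames=fields)
    csvout.writeheader()  # first line: the field names
    for line in finput:
        match = lineformat.search(line)
        if match is not None:
            # groupdict() maps each named group to the text it matched
            csvout.writerow(match.groupdict())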

This uses a regular expression with named groups to parse each line into a dictionary, which makes it easy to write out to the CSV file. I've picked 'count' as the name for the first numeric value in your input file; feel free to change it (for example to 'timestamp'), but do so in both the regular expression and the fields tuple.
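To see what the named groups produce, here is the same pattern applied to the first sample line from the question, as a standalone sketch (`sample` and `row` are just illustrative names). Every value comes back as a string, which is all `csv.DictWriter` needs; convert with `int()` or `float()` later if you want numbers.

```python
import re

lineformat = re.compile(
    r'^(?P<count>\d+)[\s:]*'
    r'test[\s:]*(?P<test>\d+)[\s:]*'
    r'width[\s:]*(?P<width>\d+)[\s:]*'
    r'distance[\s:]*(?P<distance>\d+)[\s:]*'
    r'time[\s:]*(?P<time>\d+)[\s:]*'
    r'x[\s:]*(?P<x>\d+\.\d+)[\s:]*'
    r'y[\s:]*(?P<y>\d+\.\d+)[\s:]*'
    r'target[\s:]*(?P<target>\d+-\d+)[\s:]*'
)

sample = ("141716:test: 1 width: 10distance: 13 time: 1690 "
          "x:2036.1222 y:696.022 target:1925-2175")

match = lineformat.search(sample)
row = match.groupdict()  # {group name: matched text}
print(row['width'], row['distance'])  # 10 13
```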

Martijn Pieters
  • Thanks, this is why statisticians should be in the design meetings! – Tom Kealy Dec 09 '12 at 16:50
  • Hi Martijn, I keep getting a `TypeError: '_sre.SRE_Pattern' object is not callable` when running that script. Are you using Python 2.7? I'm using 3.3. – Tom Kealy Jan 12 '13 at 17:29
  • @TomKealy: nope, just a small error: the `.search()` call was missing from the answer. :-) – Martijn Pieters Jan 12 '13 at 17:34
  • Thanks! :) Kinda annoyed I didn't spot that myself. – Tom Kealy Jan 12 '13 at 17:59
  • Hi Martijn, so sorry to bother you again. The script runs but produces blank output. I'm only a statistician and have no expertise in manipulating files like this. How would I go about fixing this? – Tom Kealy Jan 13 '13 at 15:45
  • @TomKealy: I've made another small adjustment; the output writer was also not using an open file object. Try printing the lines to see if the input still matches what you posted in your question; adding a `print` after the `if match is not None` can help you see whether the regular expression actually matched anything. – Martijn Pieters Jan 13 '13 at 18:16
  • Thanks for this: you've made my evening significantly easier! – Tom Kealy Jan 13 '13 at 18:55
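Since the question mentions about a hundred files, here is one possible way to run the same conversion over a whole directory while counting lines the pattern fails to match, which helps diagnose the blank-output problem discussed in the comments. This is a sketch, not part of the original answer: `convert_file`, `convert_directory`, the `data` directory and the `_parsed.csv` suffix are hypothetical choices. It assumes you point it at the original `.txt` files, because the pattern only allows spaces and colons between fields and may not match copies where tabs were replaced by commas.

```python
import csv
import re
from pathlib import Path

lineformat = re.compile(
    r'^(?P<count>\d+)[\s:]*'
    r'test[\s:]*(?P<test>\d+)[\s:]*'
    r'width[\s:]*(?P<width>\d+)[\s:]*'
    r'distance[\s:]*(?P<distance>\d+)[\s:]*'
    r'time[\s:]*(?P<time>\d+)[\s:]*'
    r'x[\s:]*(?P<x>\d+\.\d+)[\s:]*'
    r'y[\s:]*(?P<y>\d+\.\d+)[\s:]*'
    r'target[\s:]*(?P<target>\d+-\d+)[\s:]*'
)
fields = ('count', 'test', 'width', 'distance', 'time', 'x', 'y', 'target')


def convert_file(src, dst):
    """Convert one instrument log to CSV; return the number of unmatched lines."""
    skipped = 0
    with open(src) as finput, open(dst, 'w', newline='') as foutput:
        csvout = csv.DictWriter(foutput, fieldnames=fields)
        csvout.writeheader()
        for line in finput:
            match = lineformat.search(line)
            if match is None:
                if line.strip():  # blank lines are not counted as failures
                    skipped += 1
                continue
            csvout.writerow(match.groupdict())
    return skipped


def convert_directory(directory):
    """Write name_parsed.csv next to every name.txt in the directory."""
    for src in sorted(Path(directory).glob('*.txt')):
        dst = src.with_name(src.stem + '_parsed.csv')
        skipped = convert_file(src, dst)
        if skipped:
            print(f'{src.name}: {skipped} line(s) did not match')


if __name__ == '__main__':
    convert_directory('data')  # hypothetical directory holding the .txt files
```

If a file reports every line as unmatched, print one of its lines with `repr()` to see which separators it actually contains.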