I have a .txt file that looks like this but much longer:

Image0001_01.tif[1] <- Image0035_01.tif[1]: (410.0, -362.0) correlation (R)=0.05516124 (176 ms)
Image0001_01.tif[1] <- Image0002_01.tif[1]: (489.0, -495.0) correlation (R)=0.047715914 (287 ms)
Image0002_01.tif[1] <- Image0003_01.tif[1]: (647.0, 0.0) correlation (R)=0.8842946 (295 ms)
Image0001_01.tif[1] <- Image0036_01.tif[1]: (265.0, -363.0) correlation (R)=0.039207384 (365 ms)
Image0002_01.tif[1] <- Image0034_01.tif[1]: (626.0, -626.0) correlation (R)=0.60634625 (124 ms)
...........

I'd like to turn this into a comma-separated file (CSV) so that I can look at the correlations (R-values), but I'm running into problems because of the weird formatting of this file. Is there a way I can do this in Python?

V_ix
    What do you want as output? What have you tried? Please help us help you :) Looks like a simple use of `re` – R Nar Nov 16 '15 at 19:53

1 Answer

Use the re and csv modules in Python to parse your file and convert it to a CSV file:

import csv
import re

# Raw string so the backslash escapes reach the regex engine unchanged.
re_expression = r'^(.*?) <- (.*?): \((.*?), (.*?)\) correlation \(R\)=(.*?) \((.*?) ms\)$'

with open('output.csv', 'w', newline='') as csvfile:
    outfile = csv.writer(csvfile)
    with open('input.txt') as f:
        for line in f:
            # Skip lines that do not fit the pattern (blank lines, etc.).
            m = re.match(re_expression, line.strip())
            if m:
                outfile.writerow(m.groups())
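
If you want a header row and an easy way to check the pattern against a few lines before running it on the whole file, the same regex can be wrapped in two small functions. This is only a sketch: the column names in FIELDS and the function names parse_line and convert are made up for illustration, since the original file has no header.

```python
import csv
import io
import re

# Hypothetical column names; the input file itself has no header.
FIELDS = ["target", "source", "x", "y", "r", "ms"]

# Same pattern as in the answer, compiled once.
LINE_RE = re.compile(
    r"^(.*?) <- (.*?): \((.*?), (.*?)\) correlation \(R\)=(.*?) \((.*?) ms\)$"
)


def parse_line(line):
    """Return the six captured fields as a list, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    return list(m.groups()) if m else None


def convert(lines, out):
    """Write a header plus one CSV row per matching line; return the number of rows."""
    writer = csv.writer(out)
    writer.writerow(FIELDS)
    count = 0
    for line in lines:
        row = parse_line(line)
        if row is not None:
            writer.writerow(row)
            count += 1
    return count


if __name__ == "__main__":
    sample = [
        "Image0002_01.tif[1] <- Image0003_01.tif[1]: (647.0, 0.0) correlation (R)=0.8842946 (295 ms)\n",
        "...........\n",  # non-matching line, skipped
    ]
    buf = io.StringIO()
    convert(sample, buf)
    print(buf.getvalue())
```

To use it on real files, pass the open input file as lines and the open output file as out, e.g. convert(f, csvfile) inside the two with blocks from the answer.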
kponz
  • Thanks for your input. Running the code I get the following error: ValueError Traceback (most recent call last) ---> 10 m = re.split(re_expression, f.readlines()) 11 outfile.writerow(m[1:-1]) 12 ValueError: Mixing iteration and read methods would lose data – V_ix Nov 18 '15 at 18:39
  • I've updated my answer to prevent this problem (see also http://stackoverflow.com/questions/826493/python-mixing-files-and-loops) – kponz Nov 18 '15 at 18:58
  • ---> 10 line = original.readline() NameError: name 'original' is not defined – V_ix Nov 18 '15 at 19:21
  • very strange: now I get this error: ---> 10 line = original.readline() "I/O operation on closed file" – V_ix Nov 18 '15 at 19:30
  • Should be fixed now, the variables weren't correct. – kponz Nov 18 '15 at 20:21
  • Thanks, I replaced rb with wb in the csvfile line as well as added rb after 'input.txt' because it was throwing a can't write to file error. Now the code throws no errors but when I open the output.csv it has actually not done anything to the file. – V_ix Nov 18 '15 at 21:14
  • Take a look now - the open should be just 'w' with the newline option as well to avoid extra line breaks. – kponz Nov 18 '15 at 21:39
  • TypeError: 'newline' is an invalid keyword argument for this function – V_ix Nov 18 '15 at 22:04
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/95495/discussion-between-kponz-and-v-ix). – kponz Nov 18 '15 at 22:08
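
A note on the last error in the comments: Python 2's built-in open() does not accept a newline argument, which is what produces that TypeError, so the asker was probably running Python 2. There the csv module expects the output file to be opened in binary mode ('wb'), while Python 3 expects text mode with newline=''. A minimal sketch that picks the right mode at runtime, assuming a hypothetical helper name open_csv_for_writing:

```python
import csv
import sys


def open_csv_for_writing(path):
    """Open path in the mode the csv module expects on the running Python version."""
    if sys.version_info[0] >= 3:
        # Python 3: text mode; newline='' stops the csv module's \r\n being translated again.
        return open(path, "w", newline="")
    # Python 2: binary mode, as the Python 2 csv documentation recommends.
    return open(path, "wb")
```

The with open('output.csv', 'w', newline='') line in the answer can then be replaced by with open_csv_for_writing('output.csv') as csvfile: and the rest of the code stays the same.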