
I have a small script that extracts data from a CSV file. The file is unorganised, and I have about 1000 of them to get data from so far, so I can't simply edit the format. I created a script that reads each line one by one, skipping all the useless data and then reading what's left. I need to remove the first 36 characters and the last 3. However, this prints inaccurate data for some reason.

My Code

import sys
import time
from sys import argv
while True:
    argv1 = "ex.csv"
    script, filename = argv, argv1
    f = open(filename, 'r')
    for i, line in enumerate(f):
        print (line)[36::3]
        print (i)
    time.sleep(5)

My first 2 lines of data are pretty empty, so ignoring those, here is the next line from ex.csv:

20/03/2015  10:28:26, 390114.322299, 393732.492744, 0

Using the above code, printing the sliced line gives me 37.240, and I do not understand where this number comes from. As far as I understand from what I have been learning, the slice is supposed to be [start:middle:end], so it should skip 36 from the start, 3 from the end, and none in the middle.

Also, not all of the data fields are 11 characters long, so I can't just keep a fixed 11 characters either.
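
For reference, here is a minimal sketch of how Python reads the two slice forms. Slice syntax is `[start:stop:step]`, so `[36::3]` means "start at index 36, run to the end, take every third character", while `[36:-3]` means "start at index 36 and stop three characters before the end". The sketch assumes the sample line has a single space between the date and the time (that spacing is what reproduces the `37.240` output) and that the trailing newline has already been stripped.

```python
# Sample line; the single space between date and time is an assumption.
line = "20/03/2015 10:28:26, 390114.322299, 393732.492744, 0"

every_third = line[36::3]  # start at 36, no stop, step of 3
trimmed = line[36:-3]      # start at 36, stop 3 characters before the end

print(every_third)  # 37.240
print(trimmed)      # 393732.492744
```

Lines read from a file usually end with a newline, which counts as one of the last three characters, so calling `line.rstrip("\n")` before slicing may be needed to get the same result.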

  • I manually fixed your indentation; can you verify that it is now correct? Or should the `time.sleep()` line also be indented? – Martijn Pieters Mar 24 '15 at 10:39
  • You have misunderstood the Python slice notation. I duped this to the canonical post, no need to explain it *again*, but you need to use `[36:-3]` here to remove the first 36 characters, as well as the last 3. Instead of slicing the line, I'd slice the *row* produced by the `csv` module. You can then use `row[1:-1]` to skip the first and last *column*. – Martijn Pieters Mar 24 '15 at 10:41
  • Again, I'm not sure what `:-3` does (I understand what you're trying to say it does), but I changed it to -1, which should remove the last character; however, it just inverted the entire line and read it backwards. I'm looking into the other question posted to try to get a better understanding of this. Also, I'm not sure if `time.sleep()` should be indented differently, but it works where it is right now. – user3310078 Mar 24 '15 at 10:56
  • Negative numbers are subtracted from the sequence length. `-3` means *three elements from the end of this sequence*. – Martijn Pieters Mar 24 '15 at 10:57
  • I got it. Looks like I was putting a colon in the wrong place: it should have been `[36:-3:]`, whereas I was putting `[36::-3]` and `[36:-3]`. – user3310078 Mar 24 '15 at 11:15
  • `[36:-3]` works and is functionally the same as `[36:-3:]`; the last colon is optional. – Martijn Pieters Mar 24 '15 at 11:21
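
The column-based approach suggested in the comments could look roughly like the sketch below. It is an illustration under assumptions, not the asker's final code: the in-memory `sample` stands in for `ex.csv`, its contents are modelled on the sample line from the question (with two empty lines in front), and `skipinitialspace=True` is used to drop the space after each comma.

```python
import csv
import io

# Stand-in for the real file; contents are illustrative only.
sample = io.StringIO(
    "\n"
    "\n"
    "20/03/2015 10:28:26, 390114.322299, 393732.492744, 0\n"
)

rows = []
for row in csv.reader(sample, skipinitialspace=True):
    if not row:
        continue  # skip empty lines, which the reader returns as []
    # Drop the first column (timestamp) and the last column.
    rows.append(row[1:-1])

print(rows)  # [['390114.322299', '393732.492744']]
```

Because this slices columns rather than characters, it does not depend on every field having the same width. With a real file, the `io.StringIO` object would be replaced by something like `open("ex.csv", newline="")`.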
