
I am processing a long list of data in Python that looks like this:

[Screenshot of the data]

The odd digraphs are probably caused by encoding problems. (I am not sure whether these characters will be preserved on this site.)

29/07/2016 04:00:12 0.125143    

When I read such a file into a script using something like open and readlines, I get this error:

SyntaxError: EOL while scanning string literal

I know how to use (or can look up) replace and regex functions, but I cannot get them to work in my script. The biggest problem is that wherever I include or read one of these strange characters, an error occurs, pointing at the very line where it is read. So I cannot do anything with them.
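A note on the error itself: SyntaxError is raised by Python's parser while it compiles the script, not by open or readlines, so "EOL while scanning string literal" most likely means the odd characters ended up inside a string literal in the source (for example after pasting them in). A carriage return pasted into a string literal can produce this message. A minimal sketch, assuming the characters are control characters like that: write them as escape sequences such as "\r" instead of pasting them, and open the file with errors="replace" so undecodable bytes cannot stop the read. The JUNK set and the read_clean_lines helper are hypothetical, and the UTF-8 encoding is an assumption.

```python
import os
import tempfile

# Hypothetical set of unwanted characters, written as escape sequences so the
# script's own source never contains the raw characters.
JUNK = "\r\x00\ufeff"

def read_clean_lines(path, encoding="utf-8"):
    """Read a text file and strip the unwanted characters from every line."""
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising
    # UnicodeDecodeError; newline="\n" splits lines only on "\n" and leaves any
    # "\r" in place so it can be deleted below.
    with open(path, encoding=encoding, errors="replace", newline="\n") as f:
        # Delete the junk characters and the replacement character U+FFFD.
        table = str.maketrans("", "", JUNK + "\ufffd")
        return [line.translate(table).rstrip("\n") for line in f]

# Demo: a temporary file whose single line contains a carriage return,
# a NUL byte and an invalid UTF-8 byte (0xff) mixed into the data.
raw = b"29/07/2016 04:00:12\r\x00 0.125143\xff\r\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(raw)
    path = tmp.name
try:
    lines = read_clean_lines(path)
finally:
    os.remove(path)
print(lines)
```

Deleting characters with str.translate removes them in one pass over each line, which is simpler than chaining several replace calls.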

Violapterin
These might help you: https://stackoverflow.com/questions/64749/m-character-at-end-of-lines https://stackoverflow.com/questions/16695950/how-to-read-windows-file-in-linux-environment – Equinox Jul 12 '17 at 08:31
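The linked questions are about ^M, which is how some editors display a carriage return (\r) left over from Windows-style \r\n line endings. A minimal sketch, assuming that is the character involved, showing how the newline argument of open treats it; the temporary file only stands in for the real data file.

```python
import os
import tempfile

# Bytes as a Windows editor might save them: the line ends in "\r\n".
raw = b"29/07/2016 04:00:12 0.125143\r\n"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name
try:
    # newline=None (the default) enables universal newlines:
    # "\r\n" and a lone "\r" are both returned as "\n".
    with open(path, encoding="ascii") as f:
        universal = f.readlines()
    # newline="" keeps the original line endings, so the "\r" is still there.
    with open(path, encoding="ascii", newline="") as f:
        untranslated = f.readlines()
finally:
    os.remove(path)

print(universal)     # the line ends in "\n" only
print(untranslated)  # the line ends in "\r\n"

# A "\r" that survives can still be removed explicitly.
cleaned = [line.replace("\r", "").rstrip("\n") for line in untranslated]
```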

2 Answers


Are you reading a file? If so, try extracting the values with regular expressions instead of removing the extra characters:

import re
re.search(r'^([\d/: ]{19})', line).group(1)  # date and time, e.g. 29/07/2016 04:00:12
re.search(r'(\d+\.\d+)', line).group(1)  # value, e.g. 0.125143 (all digits kept)
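A self-contained sketch of the same extraction, assuming one record per line in the format of the sample in the question. The pattern names, the parse_line helper and the junk characters in the sample line are made up for illustration. It captures the value with \d+\.\d+, so every digit of a value like 0.125143 is kept whatever its length, and it checks for None because re.search returns None when a pattern does not match (calling .group on that would raise AttributeError).

```python
import re

# Date and time: exactly 19 characters such as "29/07/2016 04:00:12".
DATETIME_RE = re.compile(r"^([\d/: ]{19})")
# Value: digits, a dot, digits (e.g. "0.125143").
VALUE_RE = re.compile(r"(\d+\.\d+)")

def parse_line(line):
    """Return (datetime_text, value_text), or None if the line doesn't match."""
    dt = DATETIME_RE.search(line)
    if dt is None:
        return None
    # Search for the value only after the date and time.
    val = VALUE_RE.search(line, dt.end())
    if val is None:
        return None
    return dt.group(1), val.group(1)

# Hypothetical line with junk characters around the value.
sample = "29/07/2016 04:00:12 \x00\x1a0.125143\r\x00\n"
print(parse_line(sample))  # ('29/07/2016 04:00:12', '0.125143')
```

Compiling the patterns once and passing dt.end() as the start position keeps the date and time digits out of the second search.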
bakatrouble
Thank you for the extra information. Sorry, I don't have time to test this thoroughly (but I have upvoted you). – Violapterin Jul 28 '17 at 16:06

I found that re.findall works. (Sorry, I do not have time to test all the other methods; this job is no longer important, and I had even forgotten about this question.)

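import re

def extract_numbers(str_i):
    # Groups: day, month, year, hour, minute, second, and the integer and
    # fractional parts of the value; \D* skips any junk between the fields.
    pat = r"(\d+)/(\d+)/(\d+)\D*(\d+):(\d+):(\d+)\D*(\d+)\.(\d+)"
    match_h = re.findall(pat, str_i)
    # findall returns a list of tuples; take the first match on the line
    # (raises IndexError if the line does not match at all)
    return match_h[0]

# ....
# `f` is the handle of the file in question
lines = f.readlines()
for l in lines:
    ls_f = extract_numbers(l)
    # process them....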
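re.findall returns a list of tuples of strings, one tuple per match, so extract_numbers yields ('29', '07', '2016', '04', '00', '12', '0', '125143') for the sample line in the question. A sketch of turning those pieces into a datetime and a float; the to_record helper is hypothetical and assumes the day/month/year order seen in the sample.

```python
import re
from datetime import datetime

PAT = r"(\d+)/(\d+)/(\d+)\D*(\d+):(\d+):(\d+)\D*(\d+)\.(\d+)"

def extract_numbers(str_i):
    # findall returns a list of 8-tuples of strings; keep the first match
    return re.findall(PAT, str_i)[0]

def to_record(parts):
    """Hypothetical helper: build (datetime, float) from the 8 captured strings."""
    day, month, year, hour, minute, second, int_part, frac_part = parts
    stamp = datetime(int(year), int(month), int(day),
                     int(hour), int(minute), int(second))
    # Re-join the two halves of the value as text; converting frac_part to
    # int first would drop leading zeros (e.g. the "05" in "0.05").
    value = float(f"{int_part}.{frac_part}")
    return stamp, value

# Sample line from the question with hypothetical junk characters added.
line = "29/07/2016 04:00:12\x00\r 0.125143\r\n"
print(to_record(extract_numbers(line)))
```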
Violapterin