I'm trying to clean a file that contains escape codes for French accented characters:

#353= IFCPROPERTYSINGLEVALUE('Charge d''\X2\00E9\X0\clairage sp\X2\00E9\X0\cifi\X2\00E9\X0\e par surface',$,IFCREAL(10.7639104167097),$);

I created this little function:

def CleanSpace(sp):
    sp.replace("\X2\00F4\X0\","ô")
    sp.replace("\X2\00E9\X0\","é")
    return(sp)

but Python 3 gave me the error:

    sp.replace("\X2\00F4\X0\","ô")
                               ^
SyntaxError: invalid syntax

How can I resolve this, please? Thanks in advance

Edit: in case it helps, I also tried this line in the console, but the result was strange:

$ python3
Python 3.5.2 (default, Nov 23 2017, 16:37:01) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a='02_RADIOTHERAPIE/ ARC -plateforme recherche- Radioth\X2\00E9\X0\rapie'
>>> a
'02_RADIOTHERAPIE/ ARC -plateforme recherche- Radioth\\X2\x00E9\\X0\rapie'
>>> a.replace('\X2\00E9\X0\\','é')
'02_RADIOTHERAPIE/ ARC -plateforme recherche- Radioth\\X2\x00E9\\X0\rapie'
Pim92
  • The problem is `\"` at the end of the first argument - it's interpreting the " as part of the string, instead of closing it. Drop that `\\`. The coloring should give you the hint. – kabanus Apr 10 '18 at 14:21
  • Also, you don't need to do it like that: tell Python that the input string is unicode-escaped and that you want to produce a UTF-8 string. That way you don't need to work out all the Unicode escape sequences yourself; you let Python figure them out for you :) https://stackoverflow.com/questions/11375684/python-how-to-convert-utf-8-code-string-back-to-string – Rob Apr 10 '18 at 14:24
  • @kabanus: Thanks, yes, the coloring made me suspicious, but if I drop or double this final \, there is no error, yet 'replace' doesn't work: my prints still contain \X2\00E9\X0\... – Pim92 Apr 10 '18 at 14:59
  • @Rob: I don't understand this link, it's too technical for me – Pim92 Apr 10 '18 at 15:14
  • I don't understand - you want to take a regular string, and convert it to the appropriate unicode character? `\X2\00F4\X0` and the symbol you wrote are the same thing, I don't understand what you're trying to do. In any case, you have to drop that final backslash, that's just plain wrong, regardless. – kabanus Apr 10 '18 at 15:37
  • @pim92 Sorry, I wrote that quickly. It looked a lot like a byte array describing a 'unicode-escape' sequence that you could use Python to decode. It is not, however, so disregard my comment. – Rob Apr 10 '18 at 17:44
  • @Rob: ok, no prob ;) @kabanus: No, I have a file with some special text like `\X2\00F4\X0` and I would like to transform these lines with something like sp.replace, but either it doesn't work or I get the error message... – Pim92 Apr 11 '18 at 07:33

3 Answers

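The \ character escapes your quotation mark. This means Python keeps going until it finds another quotation mark to end the string, so in reality your string is \X2\00F4\X0\", and the call never closes properly. To fix the syntax error, escape the final \ with another \ or remove it entirely. Note that in a regular string literal the other backslashes need doubling too, otherwise \00 is read as an octal escape and the pattern never matches the text. New code:

sp.replace("\\X2\\00F4\\X0\\","ô")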
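
For completeness, here is a minimal sketch of the whole function, assuming the goal is to replace the literal text \X2\00E9\X0\ as it appears in the file. Besides the doubled backslashes, the result of `str.replace` has to be assigned: strings are immutable, so `replace` returns a new string instead of modifying `sp`. The name `clean_space` is just my renamed variant of the question's `CleanSpace`, and the sample line is taken from the question.

```python
def clean_space(sp):
    # Each "\\" in a regular string literal is ONE backslash in the actual string.
    # str.replace returns a new string, so the result must be assigned.
    sp = sp.replace("\\X2\\00F4\\X0\\", "ô")
    sp = sp.replace("\\X2\\00E9\\X0\\", "é")
    return sp


# A raw string is used here only to build test input with literal backslashes.
line = r"'Charge d''\X2\00E9\X0\clairage sp\X2\00E9\X0\cifi\X2\00E9\X0\e par surface'"
print(clean_space(line))
```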
rat1221

When you put \" in a Python string, it adds a literal double quote to the string and doesn't close it. In the same way, you can write \' to get a single quote, or \\ to get a backslash. So if I wanted a Python string saying:

"Hi,", said Bob \

I would write in my code:

"\"Hi,\", said Bob \\"

Because you wrote \" and didn't close the string after it, it carried on to the next line and messed everything up.

Edit:

Also, in the console you didn’t use double backslashes everywhere, so occasionally they acted as escape characters resulting in strange things. Whenever you want a string to contain a backslash in Python, write \\.
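
To make the "strange things" concrete, here is a small sketch of what happens to two fragments of the console string. It only uses escapes Python recognizes, so it runs without warnings. As far as I can tell, `\X` is not a recognized escape, so Python keeps that backslash (newer versions warn about it), which is why only part of the string looked wrong.

```python
# "\00" is an octal escape for the NUL character, so "\00E9" is
# NUL followed by "E" and "9": three characters, not five.
s1 = "\00E9"
print(len(s1), repr(s1))  # 3 '\x00E9'

# "\r" is a carriage return, so "\rapie" silently loses its backslash.
s2 = "\rapie"
print(len(s2), repr(s2))  # 5 '\rapie'

# With doubled backslashes the text is stored exactly as it looks in the file.
s3 = "\\00E9"
print(len(s3), s3)  # 5 \00E9
```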

Your text might contain \X2\00F4\X0\ , but in a Python string "\\" means a single backslash, so if you replace every backslash in your string literals with a double backslash (not just the last one), it should work, so

a.replace('\\X2\\00E9\\X0\\','é')

for example.
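
If there are many different accented characters, listing one `replace` per character gets tedious. A possible alternative is a regular expression that decodes any such sequence. This is only a sketch: it assumes every sequence has the shape \X2\ + groups of four hex digits + \X0\, with each group being one Unicode code point, as the samples in the question suggest. The names `decode_x2` and `_X2_PATTERN` are mine, and other escape forms the file format may have are not handled.

```python
import re

# Literal \X2\ , then one or more groups of four hex digits, then literal \X0\ .
_X2_PATTERN = re.compile(r"\\X2\\((?:[0-9A-Fa-f]{4})+)\\X0\\")


def decode_x2(text):
    def _replace(match):
        hex_digits = match.group(1)
        # Convert each group of four hex digits into one character.
        return "".join(
            chr(int(hex_digits[i:i + 4], 16))
            for i in range(0, len(hex_digits), 4)
        )

    return _X2_PATTERN.sub(_replace, text)


print(decode_x2(r"Radioth\X2\00E9\X0\rapie"))  # Radiothérapie
```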

DarthVlader

After a lot of attempts and searching, the solution for a single line was to use a raw string:

>>> a.replace(r'\X2\00E9\X0\ '[:-1], 'é')
"#353= IFCPROPERTYSINGLEVALUE('Charge d''éclairage spécifiée par surface',$,IFCREAL(10.7639104167097),$);"

For multiple lines it was more difficult, because the bytes in my file are already written, and just because I see a '\' does not mean it actually exists... The solution I found was to work on the bytes with antlr4.
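
A note on the `[:-1]` in the raw-string version above: a raw string literal cannot end with a single backslash, because the backslash would still stop the closing quote from ending the literal. Adding a dummy trailing space and slicing it off works around that. The short sketch below (variable names are mine) checks that this gives the same pattern as a regular string with every backslash doubled.

```python
# Raw string with a dummy trailing space, which the slice then removes.
needle_raw = r'\X2\00E9\X0\ '[:-1]

# Regular string literal with every backslash doubled.
needle_escaped = '\\X2\\00E9\\X0\\'

print(needle_raw == needle_escaped)  # True
print(len(needle_raw))               # 12
```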

Pim92