0

I have a path in a variable like this:

path = "C:\HT_Projeler\7\Kaynak\wrapped_gedizw.tif"

This is incorrect, because the string literal contains escape sequences:

>>> path
'C:\\HT_Projeler\x07\\Kaynak\\wrapped_gedizw.tif'

How can I fix the path in this variable so it becomes equivalent to r"C:\HT_Projeler\7\Kaynak\wrapped_gedizw.tif" or "C:/HT_Projeler/7/Kaynak/wrapped_gedizw.tif"?

I know this topic is common, and I have looked through many questions here (1, 2, etc.).

EDIT

Here is my exact script:

...
basinFile = self._gv.basinFile
basinDs = gdal.Open(basinFile, gdal.GA_ReadOnly)
basinNumberRows = basinDs.RasterYSize
basinNumberCols = basinDs.RasterXSize
...

Here, self._gv.basinFile contains my path, so I cannot put an "r" prefix in front of it.

Mustafa Uçar

2 Answers

5

If you write paths directly in Python code, just use raw strings, as others have suggested.
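As a minimal illustration, using only the path from the question, a raw string keeps every backslash literal, and the forward-slash spelling avoids escaping altogether. `PureWindowsPath` is used here only to compare the two spellings on any platform:

```python
from pathlib import PureWindowsPath

# Raw string: "\7" stays a backslash followed by the digit 7.
raw_path = r"C:\HT_Projeler\7\Kaynak\wrapped_gedizw.tif"

# Forward slashes need no escaping at all.
slash_path = "C:/HT_Projeler/7/Kaynak/wrapped_gedizw.tif"

# Both spellings describe the same Windows path.
same = PureWindowsPath(raw_path) == PureWindowsPath(slash_path)
print(same)
```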

If instead that string is outside your control, there is not much you can do after the fact. Escape-sequence conversion is not injective, so given a string in which the escape sequences have already been processed, you cannot unambiguously recover what was originally written. In other words, if someone incorrectly writes:

path = "C:\HT_Projeler\7\Kaynak\wrapped_gedizw.tif"

as you show, you get

'C:\\HT_Projeler\x07\\Kaynak\\wrapped_gedizw.tif'

and there's no way to know for certain what they meant, because that \x07 may have been written as \7, or \x07, or \a. Heck, any letter may originally have been written as an escape sequence: what you see in that string as an a may actually have been \x61.
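A short sketch of why the conversion cannot be reversed: several different spellings in source code produce the very same string at runtime. The variable names are illustrative only.

```python
# Three different spellings in source code, one identical character at runtime.
spellings = ["\7", "\x07", "\a"]
print(len(set(spellings)))  # number of distinct strings among the three

# Even an ordinary letter may have been written as an escape sequence.
print("\x61" == "a")

# The path from the question (other backslashes doubled to keep them literal):
# the "\7" has already become a single BEL character.
path = "C:\\HT_Projeler\7\\Kaynak\\wrapped_gedizw.tif"
print("\x07" in path)
```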

Long story short: your caller is responsible for giving you correct data. Once it's corrupted there's no way to come back.

Matteo Italia
  • 1
    Thank you, finally I get a sensible answer. Here 7 is the project name. I think a project name should begin with a letter, not a number. – Mustafa Uçar May 29 '18 at 07:04
  • 1
    @MustafaUçar - the other answers and comments were because you did not tell the full story first time around and made several edits. – cdarke May 29 '18 at 07:06
  • @MustafaUçar: you may think what you want, but you cannot be sure, and once it's broken, it's broken. The correct thing to do here is to **avoid corrupting the data in the first place**. Who writes that `path = ` statement (or, in your actual code, initializes `self._gv.basinFile`)? – Matteo Italia May 29 '18 at 07:23
1

In the general case, there is no way to tell whether a character in a path is correct without checking it against the actual paths on your computer (and "special character" is not really well-defined; how do you know that the path didn't contain \x41, which got converted to A anyway?)

As a weak heuristic, you could look for existing path names within a particular edit distance of each component, for example.

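import os
from difflib import SequenceMatcher as similarity  # or whatever

# `variable` holds the possibly corrupted path
parts = os.path.normpath(variable).split(os.sep)
path = parts[0] + os.sep  # drive (or root) to start the walk from
for p in parts[1:]:
    npath = os.path.join(path, p)
    if not os.path.exists(npath):
        # rank the entries that really exist in the parent directory by similarity to p
        similar = sorted(((similarity(None, x, p).ratio(), x) for x in os.listdir(path)), reverse=True)
        # recurse on most similar, second most similar, etc?  or something
        if similar:
            npath = os.path.join(path, similar[0][1])  # simplest choice: take the best match
    path = npath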
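The heuristic above could also be packaged as a self-contained function. This is only a sketch under assumptions: the function name `guess_existing_path`, its parameters, and the 0.6 cutoff are hypothetical choices, and `difflib.get_close_matches` stands in for whatever similarity measure you prefer.

```python
import difflib
import os


def guess_existing_path(root, components, cutoff=0.6):
    """Walk `components` below `root`, replacing any component that does not
    exist with the most similar entry found on disk.

    Returns None when a component has no sufficiently similar match.
    """
    path = root
    for name in components:
        if not os.path.isdir(path):
            return None  # cannot descend into something that is not a directory
        candidate = os.path.join(path, name)
        if not os.path.exists(candidate):
            # Compare against what actually exists in the parent directory.
            matches = difflib.get_close_matches(
                name, os.listdir(path), n=1, cutoff=cutoff
            )
            if not matches:
                return None
            candidate = os.path.join(path, matches[0])
        path = candidate
    return path
```

Note that this remains a weak heuristic: for the path in the question, the corrupted component is the single character \x07, which shares nothing with the real name 7, so no similarity measure can recover it.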
tripleee