
On my Windows machine I have a tiny script (.bat) to start a number of programs I use for my ordinary work such as Word, Outlook, a certain Excel file, etc.

An unsolved problem is that I cannot specify a certain Excel file that is stored in a local folder because the folder has a special character (German u-umlaut, i.e. ü) in its name. Something like:

C:\Büroeinrichtung\MyExcelFile.xlsx

In my script I try to call this via

Start "" "C:\Büroeinrichtung\MyExcelFile.xlsx"

but on running the script I get an alert indicating that the ü is not accepted (where the ü is replaced by some even more fancy signs on screen).

I could rename the folder, replacing ü with ue or similar, to circumvent the problem, but I would prefer a solution that leaves my existing folder names unchanged.

I did not find this problem addressed in other questions here.

Is there a solution?

PS: I use Notepad++.

Later addition: There is a follow-up problem that still haunts me (although the initial problem is solved). It is regarding the approach to change the codepage programmatically with the chcp command from inside the script. See the question here

  • Look at [CMD can't read danish characters when i execute .bat file](https://stackoverflow.com/questions/43046559/). It is the same text encoding issue. You wrote the batch file in Notepad++ using [Windows-1252](https://en.wikipedia.org/wiki/Windows-1252), but in console the code page [OEM 850](https://en.wikipedia.org/wiki/Code_page_850) is used by default with German set in Windows region and language settings. The umlauts have different code values in OEM 850 in comparison to Windows-1252. – Mofi May 31 '17 at 09:50
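
To make the mismatch described in the comment above concrete, here is a small Python sketch (not part of the original thread). It assumes the script was saved as Windows-1252 and that the console decodes it as code page 850, as stated above; other machines may use different code pages. The helper name `misread` is made up for illustration.

```python
# Sketch: how a path saved as Windows-1252 is misread by an OEM 850 console.
# Assumption: editor encoding = cp1252, console code page = cp850.

def misread(text: str, saved_as: str = "cp1252", read_as: str = "cp850") -> str:
    """Encode text the way the editor saved it, then decode it the way cmd does."""
    return text.encode(saved_as).decode(read_as)

path = r"C:\Büroeinrichtung\MyExcelFile.xlsx"

print("ü".encode("cp1252").hex())  # fc: the byte the editor wrote for ü
print("ü".encode("cp850").hex())   # 81: the byte an OEM 850 console expects for ü
print(misread(path))               # C:\B³roeinrichtung\MyExcelFile.xlsx
```

The ³ is simply what code page 850 assigns to byte 0xFC. The odd characters shown in the actual alert may differ, depending on which code page is in effect when the message is displayed.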

2 Answers


The problem is that two different character sets are involved: ANSI (used by the editor) and OEM (the console's DOS-era code page, often loosely called ASCII). There are several solutions:

  • Use another editor that lets you change the character set of the file.
  • Copy the umlaut from the command prompt into your editor.
  • In CMD, go to the parent directory, run dir /x to get the short (8.3) name of Büroeinrichtung, and use that name in your batch script.
Sascha
  • If cmd is attached to a console, batch files are decoded line by line using the console's codepage. If you save in ANSI, you'll have to temporarily change the console to use the ANSI codepage around the line in question via `chcp.com [codepage number]`. The console's default is the OEM codepage. – Eryk Sun May 31 '17 at 09:22
  • @eryksun Thanks. How can I find out what codepage number I have to set? And would I have to reverse this afterwards? – Christian Geiselmann May 31 '17 at 09:24
  • UTF-8 without a BOM is also an option, but the console is buggy with codepage 65001, so I'd put all of the non-ASCII strings in a section that temporarily switches to codepage 65001 to load the strings as environment variables, at which point they're Unicode in cmd, and then switches back to OEM. – Eryk Sun May 31 '17 at 09:25
  • Or switch back to the original codepage that you saved in a `codepage` environment variable, e.g. `for /f "tokens=2 delims=:" %%c in ('chcp.com') do @set codepage=%%c`. – Eryk Sun May 31 '17 at 09:31
  • Here is my concluding report: **a)** The short-name method was the easiest to implement, but somewhat inelegant. **b)** I tried to copy my ü from the cmd window but found that I did not know how. **c)** I experimented with changing code pages using the chcp command in the script but ran into other problems. **d)** The solution I finally liked best was setting the encoding of the script (written in Notepad++) to the code page used by default in the cmd window: OEM 850 (as became clear on typing *chcp* in the command window). Thank you, everybody. Problem solved. – Christian Geiselmann May 31 '17 at 09:55
  • @ChristianGeiselmann To copy text from a console window into clipboard, right click into the console window to open its context menu and left click on first menu item __Mark__. Now you can make a rectangular selection with mouse pointer or using Shift+Arrow keys. Once the selection is made hit key RETURN or ENTER to copy the selected text to clipboard. – Mofi May 31 '17 at 09:58
  • Copying the umlaut character from console via clipboard to GUI text editor does not work as the OEM 850 encoded character is automatically converted on paste from clipboard to Windows-1252. The solution is redirecting the output of __DIR__ into a text file being OEM 850 encoded and open this file in GUI text editor for copying the OEM 850 encoded umlaut into the batch file. – Mofi May 31 '17 at 10:03
  • @ChristianGeiselmann, but it's not a good option for people who have to work in multiple codepages or move files between systems in different locales. I don't know what problems you had with saving and changing the codepage, but it works fine for me. Using UTF-8 (no BOM) allows the full range of Unicode instead of being stuck with legacy codepages from the 1980s that make scripts non-portable across locales. – Eryk Sun May 31 '17 at 10:04
  • @Mofi, even a half-way decent programming editor lets you set the file encoding. Copying from the console is Unicode, not OEM, and it gets pasted into the editor as Unicode, which can then encode to OEM or whatever you want when saving the file. – Eryk Sun May 31 '17 at 10:07
  • @eryksun I suppose this would rather be a topic for a separate question, but here is, in short, my experience with changing code pages programmatically in the script: **Line 1:** *chcp 65001* nicely did its job. **Line 2:** *Start "" "C:\... and so on"* **made the cmd window disappear (crash?)**. I tried it step by step by inserting *pause* after each line, with no different result (except, of course, that I had to hit *Enter* after Line 1). – Christian Geiselmann May 31 '17 at 10:16
  • @eryksun I have not written that text copied from console into clipboard is OEM encoded. I have written that on paste into a Windows-1252 encoded file the text is pasted Windows-1252 encoded. I don't know in which [clipboard formats](https://msdn.microsoft.com/en-us/library/windows/desktop/ff729168.aspx) Windows copies the selected text in console window to the Windows clipboard. That does not really matter. It matters only in which encoding the copied text is pasted from clipboard into text file opened in text editor. – Mofi May 31 '17 at 10:36
  • @Mofi, you said "the OEM 850 encoded character is automatically converted on paste from clipboard to Windows-1252". That's two claims -- that it's an "OEM 850 character", which it is not at any time (the console screen buffer is Unicode), and that it gets converted to ANSI 1252 in the editor, which again is not what happens. It's copied as Unicode and pasted as Unicode assuming we're not using some Windows 9x editor from the dark ages that only supports ANSI text. – Eryk Sun May 31 '17 at 10:42
  • @ChristianGeiselmann, offhand I don't know why that crashed -- sorry. You're just starting an Excel file. When I said the console is buggy in UTF-8, I meant for running external console programs that use the legacy ANSI API (e.g. `ReadConsoleA`). cmd.exe itself uses the console's Unicode API (e.g. `ReadConsoleW`) and only uses the console codepage to determine how it should encode text written to files and pipes (unless `/u` is used) and for decoding the content of batch files. – Eryk Sun May 31 '17 at 10:52
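
The fix the asker settled on in option d) of the comments above, saving the script in the console's OEM code page, can also be done outside the editor. A minimal Python sketch (not from the thread), assuming the console reports code page 850 when you type `chcp`; the helper name `save_batch_as_oem` and the file name `start_work.bat` are made up for illustration:

```python
import tempfile
from pathlib import Path

def save_batch_as_oem(script_text: str, target: Path, codepage: str = "cp850") -> bytes:
    """Write script_text to target, encoded in the given OEM code page."""
    # Normalise line endings to CRLF, the usual convention for batch files.
    normalised = script_text.replace("\r\n", "\n").replace("\n", "\r\n")
    # Strict encoding: raises UnicodeEncodeError for characters the
    # code page cannot represent, instead of silently mangling them.
    data = normalised.encode(codepage)
    target.write_bytes(data)
    return data

script = 'Start "" "C:\\Büroeinrichtung\\MyExcelFile.xlsx"\n'

with tempfile.TemporaryDirectory() as tmp:
    bat = Path(tmp) / "start_work.bat"  # hypothetical file name
    data = save_batch_as_oem(script, bat)
    print(b"B\x81ro" in data)  # True: ü is stored as byte 0x81
```

Characters that code page 850 does not contain, such as the Hungarian ő, make the encode step fail. In that case pass whatever code page `chcp` reports on the machine in question.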

The previously suggested solution did not work in my case (Hungarian special characters), but it gave me an idea:

I used dir /x and redirected its output to a file:

dir <path> /x >>dir.txt

Using that I found the short folder name.

William Baker Morrison
Atilla