Get list of processes on Windows in a charset-safe way

Question

This post gives a solution to retrieve the list of running processes under Windows. In essence it does:

String cmd = System.getenv("windir") + "\\system32\\" + "tasklist.exe";
Process p = Runtime.getRuntime().exec(cmd);
InputStreamReader isr = new InputStreamReader(p.getInputStream());
BufferedReader input = new BufferedReader(isr);

then reads the input.

It looks and works great but I was wondering if there is a possibility that the charset used by tasklist might not be the default charset and that this call could fail?

For example this other question about a different executable shows that it could cause some issues.

If that is the case, is there a way to determine what the appropriate charset would be?

@JimGarrison I got a warning from FindBugs about *"reliance on default encoding"* in the InputStreamReader and I have no idea if this could cause an issue or not. So I searched and found the second post that seems to say that it could. That's what I want to check. On my machine that code works fine. — assylias, Nov 12 '12 at 18:12
I'll add this as a comment rather than an answer because my uncertainty is fairly large. That said, I would think that the character set used by a system utility like that would be that of the default locale for the OS installation. Querying for that locale and using it to interpret the output stream would seem to be the most general approach. But if there are localizations also present, you'd need to reverse-engineer the fields that could change so as to parse them out. And this all depends on whether the utility in question was written to vary this way in the first place. — eh9, Nov 16 '12 at 19:31

score 12 · Accepted Answer · answered Nov 20 '12 at 23:01

Can break this into 2 parts:

The windows part
From Java you are executing a Windows command outside the JVM, in "Windows land". When the Java Runtime class executes a Windows command, it uses the DLL for consoles, so to Windows the command appears to be running in a console.
Q: When I run C:\windows\system32\tasklist.exe in a console, what is the character encoding ("code page" in Windows terminology) of the result?
- The Windows "chcp" command with no argument prints the active code page number for the console (e.g. 850 for Multilingual Latin-1, 1252 for Latin-1). See Windows Microsoft Code Pages, Windows OEM Code Pages, Windows ISO Code Pages
  The default system code page is originally set up according to your system locale (type systeminfo to see this, or open Control Panel -> Region and Language).
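- The Win32 function GetConsoleOutputCP() returns the console's active output code page programmatically. GetACP() and GetOEMCP() return the system's ANSI and OEM code pages, which are the defaults and not necessarily the console's current setting.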
The java part:
How do I decode a java byte stream from the windows code page of "x" (e.g. 850 or 1252)?
- the full mapping between windows code page numbers and equivalent java charset names can be derived from here - Code Page Identifiers (Windows)
- However, in practice one of the following prefixes can be added to achieve the mapping:
  "" (none) for ISO, "IBM" or "x-IBM" for OEM, "windows-" OR "x-windows-" for Microsoft/Windows.
  E.g. ISO-8859-1 or IBM850 or windows-1252

Full Solution:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.Charset;
    import java.util.Scanner;

    String cmd = System.getenv("windir") + "\\system32\\" + "chcp.com";
    Process p = Runtime.getRuntime().exec(cmd);
    // The default charset is fine here: only the digits after the ":" are
    // needed, and digits decode the same way in any ASCII-compatible charset.
    String windowsCodePage = new Scanner(
        new InputStreamReader(p.getInputStream())).skip(".*:").next();

    Charset charset = null;
    String[] charsetPrefixes =
        new String[] {"","windows-","x-windows-","IBM","x-IBM"};
    for (String charsetPrefix : charsetPrefixes) {
        try {
            charset = Charset.forName(charsetPrefix+windowsCodePage);
            break;
        } catch (IllegalArgumentException e) {
            // Illegal or unsupported charset name: try the next prefix
        }
    }
    // If no match found, use default charset
    if (charset == null) charset = Charset.defaultCharset();

    cmd = System.getenv("windir") + "\\system32\\" + "tasklist.exe";
    p = Runtime.getRuntime().exec(cmd);
    InputStreamReader isr = new InputStreamReader(p.getInputStream(), charset);
    BufferedReader input = new BufferedReader(isr);

    // Debugging output
    System.out.println("matched codepage "+windowsCodePage+" to charset name:"+
            charset.name()+" displayName:"+charset.displayName());
    String line;
    while ((line = input.readLine()) != null) {
           System.out.println(line);
    }
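
The parsing and lookup steps in the solution above can also be pulled into small pure helpers, which makes them testable without running chcp. This is only a sketch: the class and method names (`CodePageCharsets`, `parseCodePage`, `charsetForCodePage`) are made up for illustration, and the special case for code page 65001 (UTF-8) is an addition that the original code does not have.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CodePageCharsets {
    // Last run of digits in the text, so a trailing "." (as some localized
    // chcp messages have) does not end up in the result.
    private static final Pattern LAST_DIGITS = Pattern.compile("(\\d+)\\D*$");

    /** Returns the code page number found in chcp output, or null if there is none. */
    static String parseCodePage(String chcpOutput) {
        Matcher m = LAST_DIGITS.matcher(chcpOutput.trim());
        return m.find() ? m.group(1) : null;
    }

    /** Maps a Windows code page number to a Java charset using the prefix rule. */
    static Charset charsetForCodePage(String codePage) {
        if (codePage == null) {
            return Charset.defaultCharset();
        }
        if ("65001".equals(codePage)) {
            // Code page 65001 is UTF-8; the prefix rule alone would miss it.
            return StandardCharsets.UTF_8;
        }
        String[] prefixes = {"", "windows-", "x-windows-", "IBM", "x-IBM"};
        for (String prefix : prefixes) {
            try {
                return Charset.forName(prefix + codePage);
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported name: try the next prefix.
            }
        }
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        String codePage = parseCodePage("Active code page: 850");
        System.out.println(codePage + " -> " + charsetForCodePage(codePage).name());
    }
}
```

Catching `IllegalArgumentException` covers both `IllegalCharsetNameException` and `UnsupportedCharsetException`, which are the two failures `Charset.forName` reports for a bad name.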

Thanks for the Q! - was fun.

This is great - I copied the `notepad.exe` application, renamed it to `0aéèçê.exe` and launched it. My original code failed (showing square characters). Your version did output the right string (with codepage 850). — assylias, Nov 21 '12 at 13:48

Alexey Ivanov · Answer 2 · 2012-11-20T09:05:51.350

Actually, the charset used by tasklist is always different from the system default.

On the other hand, it's quite safe to use the default as long as the output is limited to ASCII. Usually executable modules have only ASCII characters in their names.

So to get the correct Strings, you have to map the (ANSI) Windows code page to the matching OEM code page, and pass the latter as the charset to InputStreamReader.

It seems there's no comprehensive mapping between these encodings. The following mapping can be used:

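import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

// ANSI (Windows) code page -> matching OEM (console) code page
Map<String, String> ansi2oem = new HashMap<String, String>();
ansi2oem.put("windows-1250", "IBM852");
ansi2oem.put("windows-1251", "IBM866");
ansi2oem.put("windows-1252", "IBM850");
ansi2oem.put("windows-1253", "IBM869");

Charset charset = Charset.defaultCharset();
String streamCharset = ansi2oem.get(charset.name());
if (streamCharset == null) {
    // No known OEM counterpart: fall back to the default charset
    streamCharset = charset.name();
}
// p is the tasklist Process started as in the question
InputStreamReader isr = new InputStreamReader(p.getInputStream(),
                                              streamCharset);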

This approach worked for me with windows-1251 and IBM866 pair.

To get the current OEM encoding used by Windows, you can use the GetOEMCP function. Its return value depends on the "Language for non-Unicode programs" setting on the Administrative tab of the Region and Language control panel. A reboot is required to apply a change.

There are two kinds of encodings on Windows: ANSI and OEM.

The former is used by non-Unicode applications running in GUI mode.
The latter is used by Console applications. Console applications cannot display characters that cannot be represented in the current OEM encoding.

Since tasklist is console mode application, its output is always in the current OEM encoding.

For English systems, the pair is usually Windows-1252 and CP850 (US English installations typically use CP437 as the OEM code page).

As I am in Russia, my system has the following encodings: Windows-1251 and CP866.
If I capture output of tasklist into a file, the file can't display Cyrillic characters correctly:

I get ЏаЁўҐв instead of Привет (Hi!) when viewed in Notepad.
And µTorrent is displayed as зTorrent.
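
The garbling can be reproduced in plain Java by encoding the text with the OEM code page and decoding it with the ANSI one. This is a small sketch; the class and method names are invented for the illustration, and the Cyrillic literals are written as Unicode escapes so that the source file's own encoding does not matter.

```java
import java.nio.charset.Charset;

public class OemMojibake {
    /** Encodes text with one charset and decodes the bytes with another. */
    static String misread(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        Charset oem = Charset.forName("IBM866");        // what tasklist writes
        Charset ansi = Charset.forName("windows-1251"); // what Notepad assumes
        String original = "\u041F\u0440\u0438\u0432\u0435\u0442"; // "Привет"

        // Wrong charset on the reading side: the garbled form from the answer.
        System.out.println(misread(original, oem, ansi));
        // Same charset on both sides: the text survives the round trip.
        System.out.println(misread(original, oem, oem));
    }
}
```

What the console shows for these two lines depends on its own encoding, so the comparison is more reliable in a test or a debugger than on screen.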

You cannot change the encoding used by tasklist.

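However, it is possible to change the output encoding of cmd itself. If you pass the /u switch, cmd writes the output of its internal commands (such as echo) to a pipe or file in UTF-16; external programs such as tasklist are not affected.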

cmd /c echo Hi>echo.txt

The size of echo.txt is 4 bytes: two bytes for Hi and two bytes for new line (\r and \n).

cmd /u /c echo Hi>echo.txt

Now the size of echo.txt is 8 bytes: each character is represented with two bytes.
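
On the Java side, such output would be decoded as UTF-16LE (little-endian, no byte order mark). The sketch below only decodes a hard-coded copy of the 8 bytes described above; the class and method names are made up for the example.

```java
import java.nio.charset.StandardCharsets;

public class CmdUnicodeOutput {
    /** Decodes bytes produced by an internal cmd command run with /u. */
    static String decodeCmdUnicode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        // The 8 bytes of echo.txt: 'H', 'i', '\r', '\n', two bytes each.
        byte[] raw = {'H', 0, 'i', 0, '\r', 0, '\n', 0};
        String text = decodeCmdUnicode(raw);
        System.out.println(text.trim() + " (" + raw.length + " bytes)");
    }
}
```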

Thank you for your detailed and informative answer - I find Glen Best's answer better in the sense that it provides a full working example so I selected it but yours was very good too. — assylias, Nov 21 '12 at 13:50

score 3 · Answer 3 · edited May 23 '17 at 12:22

Why not use the Windows API via JNA, instead of spawning processes? Like this:

import com.sun.jna.platform.win32.Kernel32;
import com.sun.jna.platform.win32.Tlhelp32;
import com.sun.jna.platform.win32.WinDef;
import com.sun.jna.platform.win32.WinNT;
import com.sun.jna.win32.W32APIOptions;
import com.sun.jna.Native; 

public class ListProcesses {
    public static void main(String[] args) {
        Kernel32 kernel32 = (Kernel32) Native.loadLibrary(Kernel32.class, W32APIOptions.UNICODE_OPTIONS);
        Tlhelp32.PROCESSENTRY32.ByReference processEntry = new Tlhelp32.PROCESSENTRY32.ByReference();          

        WinNT.HANDLE snapshot = kernel32.CreateToolhelp32Snapshot(Tlhelp32.TH32CS_SNAPPROCESS, new WinDef.DWORD(0));
        try  {
            while (kernel32.Process32Next(snapshot, processEntry)) {             
                System.out.println(processEntry.th32ProcessID + "\t" + Native.toString(processEntry.szExeFile));
            }
        }
        finally {
            kernel32.CloseHandle(snapshot);
        }
    } 
}

I posted a similar answer elsewhere.
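
For readers on Java 9 or later (which this answer predates), the standard `ProcessHandle` API follows the same idea of asking the OS directly, with no child process and no byte stream to decode. The following is a minimal sketch under that assumption; the class name is invented, and how much detail `info()` exposes varies by platform and permissions.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ListProcessesJava9 {
    /** Returns one "pid<TAB>executable" line per process visible to this JVM. */
    static List<String> describeProcesses() {
        return ProcessHandle.allProcesses()
                .map(ph -> ph.pid() + "\t" + ph.info().command().orElse("<unknown>"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        describeProcesses().forEach(System.out::println);
    }
}
```

The names arrive as Java Strings, so no charset has to be chosen at all.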

The above only outputs the command name and NOT the entire command line. Is there a way to get the process's full command line? — Christopher Dancy, May 04 '13 at 19:38

javabeats · Answer 4 · 2012-11-14T15:35:46.990

0

There is a much better way to check the running processes, or even to run OS commands through Java: Process and ProcessBuilder.
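
A minimal sketch of the ProcessBuilder variant, assuming the right charset has already been determined by some other means. The class and method names are hypothetical, and IBM850 in `main` is only a placeholder for whatever code page the console actually uses.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class ProcessOutputReader {
    /** Reads all lines from a stream, decoding with an explicit charset. */
    static List<String> readLines(InputStream in, Charset charset) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** Starts a command and returns its output (stdout and stderr merged). */
    static List<String> run(Charset charset, String... command) {
        try {
            Process p = new ProcessBuilder(command).redirectErrorStream(true).start();
            List<String> lines = readLines(p.getInputStream(), charset);
            p.waitFor();
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        if (!System.getProperty("os.name").startsWith("Windows")) {
            System.out.println("tasklist.exe is only available on Windows");
            return;
        }
        String tasklist = System.getenv("windir") + "\\system32\\tasklist.exe";
        // Assumption: IBM850 stands in for the console's real code page.
        run(Charset.forName("IBM850"), tasklist).forEach(System.out::println);
    }
}
```

ProcessBuilder makes the launch tidier, but it does not answer the charset question by itself: the bytes still have to be decoded with the code page the child process wrote them in.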

As for the Charset, you can always inquire the OS about the supported charsets, and obtain an Encoder or Decoder according to your needs.

[Edit] Let's break it down; there's no way of knowing in which encoding the bytes of a given String are, so your only choice is to get those bytes, shift the ordering as necessary (if you're ever in such an environment where a process can give you an array of bytes in different ordering, use ByteBuffer to deal with that), and use the multiple CharsetDecoders supported to decode the bytes to reasonable output.

It is overkill and requires you to estimate that a given output could be in UTF-8, UTF-16 or any other encoding. But at least you can decode the given output using one of the possible Charsets, and then try to use the processed output for your needs.

Since we're talking about a process run by the same OS in which the JVM itself is running, it is quite possible that your output will be in one of the Charset encodings returned by the availableCharsets() method.
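
The trial-decoding idea can be sketched with `CharsetDecoder` in strict mode, where malformed or unmappable bytes raise an error instead of being silently replaced. The class and method names are invented for the illustration, and the candidate list is whatever the caller chooses to try.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecodeTrial {
    /** Decodes strictly; returns null if the bytes are not valid in the charset. */
    static String decodeStrict(byte[] bytes, Charset charset) {
        try {
            return charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    /** Returns the result of the first candidate that decodes without error. */
    static String decodeFirstMatch(byte[] bytes, Charset... candidates) {
        for (Charset candidate : candidates) {
            String decoded = decodeStrict(bytes, candidate);
            if (decoded != null) {
                return decoded;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] oemBytes = {(byte) 0x82}; // 'é' in IBM850, malformed as UTF-8
        String text = decodeFirstMatch(oemBytes,
                StandardCharsets.UTF_8, Charset.forName("IBM850"));
        System.out.println(text != null ? "decoded" : "no candidate matched");
    }
}
```

This is a heuristic at best. Single-byte code pages accept almost any byte sequence, so a successful decode does not show that the guess was right, which is why asking the console for its code page is the more reliable route.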

edited Nov 14 '12 at 15:35

answered Nov 12 '12 at 19:00


I am already using a Process and I know how to specify a charset. The question is: which charset to use. You state "*you can always inquire the OS about the supported charsets*": how do you do that? How do I know which of the supported charsets is used by that specific program? – assylias Nov 12 '12 at 19:09
You're using a Process, but not a ProcessBuilder, which is cleaner than using the Runtime class. The actual method you need to call to obtain the available charsets is Charset.availableCharsets(). But even so, it would be safer to test a Charset using the methods in the javadocs I gave you - CharsetEncoder.canEncode(), detect(), etc... – javabeats Nov 13 '12 at 13:43
I'm sorry but I don't understand how that would work. Could you give a simple example of how you would apply your recommendation to my specific use case? – assylias Nov 13 '12 at 14:18
Improved the answer a bit to explain my point of view on the issue; the code involved should be easy to imagine by now. I could give a simple example, but then again it would be of little use: first you need to know whether what I am proposing fits your needs or not. – javabeats Nov 14 '12 at 15:36
How do I check the running processes with `Process`? Its only purpose is to represent a (sub)process created from Java via `ProcessBuilder.start()` or `Runtime.exec()`. – Alexey Ivanov Nov 20 '12 at 09:02
What I had in mind was to use Process to issue a "ps"-like command to the OS; but that is obviously not platform-independent. It'd be cool if Java provided a way to query the OS about certain native capabilities, as we can do through certain OpenGL APIs. – javabeats Nov 21 '12 at 12:56
@javabeats See the accepted answer which gives a way (on Windows) to find the charset used by the console. – assylias Nov 21 '12 at 13:55
