You can easily execute a system command from C using these answers.

But how can I pass a command containing non-ASCII characters? If I run this program:

#include <stdlib.h>    
int main() {
    system("echo \"Coração, fé, café!\" ");
}

I get:

"Coração, fé, café!"

I don't want to switch the terminal to the correct charset (or encoding). I want to pass in the non-ASCII characters.

Let's say I have the files:

C:\Cafe Coracao.txt     <-- No accents
C:\Café Coração.txt

So this program:

#include <stdlib.h>
int main() {
    system("type \"C:\\Cafe Coracao.txt\"");    // Works
    system("type \"C:\\Café Coração.txt\"");    // Fails
}

How can I make this work?

Rodrigo
  • [Unicode Escape Seq](https://dencode.com/en/string/unicode-escape), there is also a text converter. [This](https://godbolt.org) works on my Macintosh, but it could be different on your machine. I used `printf()` instead of `system`, but it should still work. – underloaded_operator Jun 17 '23 at 21:03
  • On Windows you would likely need to use the [_wsystem](https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/system-wsystem?view=msvc-170) function with a wide character string. – Retired Ninja Jun 17 '23 at 21:06
  • Given the output, the source file is encoded in UTF-8. With the Microsoft compiler, using the `/utf-8` compiler switch to set the source encoding to UTF-8 works with your code. I don't have `clang` installed, but try compiling with `-finput-charset=utf8`. Note to @RetiredNinja it works with both `system` and `_wsystem` if the source encoding is interpreted correctly, at least on the MS compiler. – Mark Tolonen Jun 17 '23 at 21:26
  • @RetiredNinja that's not true. You can use UTF-8 on the console. And nowadays [UTF-8 locale is also supported](https://stackoverflow.com/a/63454192/995714) – phuclv Jun 18 '23 at 04:58
  • What is the encoding of the c source file? – Neil Jun 18 '23 at 10:26

2 Answers

#define _CRT_SECURE_NO_WARNINGS
#include <stdlib.h>    
#include <stdio.h>

int main(void) 
{
    {
        FILE* f = fopen("C:\\Temp\\Café Coração.txt", "w");
        if (f)
        {
            fputs("Here's some text\n", f);
            fclose(f);
            system("type \"C:\\Temp\\Café Coração.txt\"");
        }
        else
        {
            fprintf(stderr, "Could not create utf-8 file!\n");
        }
    }

    {
        FILE* f = _wfopen(L"C:\\Temp\\Café CoraçãoW.txt", L"w");
        if (f)
        {
            fputs("Here's some text\n", f);
            fclose(f);
            _wsystem(L"type \"C:\\Temp\\Café CoraçãoW.txt\"");
        }
        else
        {
            fprintf(stderr, "Could not create utf-16 file!\n");
        }
    }

    return 0;
}

Notepad++ identifies the source file as UTF-8 with BOM. I ran it on Windows 10 Pro 22H2 (build 19045.3086).

Compiled with Visual Studio 2022 using the Microsoft compiler with no special flags, both files are created with the correct names and `type` works on both of them.

Compiled with clang 15.0.1 (installed with Visual Studio) using the command line `clang -finput-charset=utf-8 test.c -o tt.exe`, the first file is created but incorrectly named Café Coração.txt, while the second is correctly named Café CoraçãoW.txt. Both `type` commands work, which I'd expect since each uses the same string literal as the call that created the file, but if the file already exists with the correct name then `type` fails because the names don't match. The results are the same without the `-finput-charset=utf-8` flag. I also tried the `-fexec-charset=utf-8` flag, with no change in behavior.

I'd say you're best off using wide strings for filenames/paths if you're compiling with clang unless there are other options to make it work that I am not aware of.
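
If the command text only exists as UTF-8 bytes at run time, one way to follow this advice is to convert it to UTF-16 before calling `_wsystem`. The following is only a minimal sketch, not something I tested with clang on Windows: `utf8_to_utf16` and `system_utf8` are hypothetical helper names, the 1024-unit buffer is an arbitrary choice, and the decoder does not reject overlong encodings or stray surrogates.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: decode a NUL-terminated UTF-8 string into UTF-16
   code units. Returns the number of units written (not counting the
   terminator), or (size_t)-1 on malformed input or a too-small buffer. */
size_t utf8_to_utf16(const char *src, uint16_t *dst, size_t cap)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;
    if (cap == 0) return (size_t)-1;
    while (*s) {
        uint32_t cp;
        int extra;
        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return (size_t)-1;                 /* invalid lead byte */
        s++;
        while (extra-- > 0) {
            if ((*s & 0xC0) != 0x80) return (size_t)-1; /* bad continuation */
            cp = (cp << 6) | (uint32_t)(*s++ & 0x3F);
        }
        if (cp >= 0x10000) {                    /* needs a surrogate pair */
            if (n + 2 >= cap) return (size_t)-1;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (n + 1 >= cap) return (size_t)-1;
            dst[n++] = (uint16_t)cp;
        }
    }
    dst[n] = 0;
    return n;
}

#ifdef _WIN32
/* Hypothetical wrapper. On Windows wchar_t is 16 bits wide, so the
   converted buffer can be handed to _wsystem directly. */
int system_utf8(const char *command)
{
    uint16_t wide[1024];
    if (utf8_to_utf16(command, wide, 1024) == (size_t)-1) return -1;
    return _wsystem((const wchar_t *)wide);
}
#endif
```

Writing the accented characters as byte escapes, e.g. `"caf\xC3\xA9"` for "café", keeps the call independent of how the compiler reads the source file.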

Retired Ninja

On the one hand, in this question I was trying to get the code to compile. But what I was really trying to do was read the string from the console and then work with it.

This is a really old DOS problem. Anyway, if you run this code:

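#include <stdio.h>

int main(void) {
    FILE *file = fopen("resultado.txt", "w");
    if (!file) return 1;
    for (unsigned int i = 0; i < 256; i++) {
        char command[60];
        /* Build "echo <byte> <code>" for every possible byte value. */
        snprintf(command, sizeof command, "echo %c %u", (char)i, i);
        FILE *f = popen(command, "r");   /* _popen with the Microsoft compiler */
        if (!f) continue;
        char linha[512];
        while (fgets(linha, sizeof linha, f)) {
            /* The byte as sent and its code, then what the console echoed back. */
            fprintf(file, "%c %u --> C  ", (char)i, i);
            fprintf(file, "%s", linha);
        }
        pclose(f);                       /* _pclose with the Microsoft compiler */
    }
    fclose(file);
    return 0;
}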

It writes every byte value to the text file along with how the console sees it. From 0 to 127 the two are identical. Above that, you need to translate. The console is using Code Page 850, and I don't see an easy way to do the conversion, but I did it myself.

Collecting all the printable characters, I got this array:

/* Code Page 850 bytes 128..255 mapped to Windows-1252 codes; 0 means no equivalent.
   The final 160 (no-break space, byte 255) completes the table to 128 entries. */
const unsigned int translate[128] = {199,252,233,226,228,224,229,231,234,235,232,239,238,236,196,197,201,230,198,244,246,242,251,249,255,214,220,248,163,216,215,131,225,237,243,250,241,209,170,186,191,174,172,189,188,161,171,187,0,0,0,0,0,193,194,192,169,0,0,0,0,162,165,0,0,0,0,0,0,0,227,195,0,0,0,0,0,0,0,164,240,208,202,203,200,0,205,206,207,0,0,0,0,166,204,0,211,223,212,210,245,213,181,254,222,218,219,217,253,221,175,180,173,177,0,190,182,167,247,184,176,168,183,185,179,178,0,160};

Now loop over every string you get and apply:

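for (int i = 0; i < size; i++) {
    unsigned char mychar = (unsigned char)text[i];
    if (mychar >= 128) {
        /* Entries of 0 mark characters with no equivalent (box drawing, etc.). */
        mychar = (unsigned char)translate[mychar - 128];
    }
    text[i] = (char)mychar;
}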

Now you can work with the string in your C program using ordinary Windows-1252 (Latin-1) character codes instead of Code Page 850.
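
The table and the loop above can be combined into one self-contained function. This is only a sketch under a few assumptions: the console really is using Code Page 850, `cp850_to_cp1252` and `cp850_high` are hypothetical names, the last entry (160, no-break space, for byte 255) is included so the table has all 128 entries, and characters with no equivalent are replaced by '?' rather than 0, because writing 0 would end the string early.

```c
/* Code Page 850 bytes 0x80..0xFF mapped to Windows-1252 codes.
   0 marks characters with no equivalent (box drawing, shading, etc.). */
static const unsigned char cp850_high[128] = {
    199,252,233,226,228,224,229,231,234,235,232,239,238,236,196,197, /* 0x80 */
    201,230,198,244,246,242,251,249,255,214,220,248,163,216,215,131, /* 0x90 */
    225,237,243,250,241,209,170,186,191,174,172,189,188,161,171,187, /* 0xA0 */
      0,  0,  0,  0,  0,193,194,192,169,  0,  0,  0,  0,162,165,  0, /* 0xB0 */
      0,  0,  0,  0,  0,  0,227,195,  0,  0,  0,  0,  0,  0,  0,164, /* 0xC0 */
    240,208,202,203,200,  0,205,206,207,  0,  0,  0,  0,166,204,  0, /* 0xD0 */
    211,223,212,210,245,213,181,254,222,218,219,217,253,221,175,180, /* 0xE0 */
    173,177,  0,190,182,167,247,184,176,168,183,185,179,178,  0,160  /* 0xF0 */
};

/* Translate a NUL-terminated CP850 string in place. */
void cp850_to_cp1252(char *text)
{
    for (; *text; text++) {
        unsigned char c = (unsigned char)*text;
        if (c >= 128) {
            unsigned char mapped = cp850_high[c - 128];
            /* Substitute '?' for unmappable bytes so the string is not cut short. */
            *text = (char)(mapped ? mapped : '?');
        }
    }
}
```

For example, the CP850 bytes 0x87 and 0xC6 (ç and ã) become 231 and 227, while a box-drawing byte such as 0xB0 becomes '?'.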

Rodrigo