0

I'm rewriting a program in C++ to see if I can improve the speed, and I need to convert a char[] to an int[] holding the character values, just as in Java. This is the code I wrote in Java:

public static void main(String[] args) {
    System.out.println("Insert text:");
    Scanner D = new Scanner(System.in);
    String text = D.nextLine();
    int[] textInt = StringToIntArray(text);
    printArray(textInt);
}

public static int charNum(char x){
    int a = x;
    return a;
}

public static int[] StringToIntArray(String text){
    int[] result = new int[text.length()];
    for (int i = 0; i < result.length; i++) {
        result[i] = charNum(text.charAt(i));
    }
    return result;
}

public static void printArray(int[] x){
    for(int i : x){
        System.out.print("["+i+"] ");
    }
    System.out.println("");
}

(If you input Hello it will print [72] [101] [108] [108] [111])

But I just noticed that in C++ the default char encoding is ANSI, and someone told me that Java uses UTF-16. I just need to convert text, either from a char[] or a std::string, to an int[], but I really need the same values.

Ulkra
  • 13
  • 5
  • 1
    Possible duplicate of [Encode/Decode std::string to UTF-16](https://stackoverflow.com/questions/11086183/encode-decode-stdstring-to-utf-16) – NicoBerrogorry Feb 11 '18 at 04:47

2 Answers

0

(If you input Hello it will print [72] [101] [108] [108] [111])

If I understand you correctly, you just want to decompose a string into the integer values of each of its characters.

If that is the case, then it is fairly simple to convert a character string into integers:

std::string s = "Hello";

std::cout << s << ": ";
for( auto ch : s ) {
    int i = static_cast<int>(ch);
    std::cout << "[" << i << "] ";
}
std::cout << std::endl;

From this I get:

Hello: [72] [101] [108] [108] [111]

Here is a modified version for the UTF-16 case:

std::u16string s = u"Hello";

for( auto ch : s ) {
    int i = static_cast<int>(ch);
    std::cout << "[" << i << "] ";
}
std::cout << std::endl;
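
If the text arrives at runtime as bytes (for example via std::getline on std::cin), one way to get the same code-unit values as Java is to convert those bytes to a std::u16string first. Here is a minimal sketch, assuming the input is UTF-8 and using std::wstring_convert (deprecated since C++17, but still available):

#include <codecvt>
#include <iostream>
#include <locale>
#include <string>

int main() {
    std::string line;
    std::getline(std::cin, line);        // raw bytes, assumed to be UTF-8
    // Convert the UTF-8 bytes to UTF-16 code units (what Java's char holds).
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    std::u16string u16 = conv.from_bytes(line);
    for (char16_t ch : u16)
        std::cout << "[" << static_cast<int>(ch) << "] ";
    std::cout << std::endl;
}

With that, an input of Hello still prints [72] [101] [108] [108] [111], and non-ASCII input matches Java's char values as long as the bytes really are UTF-8.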
Daniel
  • 1,291
  • 6
  • 15
  • Hello, thank you, but that doesn't avoid the problem of the UTF-16 format, because this is made to encode any kind of passwords or texts, and if someone puts in a "°" or a "¬" we will get a different value. – Ulkra Feb 11 '18 at 04:56
  • It is just a slight modification to the original code. I edited the answer to correct for this. – Daniel Feb 11 '18 at 05:15
0

If your characters are ASCII, you can use the method in Daniel Day’s answer.

If they are in UTF-8 or in whatever the local multi-byte string encoding is (if you're using some old compiler where it's different), you can convert to char16_t[] with the mbrtoc16() function from <uchar.h> (<cuchar> in C++), and then from char16_t[] to uint16_t[] or int[]. Make sure the endianness is the same. I would strongly recommend that you use UTF-8 encoding whenever you can. In fact, you might find it simpler to pass along a UTF-8 string and convert from UTF-8 in Java.
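
Here is a rough sketch of that conversion, not a drop-in implementation: it assumes the current locale (and therefore the input std::string) is UTF-8, and the helper name to_utf16_units is made up for the example.

#include <clocale>
#include <cstdint>
#include <cuchar>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical helper: converts a locale-encoded (assumed UTF-8) byte string
// into the UTF-16 code units that Java would see for the same text.
std::vector<std::uint16_t> to_utf16_units(const std::string& in) {
    std::vector<std::uint16_t> out;
    std::mbstate_t state{};
    const char* p = in.data();
    const char* end = p + in.size();
    char16_t c16;
    while (p < end) {
        std::size_t rc = std::mbrtoc16(&c16, p, end - p, &state);
        if (rc == (std::size_t)-1 || rc == (std::size_t)-2)
            break;                       // invalid or truncated sequence: stop
        if (rc == (std::size_t)-3) {     // low surrogate of a pair already decoded;
            out.push_back(c16);          // no input bytes were consumed
            continue;
        }
        out.push_back(c16);
        p += (rc == 0) ? 1 : rc;         // rc == 0 means an embedded '\0' (one byte)
    }
    return out;
}

int main() {
    std::setlocale(LC_ALL, "");          // use the environment's locale (assumed UTF-8)
    std::string text;
    std::getline(std::cin, text);
    for (std::uint16_t u : to_utf16_units(text))
        std::cout << "[" << u << "] ";
    std::cout << std::endl;
}

With this, "Hello" prints [72] [101] [108] [108] [111], and a character like "°" comes out as 176, the same value a Java char would hold.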

If the string is in some other encoding, you need to use some other library to perform the conversion, such as ICU. C does not, in fact, specify that the default encoding is “ANSI” (that is, Windows Code Page 1252) and there is really no reason to store new data in that legacy format.
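
As an illustration only, here is a minimal sketch using ICU4C's icu::UnicodeString, assuming the input bytes happen to be in Windows-1252; the codepage name and the sample bytes are just for the example.

#include <unicode/unistr.h>   // ICU4C; link with -licuuc
#include <iostream>

int main() {
    // Illustrative input: the bytes of "H°llo" in Windows-1252, where 0xB0 is the degree sign.
    const char bytes[] = "H" "\xB0" "llo";
    // Convert from the named legacy codepage to ICU's internal UTF-16 representation.
    icu::UnicodeString us(bytes, "windows-1252");
    for (int32_t i = 0; i < us.length(); ++i)
        std::cout << "[" << static_cast<int>(us.charAt(i)) << "] ";
    std::cout << std::endl;   // prints [72] [176] [108] [108] [111]
}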

Note that an int is typically 32 bits wide, but can be some other size, while a Java char is 16 bits wide. You might instead want to pass a type such as uint16_t[] from <stdint.h>, which is exactly the right size, or char16_t[] from <uchar.h>.

Davislor
  • 14,674
  • 2
  • 34
  • 49