I am writing a library that lets users insert and search key-value pairs stored in a trie. When I insert a Unicode string, each non-ASCII character is broken down into 4 chars (UTF-8), which is fine, but each of those chars is printed as '?'. I tried calling setlocale(LC_ALL, ""), which didn't work (or maybe I just don't know the right arguments for my case, or where to call it). I don't really care about printing or reading the character as it is. All I want is for it to be represented uniquely.
In my trie, each node has links declared like node *next[256].
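To make the setup concrete, here is a simplified sketch of that kind of trie, not my actual library code. Only the next[256] array comes from my real code; the node typedef, the is_end flag, and the functions node_new, trie_insert and trie_contains are hypothetical names made up for illustration. Each byte of the key is cast to unsigned char before it is used as an index, because plain char may be signed and bytes of 0x80 and above would otherwise become negative indices.

```c
#include <stdlib.h>

/* Hypothetical node layout: only next[256] matches my real code. */
typedef struct node {
    struct node *next[256]; /* one link per possible byte value */
    int is_end;             /* hypothetical flag: a stored key ends here */
} node;

/* calloc zero-fills the node; this assumes an all-zero-bits pointer is
   NULL, which holds on mainstream platforms. */
static node *node_new(void) {
    return calloc(1, sizeof(node));
}

/* Insert key one byte at a time. Returns 0 on success, -1 if out of memory. */
static int trie_insert(node *root, const char *key) {
    for (; *key != '\0'; ++key) {
        unsigned char b = (unsigned char)*key; /* avoid negative indices */
        if (root->next[b] == NULL) {
            root->next[b] = node_new();
            if (root->next[b] == NULL) {
                return -1;
            }
        }
        root = root->next[b];
    }
    root->is_end = 1;
    return 0;
}

/* Return 1 if key was inserted earlier, 0 otherwise. */
static int trie_contains(const node *root, const char *key) {
    for (; *key != '\0'; ++key) {
        root = root->next[(unsigned char)*key];
        if (root == NULL) {
            return 0;
        }
    }
    return root->is_end;
}
```

With this layout a string is stored as whatever bytes the compiler put into the char array, so two keys match only if their byte sequences are identical.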
So all I want is that when a Unicode string is inserted, it is stored as a unique sequence that makes it possible to search for that string unambiguously. I also want a way to detect that a Unicode character was broken down into 4 individual chars. That's because, for example, if in a string "wxyz" the Unicode character "x" is broken down into a, b, c, d, then the trie would store "wabcdyz". If I then searched for the plain (non-Unicode) string "wabcdyz", it would find the entry for that string, but that would be a false match.
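For the detection part, this is a rough sketch of the kind of check I have in mind, assuming the string really is UTF-8. As far as I understand the encoding, ASCII characters are single bytes below 0x80, while every byte of a multi-byte character is 0x80 or above, with the first byte announcing the length of the sequence. The function names utf8_seq_len and count_multibyte are made up for this example, and the check only looks at the lead byte, so it is not a full UTF-8 validator.

```c
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence that starts with `lead`,
   or 0 if `lead` is a continuation byte (10xxxxxx) or matches no
   lead-byte pattern. Only the bit pattern of the lead byte is checked. */
static size_t utf8_seq_len(unsigned char lead) {
    if (lead < 0x80) return 1;            /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;
}

/* Count the characters in s that start a multi-byte sequence,
   i.e. the number of lead bytes announcing 2, 3 or 4 bytes. */
static size_t count_multibyte(const char *s) {
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        if (utf8_seq_len((unsigned char)*s) >= 2) {
            ++count;
        }
    }
    return count;
}
```

If that understanding is right, the bytes of a multi-byte character can never be equal to ASCII letters such as a, b, c, d, so the "wabcdyz" collision described above could only happen with another string containing exactly the same high bytes.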
Here is a program that shows the Unicode character being broken down into four ? characters:
#include <stdio.h>

int main(void)
{
    printf("Hello World");
    char a[] = "Ƃ";
    int i;
    for (i = 0; a[i] != '\0'; ++i)
    {
        printf("%c", a[i]);
    }
    return 0;
}