A char can commonly hold 256 different values (1 byte), which in other words covers just the ASCII table (it can reach the extended 8-bit table if you make it unsigned). For handling UTF-8 characters I would recommend using another type like wchar_t (if a wide character in your compiler is wide enough to hold the characters decoded from UTF-8), otherwise use char32_t if you're using C++11, or a library like ICU to deal with your data.
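Whether wchar_t is wide enough depends on the platform, so it can be worth checking before relying on it. The sketch below is only an illustration: the helper names char_value_count and wchar_holds_any_code_point are made up for this example, and it assumes 0x10FFFF (the highest Unicode code point) as the limit to test against.

```c
#include <limits.h>
#include <stdint.h>
#include <wchar.h>

/* Number of distinct values an unsigned char can represent
   (256 when a byte has 8 bits). */
int char_value_count(void) {
    return UCHAR_MAX + 1;
}

/* Returns 1 if a single wchar_t can store every Unicode code point
   (up to 0x10FFFF), 0 otherwise. Hypothetical helper for illustration. */
int wchar_holds_any_code_point(void) {
    return WCHAR_MAX >= 0x10FFFF;
}
```

On typical Linux systems wchar_t is 32 bits, so the second function is expected to return 1. With Microsoft's compilers wchar_t is 16 bits, so it would return 0, and characters outside the Basic Multilingual Plane need two wchar_t units.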
Edit
This example code shows how to deal with UTF-8 in C. Note that you have to make sure that wchar_t in your compiler is wide enough to store the characters decoded from UTF-8.
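#include <stdio.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

#define MAX_CHARS 100000

int main(void) {
    /* static keeps the large buffer off the stack */
    static wchar_t sentence[MAX_CHARS];
    wint_t ch; /* wint_t rather than wchar_t, so it can hold WEOF */
    size_t n = 0;

    /* Set the locale before doing any wide-character I/O */
    char *loc = setlocale(LC_ALL, "");
    /* Printed to stderr so stdout stays free for wide output (wprintf) */
    fprintf(stderr, "Locale set to: %s\n", loc ? loc : "(unknown)");

    FILE *file = fopen("Testing.txt", "r, ccs=UTF-8");
    if (file == NULL) {
        fprintf(stderr, "Error processing file\n");
        return EXIT_FAILURE;
    }
    /* Compare against WEOF instead of a hard-coded 65535:
       the end-of-file value depends on the platform */
    while (n < MAX_CHARS - 1 && (ch = fgetwc(file)) != WEOF) {
        /* wprintf(L"%lc", ch); */
        sentence[n]=ch+1; /* Example modification */
        n++;
    }
    sentence[n] = L'\0'; /* fputws and %ls need a terminated string */
    fclose(file);

    file = fopen("Testing.txt", "w, ccs=UTF-8");
    if (file == NULL) {
        fprintf(stderr, "Error processing file\n");
        return EXIT_FAILURE;
    }
    fputws(sentence, file);
    wprintf(L"%ls", sentence);
    fclose(file);
    return 0;
}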
- Your system locale
The line char *loc = setlocale(LC_ALL, ""); will help you see your current system locale. Make sure it is a UTF-8 locale if you're using Linux; if you're using Windows then you'll have to stick to one language. This is not a problem if you don't want to print the characters.
- How to open the file
First, I opened it for reading as a text file instead of as a binary file. I also have to open the file using the UTF-8 formatting (I think on Linux it will follow your locale, so the ccs=UTF-8 won't be necessary). Even though on Windows we're stuck with one language, the file still has to be read as UTF-8.
- Using compatible functions with the characters
For this we'll use the functions declared in wchar.h (like wprintf and fgetwc). The problem with the byte-oriented functions is that they are limited to the range of a single char, so a multi-byte UTF-8 character comes back as separate bytes rather than as one value.
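To see why byte-oriented functions report the wrong value, here is a rough sketch that counts the characters encoded in a UTF-8 string, so the result can be compared with the byte count from strlen. The function name utf8_count_code_points is made up for this illustration, and it assumes the input is valid UTF-8 (it performs no validation).

```c
#include <stddef.h>

/* Counts code points in a NUL-terminated UTF-8 string by skipping
   continuation bytes (those of the form 10xxxxxx).
   Assumes valid UTF-8; no validation is performed. */
size_t utf8_count_code_points(const char *s) {
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

For the string "\xC3\xB1" (the UTF-8 encoding of ñ), strlen returns 2 while utf8_count_code_points returns 1. A function that works one char at a time sees two separate bytes, neither of which is the character itself.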
I used this as an example:
¿khñà?
hello
~”م‘iاk·¶;R0ثp9´ -پ‘“گAéI‚sہئzOU,HدلKŒ©َض†ُ ت6‘گA=…¢¢³qد4â9àr}hw OUجy.4a³M;£´`د$r(q¸Œçً£F 6pG|ںJr(TîsشR
In the last part of the program, it overwrites the file with the accumulated modified string.
You could try changing sentence[n]=ch+1; to sentence[n]=ch; to check against your original file that it reads and outputs the file correctly (and uncomment the wprintf to check the output).
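The unmodified round trip suggested above could also be pulled out into a small function that works on streams that are already open. This is only a sketch: copy_wide_unchanged is a hypothetical helper, not part of any library, and it assumes the caller has set the locale and opened both streams.

```c
#include <stdio.h>
#include <wchar.h>

/* Copies every wide character from in to out without modifying it.
   Returns the number of characters copied. Both streams are assumed
   to be already open; copy_wide_unchanged is a hypothetical helper. */
size_t copy_wide_unchanged(FILE *in, FILE *out) {
    size_t copied = 0;
    wint_t ch;
    while ((ch = fgetwc(in)) != WEOF) {
        if (fputwc((wchar_t)ch, out) == WEOF) {
            break; /* stop on a write or encoding error */
        }
        copied++;
    }
    return copied;
}
```

If the output file matches the input after calling it, the reading and writing side of the setup is working, and any remaining problem is likely in how the characters are printed.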