Let's assume I have a function like this:

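#include <fstream>
#include <ostream>
#include <string>

template<typename charT>
void fun(std::basic_ostream<charT>& out, std::basic_fstream<charT>& file)
{
    std::basic_string<charT> str;

    file >> str;   // reads one whitespace-delimited token
    out << str;    // writes the token unchanged
}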

Note: the file is encoded as UTF-8.

I am not knowledgeable about Unicode. Can I use this function for both ASCII and Unicode, or build a class using basic_type so that the class can be used for both Unicode and ASCII?

My question: is there any difference between ASCII and Unicode at the processing level?

Edit:

By "processing level" I mean performing operations on the strings, such as appending, printing, and reading a string from a file.

The reason I am asking is that std::string and std::wstring are typedefs of basic_string for char and wchar_t,

and std::cout and std::wcout are objects of std::ostream and std::wostream, which are typedefs of std::basic_ostream for char and wchar_t, yet the code is the same in both cases.

In both cases the only difference is memory.

So I built a class using basic_type so that the class can be used for both ASCII and Unicode.

srilakshmikanthanp
  • Yes. ASCII and the Unicode encodings are different. I don't know what you mean by "processing level". – eerorika Jul 02 '20 at 14:46
  • There are several tens of thousands of Unicode characters that can be encoded as UTF-8 but not as ASCII. What exactly do you expect to happen if your file contains even one of those? – MSalters Jul 02 '20 at 14:49
  • @eerorika I mean operations on the string (like append), printing it, or reading a string from a file – srilakshmikanthanp Jul 02 '20 at 14:49
  • Isn't the issue here the interpretation of whitespace characters? operator>> depends on knowing what a whitespace character is, and some Unicode whitespace characters have multi-byte UTF-8 encodings. – john Jul 02 '20 at 14:53
  • C++ doesn't really have support for Unicode. If you want to use Unicode, I'd suggest a library like [ICU](http://site.icu-project.org/) – NathanOliver Jul 02 '20 at 14:53
  • "*in both cases difference is only memory*" - that is NOT the only difference. The byte encoding of the data stored inside that memory is different. The way the data needs to be processed is different. The way the data is output to a console or a file is different. Everything about `char` data vs `wchar_t` data is different. – Remy Lebeau Jul 02 '20 at 16:42

1 Answer

Is there any difference between ASCII and Unicode?

Yes. ASCII and the Unicode encodings (UTF-8, UTF-16, UTF-32) are distinct and not identical, so there are differences.
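
One concrete difference shows up when counting characters: in ASCII every character occupies one byte, while UTF-8 uses one to four bytes per code point. A minimal sketch, assuming well-formed UTF-8 held in a std::string; count_code_points is a hypothetical helper for illustration, not a standard library function:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 string.
// Assumes the input is well-formed UTF-8; no validation is performed.
inline std::size_t count_code_points(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        // Continuation bytes have the bit pattern 10xxxxxx; every other
        // byte starts a new code point.
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For pure ASCII text, size() and count_code_points() agree. For "h\xC3\xA9" (h followed by é, U+00E9), size() reports 3 bytes while the string holds 2 code points.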

Can I use this function for both ASCII and Unicode?

Yes, for UTF-8 with charT = char (assuming a byte is 8 bits).

The function does nothing that would require different handling between those encodings.
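
To illustrate, here is a sketch of the same read-then-write logic run on ASCII and on UTF-8 input. It is a variation on the question's function, not the original: it takes std::basic_istream instead of std::basic_fstream so that a string stream can stand in for the file, and the names copy_word and roundtrip are hypothetical.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Variation on the question's fun(): accepts any input stream so that
// a string stream can replace the file (an assumption for this sketch).
template <typename charT>
void copy_word(std::basic_ostream<charT>& out, std::basic_istream<charT>& in)
{
    std::basic_string<charT> str;
    if (in >> str) {   // reads one whitespace-delimited token
        out << str;    // writes the bytes back unchanged
    }
}

// Hypothetical helper: feeds raw bytes through copy_word<char>.
inline std::string roundtrip(const std::string& bytes)
{
    std::istringstream in(bytes);
    std::ostringstream out;
    copy_word(out, in);
    return out.str();
}
```

With the default "C" locale, roundtrip("hello world") yields "hello", and roundtrip("h\xC3\xA9llo w\xC3\xB6rld") yields the bytes of "héllo" unchanged. The multi-byte sequences pass through as opaque char values. Note that in this locale operator>> only splits on ASCII whitespace, so a Unicode-only whitespace character such as U+00A0 would not end the token.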

However, if you read the output in a terminal, how it is displayed depends on the terminal's capabilities and configuration, in particular which encoding it uses. If that encoding does not match what you are printing, the output may be misinterpreted.

eerorika