How do I replace each occurrence of a specific ASCII character in a std::string with a Unicode character?

I'm trying this (using an em dash as an example):

string mystring;
replace(mystring.begin(), mystring.end(), ' ', '—'); // error: 2nd char is too wide for char
replace(mystring.begin(), mystring.end(), " ", "—"); // error: replace() does not exist

I could of course write a loop, but I was hoping there would be a single standard function for this. I'm aware that the modified string will be longer than the original string.

This seems like a silly, basic problem, but an hour of googling turned up nothing.

mrchance
  • Natively, C++ doesn't provide any tools for processing Unicode strings. It deals with strings on a byte-by-byte basis, which only really works for ASCII. You need some Unicode library. – BoBTFish Jan 04 '21 at 14:33
  • `std::string` really doesn't support Unicode. The underlying type is `char`, and `string` expects each element to be its own distinct glyph, unlike Unicode, where multiple elements can be combined into a single glyph. – NathanOliver Jan 04 '21 at 14:34
  • "with a unicode character" - with a Unicode character of what length? You should know that `C++` has Unicode string literals, and you are most likely familiar with https://en.cppreference.com/w/cpp/locale, namely `codecvt`. So what's your problem? – user14063792468 Jan 04 '21 at 14:46
  • @BoBTFish Thanks. I will look into whether Boost has one. – mrchance Jan 04 '21 at 14:55
  • Related: https://stackoverflow.com/a/27658515/4641116 – Eljay Jan 04 '21 at 15:14
  • Does this answer your question? [How do I Search/Find and Replace in a standard string?](https://stackoverflow.com/questions/1494399/how-do-i-search-find-and-replace-in-a-standard-string) – Raymond Chen Jan 04 '21 at 18:30
  • @RaymondChen Thanks, yes it does. However, my question is narrower and the solution is also more specific. I did not find the indicated answer even though I searched for quite a while. – mrchance Jan 04 '21 at 19:56

2 Answers

std::string only knows about arbitrary char elements, not what those chars actually represent. It is your responsibility to decide which charset the std::string's content is encoded in, and then to encode the Unicode character in that same charset. For example, in UTF-8 the em dash (U+2014 EM DASH) is 3 chars: 0xE2 0x80 0x94, but in Windows-125x charsets it is only 1 char: 0x97.
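To make "encode the Unicode character in that same charset" concrete for the UTF-8 case, here is a minimal sketch of a code point to UTF-8 encoder. The name `encode_utf8` is a hypothetical helper, not a standard library function, and the sketch assumes the std::string is meant to hold UTF-8.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of the standard library): encodes a single
// Unicode code point as a UTF-8 byte sequence stored in a std::string.
std::string encode_utf8(char32_t cp) {
    // Surrogates and values above U+10FFFF are not valid Unicode scalar values.
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));

    std::string out;
    if (cp < 0x80) {                 // 1 byte: plain ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes (the em dash falls in this range)
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Calling `encode_utf8(0x2014)` yields the three chars 0xE2 0x80 0x94 mentioned above. For a single-byte charset such as Windows-1252 you would instead need a lookup table or a conversion library.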

You can use the std::string::find() method to find the index of the 1-char ASCII character, and then use the std::string::replace() method to substitute the char-encoded Unicode character, e.g.:

string mystring = ...;
string replacement = ...; // "\xE2\x80\x94", "\x97", etc...
string::size_type pos = 0;
while ((pos = mystring.find(' ', pos)) != string::npos) {
    mystring.replace(pos, 1, replacement);
    pos += replacement.size();
}
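
If you want this as a reusable function, the loop above could be wrapped roughly as follows. This is only a sketch: `replace_char` is a hypothetical name, and it assumes the target is a 7-bit ASCII char, since replacing a single non-ASCII byte could split a multi-byte sequence in an encoding like UTF-8.

```cpp
#include <cassert>
#include <string>

// Hypothetical wrapper around the find()/replace() loop: returns a copy of
// `text` with every occurrence of `target` replaced by `replacement`.
std::string replace_char(std::string text, char target, const std::string& replacement) {
    // Assumption: only plain ASCII chars are searched for.
    assert(static_cast<unsigned char>(target) < 0x80);

    std::string::size_type pos = 0;
    while ((pos = text.find(target, pos)) != std::string::npos) {
        text.replace(pos, 1, replacement);
        // Skip past the inserted bytes so a replacement that itself contains
        // `target` is not matched again.
        pos += replacement.size();
    }
    return text;
}
```

For example, `replace_char(mystring, ' ', "\xE2\x80\x94")` replaces each space with a UTF-8 encoded em dash.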
Remy Lebeau

Boost does the trick:

#include <boost/algorithm/string/replace.hpp>
...
boost::replace_all(mystring, " ", "—");

https://www.boost.org/doc/libs/1_47_0/doc/html/boost/algorithm/ireplace_all.html

Alternatively (although verbose), using only the standard library:

string tmp;
std::regex_replace(back_inserter(tmp), mystring.begin(), mystring.end(), std::regex(" "), "—");
mystring = tmp;
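
A somewhat shorter variant is the std::regex_replace overload that takes and returns a std::string, so no temporary or back_inserter is needed. A sketch under the assumption that the replacement contains no `$` (which the format string treats as a substitution marker); `replace_spaces` is a hypothetical helper name:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: replaces every space in `text` with `replacement`
// using the string-returning overload of std::regex_replace.
std::string replace_spaces(const std::string& text, const std::string& replacement) {
    // '$' introduces substitutions such as $& or $1 in the format string,
    // so this sketch simply rejects replacements that contain it.
    assert(replacement.find('$') == std::string::npos);

    static const std::regex space(" ");  // compiled once, reused across calls
    return std::regex_replace(text, space, replacement);
}
```

Usage would be `mystring = replace_spaces(mystring, "\xE2\x80\x94");` for a UTF-8 em dash. For a single fixed character, the find()/replace() loop is likely cheaper than going through the regex engine.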
mrchance