How to obtain the width of string output?

Question

Consider

#include <string>
#include <iostream>

int main()
{ 
    std::string test="αλφα";
    std::cout << "size() of '" << test << "' = " << test.size() << std::endl;
}

which produces

size() of 'αλφα' = 8

How can I use the C++ standard library to find the width of the output produced by writing a string (i.e. 4 in the example above)?

Does https://stackoverflow.com/a/18850689/5470596 answer your question? – YSC, Jan 24 '19 at 16:15
I'm not sure about the dup. OP might want a generic, not UTF-8-only answer. — YSC, Jan 24 '19 at 16:20
@YSC Agreed, nothing useful in the standard library, so roll your own simple decoder. Only the OP can tell us if their code is unicode utf-8 or a specific MBCS, but I would recommend using utf-8 if you have a choice as it is "everywhere" — Gem Taylor, Jan 24 '19 at 17:05
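Following Gem Taylor's suggestion, here is a minimal sketch of such a hand-rolled decoder. It assumes the string holds valid UTF-8 and does no validation. `utf8_code_points` is a hypothetical helper, not a standard library function. It relies on the fact that every byte after the first in a multi-byte UTF-8 sequence has the bit pattern `10xxxxxx`, so counting the bytes that are *not* continuation bytes counts the code points:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a string that is
// ASSUMED to contain valid UTF-8 (no validation is performed).
// Each code point has exactly one lead byte; the remaining bytes of a
// multi-byte sequence are continuation bytes of the form 10xxxxxx.
std::size_t utf8_code_points(const std::string& s)
{
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) { // not a continuation byte, so a new code point starts here
            ++count;
        }
    }
    return count;
}
```

For the question's string this returns 4. A code point count is still only an approximation of display width: combining marks typically take no column of their own, and East Asian wide characters and many emoji usually take two.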

Bathsheba · Answer 1 · 2019-01-24T16:27:36.993


The problem here is related to the encoding associated with the string.

This looks like UTF-8 encoding to me (the first character is not the lower case 'a'). In that encoding each of the four Greek letters takes two bytes, which accounts for the size() of 8.
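To check the two-bytes-per-letter reading on your own system, a small sketch that dumps the raw bytes of a `std::string` as hex. `hex_bytes` is a hypothetical helper introduced only for illustration:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: returns the bytes of s as space-separated,
// lowercase two-digit hex values, e.g. "ce b1" for U+03B1 in UTF-8.
std::string hex_bytes(const std::string& s)
{
    std::string out;
    char buf[4]; // two hex digits plus the terminating NUL
    for (unsigned char c : s) {
        if (!out.empty()) {
            out += ' ';
        }
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(c));
        out += buf;
    }
    return out;
}
```

If the source file is saved as UTF-8, `hex_bytes(test)` on the question's string should yield `ce b1 ce bb cf 86 ce b1`, the same sequence SirDarius reports in the comments: two bytes per letter.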

UTF-8 encoding is broadly supported by the C++11 standard (rather elegantly, UTF-8 never produces a zero byte except for the NUL character itself, cf. Windows Unicode). You can store it in std::string, although size() counts bytes rather than characters and so will, in general, overstate the length. Care must also be taken when creating string literals of that type directly in your editor.
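One way to sidestep the editor issue is to spell the literal as explicit UTF-8 bytes, so the result depends neither on how the source file was saved nor on the compiler's execution character set. A sketch (the name `alpha_utf8` is illustrative):

```cpp
#include <string>

// "αλφα" written as explicit UTF-8 bytes. Adjacent literals are
// concatenated after escape processing, so each \x escape stays separate.
const std::string alpha_utf8 = "\xCE\xB1"   // α  U+03B1
                               "\xCE\xBB"   // λ  U+03BB
                               "\xCF\x86"   // φ  U+03C6
                               "\xCE\xB1";  // α  U+03B1

// alpha_utf8.size() is 8: std::string counts bytes, not displayed letters.
```

A `u8"..."` literal is the C++11 alternative, but from C++20 on it has type `const char8_t[]` and no longer initializes a `std::string` directly.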

Further reading: How to use Unicode (UTF-8) in C++

edited Jan 24 '19 at 16:27

answered Jan 24 '19 at 16:16

Bathsheba


On Windows, `std::wstring` can help. See https://stackoverflow.com/questions/402283/stdwstring-vs-stdstring – YSC Jan 24 '19 at 16:17
When you say it is "broadly supported", do you mean there is a wide range of support, or that C++ acknowledges it exists and can work with it (kind of)? – NathanOliver Jan 24 '19 at 16:18
@YSC: Indeed it can, although I'd recommend, on balance, using UTF-8. – Bathsheba Jan 24 '19 at 16:19
I think it's UTF-8 -- there is no lower case 'a', but two 'α' (Greek alpha). Also, this is on a Mac, definitely no micro software. – Walter Jan 24 '19 at 16:24
I'm not sure why the UTF-8 representation for the string here would be 7 bytes large, as I only see small greek alpha characters, and no lower case 'a'. Here, I obtained: "αλφα" -> "ce b1 ce bb cf 86 ce b1" – SirDarius Jan 24 '19 at 16:24
@SirDarius: Oops yes you are correct, I've changed my opinion. – Bathsheba Jan 24 '19 at 16:27
For the sake of comparing apples to apples, what Microsoft calls "Unicode" is either UTF-16 or UCS-2 depending on the age and quality of the software you're considering ;) – Quentin Jan 24 '19 at 18:01

@Quentin: Either way it's full of damn NUL characters! – Bathsheba Jan 24 '19 at 18:01
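For comparison with the `std::wstring`/UTF-16 discussion, a sketch using `std::u16string`. A `u"..."` literal is UTF-16 encoded on every platform, unlike `std::wstring`, whose `wchar_t` is 16 bits on Windows but 32 bits on Linux and macOS. The variable names are illustrative:

```cpp
#include <string>

// \uXXXX and \UXXXXXXXX escapes in a u"" literal are encoded as UTF-16,
// independent of the source file's encoding.
const std::u16string alpha_utf16 = u"\u03B1\u03BB\u03C6\u03B1"; // αλφα
const std::u16string emoji_utf16 = u"\U0001F600";               // outside the BMP
const std::u16string ascii_utf16 = u"abc";

// alpha_utf16.size() == 4: each letter is in the Basic Multilingual Plane,
//   so it needs exactly one 16-bit code unit.
// emoji_utf16.size() == 2: U+1F600 needs a surrogate pair (D83D DE00).
// ascii_utf16[0] == 0x0061: every ASCII code unit has a zero high byte.
```

Here `size()` matches the letter count only because all four Greek letters lie in the BMP; the emoji shows that a UTF-16 length is no more a character count than a UTF-8 one. The zero high byte in every ASCII code unit is where the NUL bytes in Bathsheba's last comment come from.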
