982

I want to convert a std::string to lowercase. I am aware of the function tolower(). However, in the past I have had issues with this function and it is hardly ideal anyway as using it with a std::string would require iterating over each character.

Is there an alternative which works 100% of the time?

Peter Mortensen
Konrad
  • 51
    How else would you convert each element of a list of anything to something else, without iterating through the list? A string is just a list of characters; if you need to apply some function to each character, you're going to have to iterate through the string. No way around that. –  Nov 24 '08 at 12:14
  • 29
    Why exactly does this question merit down-rating? I don't have a problem with iterating through my string, but I am asking if there are other functions apart from tolower(), toupper() etc. – Konrad Nov 24 '08 at 12:24
  • 3
    If you have a C-style char array, then I guess you may be able to add 0x20202020 to each block of 4 characters (provided they are ALL already uppercase) to convert 4 characters to lowercase at a time. –  Nov 24 '08 at 13:05
  • 16
    @Dan: If they might already be lowercase, but are definitely A-Z or a-z, you can OR with 0x20 instead of adding. One of those so-smart-it's-probably-dumb optimisations that are almost never worth it... – Steve Jessop Nov 24 '08 at 13:11
  • 6
    I don't know why it would've been down-voted... certainly it's worded a little oddly (because you do have to iterate through every item somehow), but it's a valid question – warren Nov 24 '08 at 13:19
  • When I type questions I just tend to dump what is in my mental buffer at the time. It doesn't always make sense. ;) – Konrad Nov 24 '08 at 17:40
  • @onebyone: Ah, never thought of that! Well, I never really meant this was a useful way of doing it, just that it's possible. Actually, I'd be more interested in trying something like that on large texts on a GPU, just for a laugh. –  Nov 26 '08 at 12:41
  • 1
    This is a good question. Most scripting languages handle it just the way you would expect it to be handled. – Eric Walker Nov 01 '09 at 22:11
  • Note that the answer you selected potentially has *undefined behaviour*. Despite all the up-votes, it is unsafe. – juanchopanza May 29 '14 at 18:05
  • 2
    I think what is meant by "iterating over each character" is "explicitly iterating over each character", such as to reduce code bloat, or verbose code. – Kit10 Jan 28 '15 at 17:18
  • 4
    Note: `tolower()` doesn't work 100% of the time. Lowercase/uppercase operations only apply to characters, and std::string is essentially an array of bytes, not characters. Plain `tolower` is nice for ASCII string, but it will not lowercase a latin-1 or utf-8 string correctly. You must know string's encoding and probably decode it before you can lowercase its characters. – Constantin Nov 24 '08 at 14:42
  • After reading through all these answers and back-and-forth comments, I'm not so certain that this is something you'd want to directly deal with inside your program. You may want to use a standalone module that takes strings and encoding/locale arguments and gives only a good result if it can be verifiably converted, which seems to require using the ICU library for maximum robustness. Alternatively, you can always play it even safer and remove the requirement for using case-checks as verification unless the app's entire point is getting those letters to lower-case. – kayleeFrye_onDeck May 03 '17 at 22:57
  • DevSolar gives an excellent answer which contains a very good example of why this can't be solved as a pure software exercise. He seems to agree as well as disagree with me on this and apparently won't include that you must be aware of cultural changes for any solution to work. It cannot be solved perfectly for all time in all cases. – Clearer Nov 07 '17 at 13:28
  • I would not expect in an object-oriented language to be forced to dig into the object to manipulate its inner elements. When I call std::string.clear() I don't have to cycle through inner elements and clear one of them at a time. – Demis Palma ツ Jun 25 '21 at 13:38

31 Answers

1100

Adapted from Not So Frequently Asked Questions:

#include <algorithm>
#include <cctype>
#include <string>

std::string data = "Abc";
std::transform(data.begin(), data.end(), data.begin(),
    [](unsigned char c){ return std::tolower(c); });

You're really not going to get away without iterating through each character. There's no way to know whether the character is lowercase or uppercase otherwise.

If you really hate tolower(), here's a specialized ASCII-only alternative that I don't recommend you use:

char asciitolower(char in) {
    if (in <= 'Z' && in >= 'A')
        return in - ('Z' - 'z');
    return in;
}

std::transform(data.begin(), data.end(), data.begin(), asciitolower);

Be aware that tolower() can only do a per-single-byte-character substitution, which is ill-fitting for many scripts, especially if using a multi-byte-encoding like UTF-8.
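To make the single-byte limitation concrete, here is a minimal sketch. The helper name `bytewise_lower` is hypothetical, and the sketch assumes the default "C" locale and a UTF-8 encoded string. It applies the same `std::transform` call as above: the ASCII letters are lowered, but the two bytes that encode `À` pass through untouched.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: lowercases a string one byte at a time.
std::string bytewise_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// "\xC3\x80" is the UTF-8 encoding of 'À' (U+00C0).
// In the "C" locale, bytewise_lower("\xC3\x80" "BC") yields "\xC3\x80" "bc":
// the multi-byte character is left as-is instead of becoming 'à' ("\xC3\xA0").
```

The string literal is split as `"\xC3\x80" "BC"` so that the hex escape does not swallow the following `B` and `C`.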

Deduplicator
Stefan Mai
  • 6
    That is amazing, ive always wondered what the best way to do it. I had no idea to use std::transform. :) – UberJumper Nov 24 '08 at 13:40
  • uberjumper: There's actually a whole lot of overhead associated with the STL calls, especially for small"ish" strings. Solutions using a for loop and tolower are probably much faster. – Stefan Mai Nov 25 '08 at 00:54
  • 30
    (Old it may be, the algorithms in question have changed little) @Stefan Mai: What kind of "whole lot of overhead" is there in calling STL algorithms? The functions are rather lean (i.e. simple for loops) and often inlined as you rarely have many calls to the same function with the same template parameters in the same compile unit. – eq- Nov 11 '11 at 22:14
  • 3
    @eq Fair point, my benchmarks agree with you when compiling with `-O3` (though the STL actually outperforms the more hand-tuned code so I'm wondering whether the compiler is pulling some tricks). Debugging STL code is still a bear though ;). – Stefan Mai Nov 11 '11 at 23:00
  • 2
    This non-portable solution could be faster. You can avoid the branch this way: inChar |= 0x20. I think it is the fastest way to convert ASCII upper to lower. If you want to convert lower to upper, then: inChar &= ~0x20. – Michal W Jan 31 '14 at 11:06
  • 3
    @MichalW This works if you have only letters, which isn't always the case. If you're in that realm, you can probably do even better by using bitmasks on longs -- take on 8 characters at a time ;) – Stefan Mai Feb 01 '14 at 07:20
  • 347
    Every time you assume characters are ASCII, God kills a kitten. :( – Rag Feb 10 '14 at 20:49
  • 15
    Your first example potentially has *undefined behaviour* (passing `char` to `::tolower(int)`.) You need to ensure you don't pass a negative value. – juanchopanza May 29 '14 at 17:30
  • 3
    While this should be the canonical way to do this in a sane world, it has too many problems to recommend it. First, tolower from ctype.h doesn't work with Unicode. Secondly, locale.h, which is included by many of the other std library headers, defines a conflicting tolower that causes headaches; see http://stackoverflow.com/q/5539249/339595. It is best to use std::locale or boost::locale::to_lower as other answers suggest. – pavon Jul 01 '14 at 17:14
  • 2
    ::towlower if you're being international/using wide chars – NathanTempelman Apr 15 '16 at 00:02
  • 3
    @MichalW Hey, can you explain what you wrote there? Also, why do we use `::` in `::tolower` ? – BugShotGG Apr 15 '16 at 13:40
  • 2
    @StefanMai Hi. Why is the "::" needed before "tolower"? I don't understand that. – Luis Paulo May 16 '16 at 01:13
  • Note that this works for Unicode if you're using a `std::u32string` and your C locale is compatible with Unicode. – Dan Jun 19 '16 at 09:13
  • 9
    The :: is needed before tolower to indicate that it is in the outermost namespace. If you use this code in another namespace, there may be a different (possibly unrelated) definition of tolower which would end up being preferentially selected without the ::. – Charles Ofria Jul 30 '16 at 16:43
  • 3
    `std::transform(data.begin(), data.end(), data.begin(), easytolower);` is dangerous, since the behavior of `std::tolower` is undefined if the input is not representable as `unsigned char` and is not equal to `EOF`. – 8.8.8.8 Aug 09 '17 at 05:52
  • @BrianGordon - But its much easier, and there really are way too many cats in the world already. – T.E.D. Nov 15 '17 at 13:39
  • 1
    @BrianGordon That is blatantly false, as proven by the fact that there are still kittens in the world! =) – Cort Ammon Dec 12 '17 at 21:40
  • 1
    What makes the 2nd solution non-portable? Can I just do this? https://pastebin.com/MPRMpQJS – TypicalHog Mar 24 '18 at 23:12
  • @BrianGordon there are also cases when you _know_ that the input is ASCII (e.g. the wire format of domain names). – Alnitak May 17 '18 at 13:54
  • @Alnitak I didn't know that. How does DNS handle international domain names which can be in unicode? – Rag May 24 '18 at 04:57
  • @BrianGordon applications have to convert them into an all-ASCII encoding called "Punycode" (RFC 3492) – Alnitak May 24 '18 at 07:41
  • 2
    @TypicalHog: Because there is no guarantee that `'A'` to `'Z'` is a continuous range (EBCDIC); but more importantly because there *are* letters outside that range (`'Ü'`, `'á'`, ...). It's very, *very* sad that the authors prefer to harvest more upvotes for answers with non-portable solutions instead of properly pointing out their shortcomings... – DevSolar Oct 02 '18 at 23:08
  • @DevSolar: `easytolower` seems a perfectly valid solution for latin ASCII symbols to me. Going to use it for normalizing HTML tag names. – Violet Giraffe Oct 04 '18 at 07:52
  • @Cheersandhth.-Alf c99 doesn't mention that it's UB: it either returns lower char, or unmodified. `std::tolower`, however, mentions ub – Pavel P Jan 21 '19 at 22:44
  • 1
    @L.F. I fixed your fix. – Deduplicator Jul 06 '19 at 00:25
  • @Deduplicator To be honest, I have always had trouble understanding why the `char` has to be converted to `unsigned char` first. Isn't the value of a (signed) `char` supposed to be nonnegative, anyway? What is the point of `tolower`ing a negative `char`? I guess I am missing the point, so would you mind explaining it to me a little bit please :) – L. F. Jul 06 '19 at 00:32
  • 1
    @L.F. No, `char` can be analogous to `signed char`, and a `signed char` can be negative. `tolower` only accepts `unsigned char` and `-1`. Anything outside its domain is UB, and you don't want to conflate with `-1` either. While all members of the *basic execution character set* are non-negative, that does not necessarily hold for the (complete) *execution character set*. [See the current draft](http://eel.is/c++draft/lex.charset). – Deduplicator Jul 06 '19 at 00:40
  • @Deduplicator Thank you! I didn't know a `char` can validly be negative. But then, doesn't converting to `unsigned char` just change the value? – L. F. Jul 06 '19 at 00:41
  • @L.F. `char` -> `unsigned char` (value-preserving, modulo 2**CHAR_BIT) -> implicit to `int` (value-preserving). Of course, if `sizeof(int) == 1`, things pretty much fall apart. – Deduplicator Jul 06 '19 at 00:44
  • @Deduplicator OK ... I think I missed that ... Then the `int` is converted to `char`, I think, so the resulting value is implementation-defined before C++20 and guaranteed to be the original value since C++20? – L. F. Jul 06 '19 at 00:47
  • @L.F. Converting the result from `tolower()` (`int`) back to `char` is also an interesting story, yes. – Deduplicator Jul 06 '19 at 00:51
  • 2
    I don't understand why the tolower here is wrapped in a lambda rather than just passing it to transform on its own. – JPhi1618 Oct 17 '19 at 19:39
  • 2
    @JPhi1618 1) to make sure that the character is first converted to `unsigned char` (see Deduplicator's comments above); 2) to enable overload resolution to select the [`int tolower( int ch );`](https://en.cppreference.com/w/cpp/string/byte/tolower) overload defined in `<cctype>` instead of the [`template< class charT > charT tolower( charT ch, const locale& loc );`](https://en.cppreference.com/w/cpp/locale/tolower) overload defined in `<locale>`. – L. F. Feb 21 '20 at 02:30
  • *happily coding in Java and the time comes to switch over to a CPP module... comes along a simple string case issue* Me: "I'll just look up the std::string toLower() or whatever the standard has for normalizing text case... Hmm, I wonder how they handle all the encoding and localization complexities a 'simple' task like that could entail when std::string is just raw text data?" *finds this question... sad requiring that ingest data follows a case convention noises* – CCJ May 26 '20 at 20:46
  • I don't think you need to wrap std::tolower in a lambda. – JadeSpy Feb 05 '22 at 17:12
  • @ccj yeah, the distinct lack of "normal" library functions when I started doing C++ was quite disturbing – masher Aug 30 '22 at 04:14
  • @Cheersandhth.-Alf what is _"UB"_ in _"...it's UB for non-ASCII input."_? – Milan Apr 06 '23 at 21:05
  • 1
    @Milan: The answer has been edited in July 2019 to remove the original problem, by replacing `char` with `unsigned char`. For that original problem, cppreference notes about `std::tolower`: ❝If the value of ch is not representable as unsigned char and does not equal EOF, the behavior is undefined❞. And since most all C++ compilers have `char` as a signed type by default, any non-ASCII character is in practice encoded with one or more negative `char` values, which if used directly as argument to `std::tolower` will encounter the quoted UB. Conversion to `unsigned char` avoids that problem. – Cheers and hth. - Alf Apr 08 '23 at 19:17
  • @Cheersandhth.-Alf Thanks for your response. Out of curiosity, what is the full form of **'UB'**? – Milan Apr 11 '23 at 13:37
  • 1
    @Milan: Undefined Behavior. https://eel.is/c++draft/intro.defs#defns.undefined https://en.cppreference.com/w/cpp/language/ub – Cheers and hth. - Alf Apr 12 '23 at 18:19
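The `char` → `unsigned char` → `int` chain discussed in the comments above can be sketched as follows. The helper names `to_ctype_arg` and `safe_tolower` are hypothetical; the sketch only illustrates why the intermediate cast keeps the argument inside the range that `std::tolower` accepts.

```cpp
#include <cctype>

// Hypothetical helper: converts a plain char into a value that is safe to
// pass to the <cctype> functions.
int to_ctype_arg(char c) {
    // The cast maps the char into 0..UCHAR_MAX; widening to int keeps the value.
    return static_cast<unsigned char>(c);
}

// Lowercases one char, converting on the way in and on the way out.
char safe_tolower(char c) {
    return static_cast<char>(std::tolower(to_ctype_arg(c)));
}
```

With 8-bit chars, `to_ctype_arg('\xE9')` is 233 whether plain `char` is signed or unsigned on the platform, so it never reaches `std::tolower` as a negative value.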
372

Boost provides a string algorithm for this:

#include <boost/algorithm/string.hpp>

std::string str = "HELLO, WORLD!";
boost::algorithm::to_lower(str); // modifies str

Or, for non-in-place:

#include <boost/algorithm/string.hpp>

const std::string str = "HELLO, WORLD!";
const std::string lower_str = boost::algorithm::to_lower_copy(str);
Deduplicator
Rob
  • 25
    Fails for non-ASCII-7. – DevSolar Feb 27 '15 at 09:28
  • 3
    This is pretty slow, see this benchmark: godbolt.org/z/neM5jsva1 – prehistoricpenguin Jun 29 '21 at 10:31
  • 3
    @prehistoricpenguin slow? Well, slow is to debug code because your own implementation has a bug because it was more complicated than to just call the boost library ;) If the code is critical, like called a lot and provides a bottleneck, then, well, it can be worth to think about slowness – Mayou36 Feb 12 '22 at 12:00
  • I believe Boost isn't a C++ standard library solution, is it? – ZoomIn Oct 13 '22 at 11:00
  • 2
    No, it isn't. It's one of these extremely unfortunate answers you see on EVERY SINGLE C++ question on this website... because adding an entire library just to do something so simple is apparently the most popular route! – Logix Jan 27 '23 at 16:42
  • Unfortunately if you know Unicode you know that you need a library to do it correctly. But this doesn't mean boost is the one, because it also requires ICU. Welcome to transitive dependency monsters (and ICU has very unstable ABI to make it worse). – Lothar Feb 14 '23 at 23:34
  • I find this answer helpful as I already have Boost in my project, and I do need the `non-in-place` version to_lower – konchy Jun 02 '23 at 01:10
  • Not everyone uses Boost. – Craig B Aug 31 '23 at 18:12
338

tl;dr

Use the ICU library. If you don't, your conversion routine will break silently on cases you are probably not even aware of existing.


First you have to answer a question: What is the encoding of your std::string? Is it ISO-8859-1? Or perhaps ISO-8859-8? Or Windows Codepage 1252? Does whatever you're using to convert upper-to-lowercase know that? (Or does it fail miserably for characters over 0x7f?)

If you are using UTF-8 (the only sane choice among the 8-bit encodings) with std::string as container, you are already deceiving yourself if you believe you are still in control of things. You are storing a multibyte character sequence in a container that is not aware of the multibyte concept, and neither are most of the operations you can perform on it! Even something as simple as .substr() could result in invalid (sub-) strings because you split in the middle of a multibyte sequence.
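A minimal sketch of the `.substr()` problem, assuming a UTF-8 encoded string (the helper name `first_byte` is hypothetical):

```cpp
#include <string>

// "\xC3\x9F" is the UTF-8 encoding of 'ß' (U+00DF): one character, two bytes.
// Hypothetical helper that cuts a string after its first byte, the way a
// naive .substr() call might.
std::string first_byte(const std::string& s) {
    return s.substr(0, 1);
}
```

For `std::string s = "\xC3\x9F";`, `s.size()` reports 2 rather than 1, and `first_byte(s)` returns the lone lead byte `"\xC3"`, which is not valid UTF-8 on its own.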

As soon as you try something like std::toupper( 'ß' ), or std::tolower( 'Σ' ) in any encoding, you are in trouble. Because 1), the standard only ever operates on one character at a time, so it simply cannot turn ß into SS as would be correct. And 2), the standard only ever operates on one character at a time, so it cannot decide whether Σ is in the middle of a word (where σ would be correct), or at the end (ς). Another example would be std::tolower( 'I' ), which should yield different results depending on the locale -- virtually everywhere you would expect i, but in Turkey ı (LATIN SMALL LETTER DOTLESS I) is the correct answer (which, again, is more than one byte in UTF-8 encoding).

So, any case conversion that works on a character at a time, or worse, a byte at a time, is broken by design. This includes all the std:: variants in existence at this time.

Then there is the point that the standard library, for what it is capable of doing, depends on which locales are supported on the machine your software is running on... and what do you do if your target locale is among those not supported on your client's machine?

So what you are really looking for is a string class that is capable of dealing with all this correctly, and that is not any of the std::basic_string<> variants.

(C++11 note: std::u16string and std::u32string are better, but still not perfect. C++20 brought std::u8string, but all these do is specify the encoding. In many other respects they still remain ignorant of Unicode mechanics, like normalization, collation, ...)

While Boost looks nice, API wise, Boost.Locale is basically a wrapper around ICU. If Boost is compiled with ICU support... if it isn't, Boost.Locale is limited to the locale support compiled for the standard library.

And believe me, getting Boost to compile with ICU can be a real pain sometimes. (There are no pre-compiled binaries for Windows that include ICU, so you'd have to supply them together with your application, and that opens a whole new can of worms...)

So personally I would recommend getting full Unicode support straight from the horse's mouth and using the ICU library directly:

#include <unicode/unistr.h>
#include <unicode/ustream.h>
#include <unicode/locid.h>

#include <iostream>

int main()
{
    /*                          "Odysseus" */
    char const * someString = u8"ΟΔΥΣΣΕΥΣ";
    icu::UnicodeString someUString( someString, "UTF-8" );
    // Setting the locale explicitly here for completeness.
    // Usually you would use the user-specified system locale,
    // which *does* make a difference (see ı vs. i above).
    std::cout << someUString.toLower( "el_GR" ) << "\n";
    std::cout << someUString.toUpper( "el_GR" ) << "\n";
    return 0;
}

Compile (with G++ in this example):

g++ -Wall example.cpp -licuuc -licuio

This gives:

οδυσσευς
ΟΔΥΣΣΕΥΣ

Note the Σ<->σ conversion in the middle of the word, and the Σ<->ς conversion at the end of the word. No <algorithm>-based solution can give you that.

DevSolar
  • 32
    This is the correct answer in the general case. The standard gives nothing for handling anything except "ASCII" except lies and deception. It makes you *think* you can maybe deal with maybe UTF-16, but you can't. As this answer says, you cannot get the proper character-length (not byte-length) of a UTF-16 string without doing your own unicode handling. If you have to deal with real text, use ICU. Thanks, @DevSolar – lmat - Reinstate Monica Mar 25 '15 at 14:00
  • Is ICU available by default on Ubuntu/Windows, or does it need to be installed separately? Also, how about this answer: http://stackoverflow.com/a/35075839/207661? – Shital Shah May 11 '16 at 19:00
  • icu::UnicodeString::length() is technically also lying to you (although less frequently), as it reports the number of 16bit code units rather than the number of code points. ;-) – masaers Jun 15 '17 at 02:17
  • @masaers: To be completely fair, with things like combining characters, zero-width joiners and right-to-left markers, the number of code points is rather meaningless. I will remove that remark. – DevSolar Jun 15 '17 at 05:26
  • 2
    @DevSolar Agreed! The concept of length is rather meaningless on text (we could add ligatures to the list of offenders). That said, since people are used to tabs and control chars taking up one length unit, code points would be the more intuitive measure. Oh, and thanks for giving the correct answer, sad to see it so far down :-( – masaers Jun 15 '17 at 06:51
  • Actually, `std::string` not being aware that it contains text in a multi-byte character-encoding is a feature, not a bug. It's the only sane way to do it, which is why just about everyone does it. Not having proper standard apis for handling anything but basic text from days gone by which never really were at all is a problem though, yes. It would have to be optional even in a hosted environment though, as it is quite hefty, and there are many cases where it isn't needed. – Deduplicator Dec 15 '20 at 00:49
  • @Deduplicator: Sorry, but that's just dodging it in all possible ways. There *are* standards (Unicode), there *are* quasi-standard APIs for handling it (ICU), and if your intention is to write code that properly converts text to lowercase, unless you can *guarantee* your code will only ever see ASCII-7 (which would be a rather special case), all the other "solutions" here are 80--20 at best. – DevSolar Dec 15 '20 at 07:37
  • That is why there should be such standard APIs. Doesn't negate the fact that much string-manipulation is best done ignoring all but it being a sequence of code-units. And that many use-cases never need anything more sophisticated. – Deduplicator Dec 15 '20 at 11:30
  • @Deduplicator And that standard API is currently the ICU library, which is what this answer is about. – DevSolar Dec 15 '20 at 11:59
  • @Deduplicator I heard that `std::text` is underway, perhaps even in time for C++23. Let's not give up all hope yet. – DevSolar Mar 02 '21 at 15:42
  • `icu::UnicodeString` seem to be a good class. QString also can do the job. However it is a pain to use in big programs with many libraries. I hope `std::text` will be a real thing soon – Kiruahxh Jun 16 '22 at 09:57
39

Using the range-based for loop of C++11, simpler code would be:

#include <iostream>       // std::cout
#include <string>         // std::string
#include <locale>         // std::locale, std::tolower

int main ()
{
  std::locale loc;
  std::string str="Test String.\n";

 for(auto elem : str)
    std::cout << std::tolower(elem,loc);
}
incises
  • 10
    However, on a French machine, this program doesn't convert non-ASCII characters allowed in the French language. For instance, the string 'Test String123. É Ï\n' will be converted to 'test string123. É Ï\n', although the characters É and Ï and their lowercase counterparts 'é' and 'ï' are allowed in French. It seems that no solution for that was provided by other messages of this thread. – incises Oct 09 '13 at 08:15
  • 2
    I think you need to set a proper locale for that. – user1095108 Dec 30 '13 at 08:37
  • 1
    @incises, since then someone has posted an answer about ICU, and that's certainly the way to go. Easier than most other solutions that would attempt to understand the locale. – Alexis Wilke Sep 01 '16 at 21:25
  • I'd prefer to not use external libraries when possible, personally. – kayleeFrye_onDeck Jul 11 '17 at 00:54
33

Another approach using range based for loop with reference variable

string test = "Hello World";
for(auto& c : test)
{
   c = tolower(static_cast<unsigned char>(c)); // cast first: a negative char is undefined behaviour
}

cout<<test<<endl;
Gilson PJ
32

If the string contains UTF-8 characters outside of the ASCII range, then boost::algorithm::to_lower will not convert those. Better use boost::locale::to_lower when UTF-8 is involved. See http://www.boost.org/doc/libs/1_51_0/libs/locale/doc/html/conversions.html

Patrick Ohly
27

This is a follow-up to Stefan Mai's response: if you'd like to place the result of the conversion in another string, you need to pre-allocate its storage space prior to calling std::transform. Since STL stores transformed characters at the destination iterator (incrementing it at each iteration of the loop), the destination string will not be automatically resized, and you risk memory stomping.

#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>

int main (int argc, char* argv[])
{
  std::string sourceString = "Abc";
  std::string destinationString;

  // Allocate the destination space
  destinationString.resize(sourceString.size());

  // Convert the source string to lower case
  // storing the result in destination string.
  // The lambda converts to unsigned char first, because passing a
  // negative char to std::tolower is undefined behaviour.
  std::transform(sourceString.begin(),
                 sourceString.end(),
                 destinationString.begin(),
                 [](unsigned char c){ return std::tolower(c); });

  // Output the result of the conversion
  std::cout << sourceString
            << " -> "
            << destinationString
            << std::endl;
}
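As an alternative to pre-sizing the destination, a `std::back_inserter` lets the destination grow as characters are written. This is only a sketch; the helper name `lowered_copy` is hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

// Hypothetical helper: returns a lowercased copy, leaving the source untouched.
std::string lowered_copy(const std::string& source) {
    std::string destination;
    destination.reserve(source.size());   // optional: avoids reallocations
    std::transform(source.begin(), source.end(),
                   std::back_inserter(destination),   // appends via push_back
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return destination;
}
```

Because every write goes through `push_back`, the destination can never be overrun, at the cost of a capacity check per character.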
user2218467
9

The simplest way to convert a string to lowercase without bothering about the std namespace is as follows:

1:string with/without spaces

#include <algorithm>
#include <iostream>
#include <string>
using namespace std;
int main(){
    string str;
    getline(cin,str);
//------------function to convert string into lowercase---------------
    transform(str.begin(), str.end(), str.begin(), ::tolower);
//--------------------------------------------------------------------
    cout<<str;
    return 0;
}

2:string without spaces

#include <algorithm>
#include <iostream>
#include <string>
using namespace std;
int main(){
    string str;
    cin>>str;
//------------function to convert string into lowercase---------------
    transform(str.begin(), str.end(), str.begin(), ::tolower);
//--------------------------------------------------------------------
    cout<<str;
    return 0;
}
Atul Rokade
  • 1
    This is plain wrong: if you check the documentation, you will see that `std::tolower` cannot work with `char`, it only supports `unsigned char`. So this code is UB if `str` contains characters outside of 0x00-0x7F. – Dmitry Grigoryev Jan 31 '22 at 14:18
  • This is also false by virtue of using an identifier starting with `str` in the global namespace, which is strictly reserved. – Roflcopter4 Nov 03 '22 at 20:20
7

My own template functions, which perform upper / lower case conversion:

#include <string>
#include <algorithm>

//
//  Lowercases string
//
template <typename T>
std::basic_string<T> lowercase(const std::basic_string<T>& s)
{
    std::basic_string<T> s2 = s;
    std::transform(s2.begin(), s2.end(), s2.begin(),
        [](const T v){ return static_cast<T>(std::tolower(v)); });
    return s2;
}

//
// Uppercases string
//
template <typename T>
std::basic_string<T> uppercase(const std::basic_string<T>& s)
{
    std::basic_string<T> s2 = s;
    std::transform(s2.begin(), s2.end(), s2.begin(),
        [](const T v){ return static_cast<T>(std::toupper(v)); });
    return s2;
}
Benjamin Buch
TarmoPikaro
6

I wrote this simple helper function:

#include <cctype> // tolower
#include <string>

string to_lower(string s) {
    for(char &c : s)
        c = tolower(static_cast<unsigned char>(c)); // cast avoids UB for negative chars
    return s;
}

Usage:

string s = "TEST";
cout << to_lower("HELLO WORLD"); // output: "hello world"
cout << to_lower(s); // won't change the original variable.
A-Sharabiani
5

std::ctype::tolower() from the standard C++ Localization library will correctly do this for you. Here is an example extracted from the tolower reference page

#include <locale>
#include <iostream>

int main () {
  std::locale::global(std::locale("en_US.utf8"));
  std::wcout.imbue(std::locale());
  std::wcout << "In US English UTF-8 locale:\n";
  auto& f = std::use_facet<std::ctype<wchar_t>>(std::locale());
  std::wstring str = L"HELLo, wORLD!";
  std::wcout << "Lowercase form of the string '" << str << "' is ";
  f.tolower(&str[0], &str[0] + str.size());
  std::wcout << "'" << str << "'\n";
}
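If the source string is `const`, one option is to copy it and convert the copy in place. A minimal sketch follows; the helper name `lower_copy` is hypothetical. The facet is looked up once per call rather than once per character.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: lowercases a copy of a const wide string using the
// ctype facet of the given locale.
std::wstring lower_copy(const std::wstring& src, const std::locale& loc) {
    std::wstring out = src;
    if (!out.empty()) {
        const auto& facet = std::use_facet<std::ctype<wchar_t>>(loc);
        // The two-pointer overload converts the whole range in place.
        facet.tolower(&out[0], &out[0] + out.size());
    }
    return out;
}
```

Calling `lower_copy(str, std::locale())` uses the current global locale; which non-ASCII characters get converted depends on that locale being available on the machine.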
akim
Sameer
  • Nice, as long as you can convert the characters in place. What if your source string is `const`? That seems to make it a bit more messy (e.g. it doesn't look like you can use `f.tolower()` ), since you need to put the characters in a new string. Would you use `transform()` and something like `std::bind1st( std::mem_fun() )` for the operator? – quazar Aug 17 '16 at 06:09
  • For a const string, we can just make a local copy and then convert it in place. – Sameer Aug 29 '16 at 14:53
  • Yeah, though, making a copy adds more overhead. – quazar Sep 04 '16 at 20:49
  • You could use std::transform with the version of ctype::tolower that does not take pointers. Use a back inserter iterator adapter and you don't even need to worry about pre-sizing your output string. – chili Apr 24 '17 at 02:11
  • Great, especially because in libstdc++'s `tolower` with `locale` parameter, the implicit call to `use_facet` appears to be a performance bottleneck. One of my coworkers has achieved a several 100% speed increase by replacing `boost::iequals` (which has this problem) with a version where `use_facet` is only called once outside of the loop. – Arne Vogel May 23 '17 at 12:23
  • This won't work in Windows where you'd have to call `std::locale("English_Unites States.UTF8")`. – Dmitry Grigoryev Jan 31 '22 at 14:23
4

An alternative to Boost is POCO (pocoproject.org).

POCO provides two variants:

  1. The first variant makes a copy without altering the original string.
  2. The second variant changes the original string in place.
    "In Place" versions always have "InPlace" in the name.

Both versions are demonstrated below:

#include "Poco/String.h"
using namespace Poco;

std::string hello("Stack Overflow!");

// Copies "STACK OVERFLOW!" into 'newString' without altering 'hello.'
std::string newString(toUpper(hello));

// Changes newString in-place to read "stack overflow!"
toLowerInPlace(newString);
Roger Stewart
Jason Enochs
4

Since none of the answers mentioned the Ranges library, which has been part of the standard library since C++20 and is also available separately on GitHub as range-v3, I would like to add a way to perform this conversion using it.

To modify the string in-place:

str |= action::transform([](unsigned char c){ return std::tolower(c); });

To generate a new string:

auto new_string = original_string
    | view::transform([](unsigned char c){ return std::tolower(c); });

(Don't forget to #include <cctype> and the required Ranges headers.)

Note: the use of unsigned char as the argument to the lambda is inspired by cppreference, which states:

Like all other functions from <cctype>, the behavior of std::tolower is undefined if the argument's value is neither representable as unsigned char nor equal to EOF. To use these functions safely with plain chars (or signed chars), the argument should first be converted to unsigned char:

char my_tolower(char ch)
{
    return static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
}

Similarly, they should not be directly used with standard algorithms when the iterator's value type is char or signed char. Instead, convert the value to unsigned char first:

std::string str_tolower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), 
                // static_cast<int(*)(int)>(std::tolower)         // wrong
                // [](int c){ return std::tolower(c); }           // wrong
                // [](char c){ return std::tolower(c); }          // wrong
                   [](unsigned char c){ return std::tolower(c); } // correct
                  );
    return s;
}
L. F.
  • 19,445
  • 8
  • 48
  • 82
3

On Microsoft platforms you can use the strlwr family of functions: http://msdn.microsoft.com/en-us/library/hkxwh33z.aspx

// crt_strlwr.c
// compile with: /W3
// This program uses _strlwr and _strupr to create
// uppercase and lowercase copies of a mixed-case string.
#include <string.h>
#include <stdio.h>
#include <stdlib.h>   // free

int main( void )
{
   char string[100] = "The String to End All Strings!";
   char * copy1 = _strdup( string ); // make two copies
   char * copy2 = _strdup( string );

   _strlwr( copy1 ); // C4996
   _strupr( copy2 ); // C4996

   printf( "Mixed: %s\n", string );
   printf( "Lower: %s\n", copy1 );
   printf( "Upper: %s\n", copy2 );

   free( copy1 );
   free( copy2 );
}
Sandeep Datta
  • 28,607
  • 15
  • 70
  • 90
2

There is a way to convert upper case to lower WITHOUT doing if tests, and it's pretty straight-forward. The isupper() function/macro's use of clocale.h should take care of problems relating to your location, but if not, you can always tweak the UtoL[] to your heart's content.

Given that C's characters are really just 8-bit ints (ignoring the wide character sets for the moment) you can create a 256 byte array holding an alternative set of characters, and in the conversion function use the chars in your string as subscripts into the conversion array.

Instead of a 1-for-1 mapping though, give the upper-case array members the BYTE int values for the lower-case characters. You may find islower() and isupper() useful here.

The code looks like this...

#include <clocale>
#include <cctype>   // isupper
#include <cstring>  // strcpy
static char UtoL[256];
// ----------------------------------------------------------------------------
void InitUtoLMap()  {
    for (int i = 0; i < (int)sizeof(UtoL); i++)  {
        if (isupper(i)) {
            UtoL[i] = (char)(i + 32);   // ASCII: lowercase = uppercase + 32
        }   else    {
            UtoL[i] = (char)i;
        }
    }
}
// ----------------------------------------------------------------------------
char *LowerStr(char *szMyStr) {
    char *p = szMyStr;
    // do conversion in-place so as not to require a destination buffer
    while (*p) {        // szMyStr must be null-terminated
        *p = UtoL[(unsigned char)*p];  // cast: plain char may be negative
        p++;
    }
    return szMyStr;
}
// ----------------------------------------------------------------------------
int main() {
    char *Lowered, Upper[128];
    InitUtoLMap();
    strcpy(Upper, "Every GOOD boy does FINE!");

    Lowered = LowerStr(Upper);
    return 0;
}

This approach will, at the same time, allow you to remap any other characters you wish to change.

This approach has one big advantage on modern processors: the conversion loop contains no if tests, so it has no data-dependent branches to predict. That leaves the CPU's branch prediction logic free for other loops and tends to prevent pipeline stalls.

Some here may recognize this approach as the same one used to convert EBCDIC to ASCII.
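The same table idea carries over to std::string. The following is a minimal sketch, not the code from this answer: the names make_lower_table and lower_with_table are made up for illustration, and the table is filled with std::tolower under whatever C locale is active when it is first built.

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <string>

// Build the 256-entry table once; slot i holds the lowercase form of byte i.
// (make_lower_table is an illustrative name, not a library function.)
std::array<unsigned char, 256> make_lower_table() {
    std::array<unsigned char, 256> table{};
    for (int i = 0; i < 256; ++i) {
        table[static_cast<std::size_t>(i)] =
            static_cast<unsigned char>(std::tolower(i));
    }
    return table;
}

// Convert a copy of s using a branch-free table lookup per character.
std::string lower_with_table(std::string s) {
    static const std::array<unsigned char, 256> table = make_lower_table();
    for (char& ch : s) {
        // Index with unsigned char so bytes >= 0x80 never become negative.
        ch = static_cast<char>(table[static_cast<unsigned char>(ch)]);
    }
    return s;
}
```

Calling lower_with_table("Every GOOD boy does FINE!") returns the lowercased copy; to remap other characters, overwrite the corresponding table slots after building the table.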

hroptatyr
  • 4,702
  • 1
  • 35
  • 38
user2548100
  • 4,571
  • 1
  • 18
  • 18
2

Here's a macro technique if you want something simple:

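#include <algorithm> // std::transform
#include <cctype>    // ::tolower, ::toupper
#define STRTOLOWER(x) std::transform ((x).begin(), (x).end(), (x).begin(), ::tolower)
#define STRTOUPPER(x) std::transform ((x).begin(), (x).end(), (x).begin(), ::toupper)
// do/while(0) keeps both statements together after an unbraced if; x must be non-empty
#define STRTOUCFIRST(x) do { std::transform ((x).begin(), (x).begin()+1, (x).begin(), ::toupper); std::transform ((x).begin()+1, (x).end(), (x).begin()+1, ::tolower); } while (0)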

However, note that @AndreasSpindler's comment on this answer is still an important consideration if you're working on something that isn't just ASCII characters.
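For comparison, the same three helpers can be written as ordinary functions, which is what one of the comments on this answer suggests. This is only a sketch: the function names are invented here, and the unsigned char conversion is added because ::tolower and ::toupper are undefined for negative char values.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Lowercase every character of s in place.
inline void str_to_lower(std::string& s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
}

// Uppercase every character of s in place.
inline void str_to_upper(std::string& s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::toupper(c));
    });
}

// Uppercase the first character and lowercase the rest; safe on empty strings.
inline void str_to_ucfirst(std::string& s) {
    str_to_lower(s);
    if (!s.empty()) {
        s[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[0])));
    }
}
```

Unlike the macros, these evaluate their argument exactly once and behave as a single statement wherever they are called.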

Community
  • 1
  • 1
Volomike
  • 23,743
  • 21
  • 113
  • 209
  • 3
    I'm downvoting this for giving macros when a perfectly good solution exist -- you even give those solutions. – Clearer Nov 07 '17 at 07:44
  • 2
    The macro technique means less typing of code for something that one would commonly use a lot in programming. Why not use that? Otherwise, why have macros at all? – Volomike Nov 07 '17 at 08:02
  • 3
    Macros are a legacy from C that's being worked hard on to get rid of. If you want to reduce the amount of typing, use a function or a lambda. `void strtoupper(std::string& x) { std::transform (x.begin(), x.end(), x.begin(), ::toupper); }` – Clearer Nov 07 '17 at 12:11
  • 1
    @Clearer As I want to be a better coder, can you provide me any ANSI doc links where any ANSI C++ committees say something to the effect of, "We need to call a meeting to get rid of macros out of C++"? Or some other roadmap? – Volomike Nov 07 '17 at 20:47
  • 2
    No, I can't. Bjarne's stance on the topic has been made pretty clear on several occasions though. Besides, there are plenty of reasons to not use macros in C as well as C++. `x` could be a valid expression, that just happens to compile correctly but will give completely bogus results because of the macros. – Clearer Nov 08 '17 at 12:02
  • good macros! @Clearer macros help us so much... I expect they never get rid of it. – Aquarius Power Jul 24 '18 at 23:50
  • 4
    @AquariusPower I disagree. I have yet to see a macro that could not have been done better as a template or a lambda. – Clearer Jul 29 '18 at 16:11
2

Is there an alternative which works 100% of the time?

No

There are several questions you need to ask yourself before choosing a lowercasing method.

  1. How is the string encoded? Plain ASCII? UTF-8? Some form of extended ASCII legacy encoding?
  2. What do you mean by lower case anyway? Case mapping rules vary between languages! Do you want something that is localised to the user's locale? Do you want something that behaves consistently on all systems your software runs on? Do you just want to lowercase ASCII characters and pass through everything else?
  3. What libraries are available?

Once you have answers to those questions you can start looking for a solution that fits your needs. There is no one size fits all that works for everyone everywhere!
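As one concrete instance of the last option in question 2, here is a minimal sketch that lowercases only ASCII letters and passes every other byte through. The name ascii_to_lower is made up for this example, and the code assumes an ASCII-compatible encoding such as UTF-8 or Latin-1.

```cpp
#include <string>

// Lowercase only 'A'..'Z'; every other byte (digits, punctuation,
// bytes of multi-byte UTF-8 sequences) is copied through unchanged.
// Independent of the current locale. Assumes an ASCII-compatible encoding.
std::string ascii_to_lower(std::string s) {
    for (char& ch : s) {
        if (ch >= 'A' && ch <= 'Z') {
            ch = static_cast<char>(ch - 'A' + 'a');
        }
    }
    return s;
}
```

This behaves identically on every system, at the cost of leaving non-ASCII letters such as É untouched.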

plugwash
  • 9,724
  • 2
  • 38
  • 51
2

C++ doesn't provide tolower or toupper for std::string, but they are available for char. You can read each char of the string, convert it to the required case, and write it back into the string. Sample code without any third-party library:

#include <cctype>    // std::tolower
#include <iostream>
#include <string>

int main(){
    std::string str = std::string("How ARe You");
    for(char &ch : str){
        // cast first: std::tolower is undefined for negative char values
        ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    }
    std::cout<<str<<std::endl;
    return 0;
}

For character-based operations on a string, see: For every character in string

Mahipal
  • 589
  • 4
  • 7
1
// tolower example (C++)
#include <iostream>       // std::cout
#include <string>         // std::string
#include <locale>         // std::locale, std::tolower

int main ()
{
  std::locale loc;
  std::string str="Test String.\n";
  for (std::string::size_type i=0; i<str.length(); ++i)
    std::cout << std::tolower(str[i],loc);
  return 0;
}

For more information: http://www.cplusplus.com/reference/locale/tolower/
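If the goal is to modify the string rather than print it, a minimal sketch along the same lines can fetch the std::ctype<char> facet once and let it convert the whole buffer; this is the facet that std::tolower(c, loc) consults on every call. The helper name to_lower_locale is hypothetical.

```cpp
#include <locale>
#include <string>

// Lowercase a copy of s according to loc, looking up the facet only once.
std::string to_lower_locale(std::string s,
                            const std::locale& loc = std::locale()) {
    if (!s.empty()) {
        const std::ctype<char>& facet = std::use_facet<std::ctype<char>>(loc);
        // ctype<char>::tolower(char*, const char*) converts the range in place.
        facet.tolower(&s[0], &s[0] + s.size());
    }
    return s;
}
```

With the default-constructed locale this behaves like the loop above; pass a named std::locale to apply that locale's single-byte case rules instead.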

9T9
  • 698
  • 2
  • 9
  • 22
1

An explanation of how this solution works:


string test = "Hello World";
for(auto& c : test)
{
   c = tolower(c);
}

Explanation:

for(auto& c : test) is a range-based for loop of the kind
for (range_declaration : range_expression) loop_statement:

  1. range_declaration: auto& c
    Here the auto specifier is used for automatic type deduction, so the type is deduced from the variable's initializer.

  2. range_expression: test
    The range in this case is the characters of the string test.

Each character of the string test is available as a reference inside the for loop through the identifier c.

cigien
  • 57,834
  • 11
  • 73
  • 112
goulashsoup
  • 2,639
  • 2
  • 34
  • 60
  • I don't see the value of adding this as an answer, or as an edit to the linked answer for that matter. If someone needs an explanation of how the range-for loop works, there are multiple resources for that, e.g. https://stackoverflow.com/questions/35490236. For this question, I think this explanation is just noise - like adding an explanation of how iterators or standard algorithms work for the answers that use `std::transform`. – cigien Jun 23 '23 at 13:31
1

Try this function :)

string toLowerCase(string str) {
    int str_len = str.length();
    string final_str = "";

    for(int i=0; i<str_len; i++) {
        char character = str[i];

        // 'A'..'Z' is 65..90 in ASCII; 91 and 92 are '[' and '\' and must stay unchanged
        if(character>='A' && character<='Z') {
            final_str += static_cast<char>(character+32);
        } else {
            final_str += character;
        }
    }

    return final_str;
}
Bu Saeed
  • 1,173
  • 1
  • 16
  • 27
1

Have a look at the excellent C++17 cpp-unicodelib (GitHub). It's single-file and header-only.


#include <codecvt>
#include <iostream>
#include <locale>   // std::wstring_convert
#include <string>

// cpp-unicodelib, downloaded from GitHub
#include "unicodelib.h"
#include "unicodelib_encodings.h"

using namespace std;
using namespace unicode;

int main() {
    // converter that allows displaying a UTF-32 string
    // (std::wstring_convert is deprecated since C++17 but still available)
    wstring_convert<codecvt_utf8<char32_t>, char32_t> converter;

    std::u32string  in = U"Je suis là!";
    cout << converter.to_bytes(in) << endl;

    std::u32string  lc = to_lowercase(in);
    cout << converter.to_bytes(lc) << endl;
}

Output

Je suis là!
je suis là!
Edward Gaere
  • 1,092
  • 6
  • 11
0

Use fplus::to_lower_case() from fplus library.

Search to_lower_case in fplus API Search

Example:

fplus::to_lower_case(std::string("ABC")) == std::string("abc");
Waqar
  • 8,558
  • 4
  • 35
  • 43
uol3c
  • 559
  • 5
  • 9
0

Google's absl library has absl::AsciiStrToLower / absl::AsciiStrToUpper

DimanNe
  • 1,791
  • 3
  • 12
  • 19
0

Since you are using std::string, you are using C++. With C++11 or later, this doesn't need anything fancy. If words is a vector<string>, then:

    for (auto & str : words) {
        for(auto & ch : str)
            ch = tolower(ch);
    }

There are no strange special cases. You might want wchar_t for wide strings, but otherwise this does it all in place.

Lewis Levin
  • 85
  • 11
0

For a different perspective, there is a very common use case: locale-neutral case folding on Unicode strings. Here it is possible to get good case-folding performance once you realize that the set of foldable characters is finite and relatively small (< 2000 Unicode code points). That makes it a good fit for a generated perfect hash (guaranteed zero collisions), which can be used to convert every input character to its lowercase equivalent.

With UTF-8, you do have to be mindful of multi-byte characters and iterate accordingly. However, UTF-8 has fairly simple encoding rules that make this operation efficient.

For more details, including links to the relevant parts of the Unicode standard and a perfect hash generator, see my answer here, to the question How to achieve unicode-agnostic case insensitive comparison in C++.
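The iteration part can be sketched as follows. This is not the code from the linked answer: fold_code_point is a deliberately tiny stand-in for the generated perfect-hash lookup (it covers only ASCII and the Latin-1 capitals), all three function names are invented for illustration, and continuation bytes are assumed to be well formed rather than validated.

```cpp
#include <cstddef>
#include <string>

// Illustrative stand-in for a perfect-hash table: folds only ASCII A-Z and
// the Latin-1 capitals U+00C0..U+00DE (U+00D7 is the multiplication sign).
char32_t fold_code_point(char32_t cp) {
    if (cp >= U'A' && cp <= U'Z') return cp + 32;
    if (cp >= 0xC0 && cp <= 0xDE && cp != 0xD7) return cp + 32;
    return cp;
}

// Append one code point to out, encoded as UTF-8.
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Decode each UTF-8 sequence, fold the code point, and re-encode it.
std::string fold_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;            // sequence length from the lead byte
        if (lead < 0x80)              len = 1;
        else if ((lead >> 5) == 0x06) len = 2;
        else if ((lead >> 4) == 0x0E) len = 3;
        else if ((lead >> 3) == 0x1E) len = 4;

        if (len == 0 || i + len > in.size()) {
            out += in[i++];             // malformed or truncated: copy the byte
            continue;
        }
        // Payload bits of the lead byte, then 6 bits per continuation byte.
        char32_t cp = (len == 1) ? lead : (lead & (0xFF >> (len + 1)));
        for (std::size_t k = 1; k < len; ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        }
        append_utf8(out, fold_code_point(cp));
        i += len;
    }
    return out;
}
```

Swapping fold_code_point for a lookup generated from the Unicode case-folding data is what turns this sketch into the approach described above.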

Charlie Reitzel
  • 809
  • 8
  • 13
-1

Code Snippet

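#include <cctype>    // tolower
#include <iostream>
#include <string>
using namespace std;


int main ()
{
    ios::sync_with_stdio(false);

    string str="String Convert\n";

    for(size_t i=0; i<str.size(); i++)
    {
      // cast first: tolower is undefined for negative char values
      str[i] = tolower(static_cast<unsigned char>(str[i]));
    }
    cout<<str<<endl;

    return 0;
}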
rashedcs
  • 3,588
  • 2
  • 39
  • 40
-1

Here are two more libraries for ASCII-only lowercasing. Both are production-grade and micro-optimized, so they are expected to be faster than the existing answers here (TODO: add benchmark results).

Facebook's Folly:

void toLowerAscii(char* str, size_t length)

Google's Abseil:

void AsciiStrToLower(std::string* s);
prehistoricpenguin
  • 6,130
  • 3
  • 25
  • 42
-1

I wrote a templated version that works with any string :

#include <type_traits> // std::decay
#include <cctype>      // std::toupper & std::tolower


template <class T = void> struct farg_t { using type = T; };
template <template<typename ...> class T1, 
class T2> struct farg_t <T1<T2>> { using type = T2*; };
//---------------

template<class T, class T2 = 
typename std::decay< typename farg_t<T>::type >::type>
void ToUpper(T& str) { T2 t = &str[0]; 
for (; *t; ++t) *t = std::toupper(*t); }


template<class T, class T2 = typename std::decay< typename 
farg_t<T>::type >::type>
void Tolower(T& str) { T2 t = &str[0]; 
for (; *t; ++t) *t = std::tolower(*t); }

Tested with gcc compiler:

#include <iostream>
#include "upove_code.h"

int main()
{

    std::string str1 = "hEllo ";
    char str2 [] = "wOrld";

    ToUpper(str1);
    ToUpper(str2);
    std::cout << str1 << str2 << '\n'; 
    Tolower(str1);
    Tolower(str2);
    std::cout << str1 << str2 << '\n'; 
    return 0;
}

output:

>HELLO WORLD
>
>hello world
The Oathman
  • 125
  • 7
-2

Use this code to change the case of a string in C++.

#include <algorithm> // transform
#include <cctype>    // ::tolower
#include <iostream>
#include <string>

using namespace std;

int main(){
  string a = "sssAAAAAAaaaaDas";
  transform(a.begin(),a.end(),a.begin(),::tolower);
  cout<<a;
}

SHAYAK
  • 157
  • 2
  • 8
-3

This could be another simple version to convert uppercase to lowercase and vice versa. I used VS2017 community version to compile this source code.

#include <iostream>
#include <string>
using namespace std;

int main()
{
    std::string _input = "lowercasetouppercase";
#if 0
    // My idea is to use the ascii value to convert
    char upperA = 'A';
    char lowerA = 'a';

    cout << (int)upperA << endl; // ASCII value of 'A' -> 65
    cout << (int)lowerA << endl; // ASCII value of 'a' -> 97
    // 97-65 = 32; // Difference of ASCII value of upper and lower a
#endif // 0

    cout << "Input String = " << _input.c_str() << endl;
    for (int i = 0; i < _input.length(); ++i)
    {
        _input[i] -= 32; // To convert lower to upper
#if 0
        _input[i] += 32; // To convert upper to lower
#endif // 0
    }
    cout << "Output String = " << _input.c_str() << endl;

    return 0;
}

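Note: characters that are not letters of the expected case (digits, punctuation, spaces, and so on) must be excluded with a condition check, otherwise the unconditional +/- 32 corrupts them.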

Praveer Kumar
  • 912
  • 1
  • 12
  • 25