I am a developer of a library, and our old code uses sscanf() and sprintf() to read and write a variety of internal types from and to strings. We have had issues with users whose locale differed from the one our XML files are based on (the "C" locale). This resulted in incorrect values being parsed from those XML files and from strings submitted at run time. The locale may be changed by the user directly, but it can also change without the user's knowledge, for example when the change happens inside another library such as GTK, which was the "perpetrator" in one bug report. We therefore want to remove any dependency on the locale to permanently free ourselves from these issues.
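To make the failure mode concrete, here is a minimal single-threaded sketch. The helper name parse_under_locale and the locale name "de_DE.UTF-8" are assumptions for illustration only; locale names are platform-dependent and the locale may not be installed, in which case the helper reports that instead of parsing.

```cpp
#include <clocale>
#include <cstdio>
#include <string>

// Hypothetical helper: parses one "%lg" value while `locale_name` is the
// active LC_NUMERIC locale. Returns false if the locale is not available
// or nothing could be parsed.
// Demonstration only: switching the locale like this is NOT thread-safe.
bool parse_under_locale(const char* locale_name, const char* text, double& out) {
    // Copy the current locale name; the returned pointer may be invalidated
    // by the next setlocale() call.
    const char* current = std::setlocale(LC_NUMERIC, nullptr);
    const std::string saved = current ? current : "C";

    if (std::setlocale(LC_NUMERIC, locale_name) == nullptr) {
        return false;  // locale not installed on this system
    }
    const bool ok = std::sscanf(text, "%lg", &out) == 1;
    std::setlocale(LC_NUMERIC, saved.c_str());  // restore the previous locale
    return ok;
}
```

Under the "C" locale the text "1.5" parses as 1.5. Under a locale whose decimal separator is a comma, parsing stops at the '.' and the same text yields 1, without sscanf() reporting any error.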
I have already read other questions and answers about parsing float/double/int/... values, especially when they are separated by a character or located inside brackets, but the solutions proposed so far did not satisfy us. Our requirements are:
1. No dependencies on libraries other than the standard library. Using anything from boost, for example, is therefore not an option.
2. Must be thread-safe. This refers specifically to the locale, which can be changed globally. This is really awful for us, because a thread of our library can be affected by another thread in the user's program, which may be running code of a completely different library. Anything directly affected by setlocale() is therefore not an option. Setting the locale before reading/writing and restoring the original value afterwards is not a solution either, due to race conditions between threads.
3. While efficiency is not the topmost priority (#1 & #2 are), it is still definitely a concern, as strings may be read and written quite frequently at run time, depending on the user's program. The faster, the better.
Edit: As an additional note: boost::lexical_cast is not guaranteed to be unaffected by the locale (source: Locale invariant guarantee of boost::lexical_cast<>). So that would not be a solution even without requirement #1.
I gathered the following information so far:
- First of all, what I saw suggested a lot is boost's lexical_cast, but unfortunately this is not an option for us at all, as we can't require all users to also link to boost (and because of the lacking locale-safety, see above). I looked at the code to see if we could extract anything from it, but I found it difficult to understand and too long, and most likely the big performance gains come from locale-dependent functions anyway.
- Many functions introduced in C++11, such as std::to_string, std::stod, std::stof, etc., depend on the global locale just the way sscanf and sprintf do, which is extremely unfortunate and, to me, hard to understand, considering that std::thread was added in the same standard.
- std::stringstream seems to be a solution in general, since it is thread-safe in the context of the locale, and also in general if guarded properly. However, if it is constructed freshly every time it can be slow (good comparison: http://www.boost.org/doc/libs/1_55_0/doc/html/boost_lexical_cast/performance.html). I assume this can be solved by keeping one such stream configured and available per thread, and clearing it after each use. However, a problem is that it doesn't handle formats as easily as sscanf() does, for example: " { %g , %g } ".
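A minimal sketch of the one-stream-per-thread idea, under some assumptions: the names ClassicStream and format_double are hypothetical, and the stream is explicitly imbued with std::locale::classic(), because a stream otherwise copies whatever the global C++ locale is at the moment it is constructed.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical wrapper: an output stream pinned to the classic ("C") locale.
struct ClassicStream {
    std::ostringstream out;
    ClassicStream() { out.imbue(std::locale::classic()); }
};

// Hypothetical helper: formats a double without consulting the global locale.
// Uses the stream's default formatting, which matches "%g" with precision 6.
std::string format_double(double value) {
    thread_local ClassicStream cache;  // constructed once per thread, on first use
    cache.out.str(std::string());      // drop the text of the previous call
    cache.out.clear();                 // reset any error flags
    cache.out << value;
    return cache.out.str();
}
```

Because the stream is thread_local, no locking is needed, and later changes to the global locale do not affect a stream that has already been imbued.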
sscanf() patterns that we, for example, need to be able to read are:
" { %g , %g }"
" { { %g , %g } , { %g , %g } }"
" { top: { %g , %g } , left: { %g , %g } , bottom: { %g , %g } , right: { %g , %g }"
Writing these with stringstreams seems to be no big deal, but reading them seems problematic, especially considering the whitespace.
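For the first pattern, reading with a stream could be sketched roughly as follows. The helpers expect and parse_pair are hypothetical names, not part of any library. The sketch relies on formatted extraction skipping leading whitespace by default, which mirrors how a space in a sscanf() format matches any amount of whitespace. For brevity it constructs a fresh stream per call; a reused per-thread stream would work the same way.

```cpp
#include <istream>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: skips whitespace, then consumes one character and
// sets failbit if it is not the expected one, so later reads become no-ops.
std::istream& expect(std::istream& in, char expected) {
    char c = 0;
    if (in >> c && c != expected) {
        in.setstate(std::ios_base::failbit);
    }
    return in;
}

// Hypothetical helper: stream-based counterpart of sscanf(" { %g , %g }").
// The outputs are only written when the whole pattern matched.
bool parse_pair(const std::string& text, float& a, float& b) {
    std::istringstream in(text);
    in.imbue(std::locale::classic());  // ignore the global locale
    float x = 0.0f;
    float y = 0.0f;
    expect(in, '{');
    in >> x;
    expect(in, ',');
    in >> y;
    expect(in, '}');
    if (in.fail()) {
        return false;
    }
    a = x;
    b = y;
    return true;
}
```

The nested patterns could be built from the same pieces by calling expect for each literal character, with a similar helper for keywords such as "top:".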
Should we use std::regex in this context, or is this overkill? Are stringstreams a good solution for this task, or is there any better way to do this given the mentioned requirements? Also, are there any other problems in the context of thread-safety and locales that I have not considered in my question, especially regarding the usage of std::stringstream?
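For comparison, a std::regex version of the first pattern might look like the sketch below. The function name parse_pair_regex is hypothetical. The regex only checks the structure and extracts the number tokens; the conversion itself still goes through a stream imbued with std::locale::classic(). The pattern uses explicit character sets rather than classes like \s, on the assumption that this keeps the match independent of the locale the regex object captures.

```cpp
#include <locale>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: regex-based counterpart of sscanf(" { %g , %g }").
bool parse_pair_regex(const std::string& text, double& a, double& b) {
    // Explicit whitespace and number character sets, no locale-aware classes.
    static const std::regex pattern(
        R"([ \t\r\n]*\{[ \t\r\n]*([-+0-9.eE]+)[ \t\r\n]*,)"
        R"([ \t\r\n]*([-+0-9.eE]+)[ \t\r\n]*\}[ \t\r\n]*)");

    std::smatch match;
    if (!std::regex_match(text, match, pattern)) {
        return false;  // structure does not fit the pattern
    }

    // Convert both captured tokens with a locale-independent stream.
    std::istringstream in(match[1].str() + " " + match[2].str());
    in.imbue(std::locale::classic());
    double x = 0.0;
    double y = 0.0;
    if (!(in >> x >> y)) {
        return false;  // a token looked numeric but did not convert
    }
    a = x;
    b = y;
    return true;
}
```

This does the work twice (matching, then converting), so it is likely slower than reading directly from a stream.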