29

I come from a Python background and recently started learning C++. I was learning a C/C++ function called memset and following the online example from https://www.geeksforgeeks.org/memset-in-cpp/, where I ran into some errors:

/**
 * @author      : Bhishan Poudel
 * @file        : a02_memset_geeks.cpp
 * @created     : Wednesday Jun 05, 2019 11:07:03 EDT
 * 
 * Ref: 
 */

#include <iostream>
#include <vector>
#include <cstring>

using namespace std;

int main(int argc, char *argv[]){
    char str[] = "geeksforgeeks";

    //memset(str, "t", sizeof(str));
    memset(str, 't', sizeof(str));

    cout << str << endl;

    return 0;
}

Error when using single quotes 't'
This prints extra characters.

tttttttttttttt!R@`

Error when using "t" with double quotes

$ g++ -std=c++11 a02_memset_geeks.cpp 
a02_memset_geeks.cpp:17:5: error: no matching function for call to 'memset'
    memset(str, "t", sizeof(str));
    ^~~~~~
/usr/include/string.h:74:7: note: candidate function not viable: no known
      conversion from 'const char [2]' to 'int' for 2nd argument
void    *memset(void *, int, size_t);
         ^
1 error generated.

How to use memset in C++?

Further Study
An excellent tutorial on the shortcomings of memset is available here: https://web.archive.org/web/20170702122030/https:/augias.org/paercebal/tech_doc/doc.en/cp.memset_is_evil.html

Vlad from Moscow
BhishanPoudel
  • 16
    `"t"` and `'t'` are not the same. – SergeyA Jun 05 '19 at 15:14
  • 16
    most online learning resources for c++ are crap and afaik that site is no exception, give this a try instead: https://stackoverflow.com/questions/388242/the-definitive-c-book-guide-and-list – 463035818_is_not_an_ai Jun 05 '19 at 15:17
  • I used single quote 't' but still get extra characters in the output. – BhishanPoudel Jun 05 '19 at 15:18
  • 2
    It is no longer a properly zero-terminated C string after you do this. You lost the 0. Consider passing sizeof(str)-1 instead. – Hans Passant Jun 05 '19 at 15:19
  • @HansPassant Then how to use it? Is it deprecated and not used nowadays? – BhishanPoudel Jun 05 '19 at 15:21
  • 8
    Why even use `memset` in C++? The reason old C functions exists is for backwards compability. – klutt Jun 05 '19 at 15:23
  • 13
    It is a loaded gun, you aimed it at your left foot and pulled the trigger. You have to aim right. – Hans Passant Jun 05 '19 at 15:23
  • 1
    this is quite relevant: https://stackoverflow.com/a/8590379/4117728 – 463035818_is_not_an_ai Jun 05 '19 at 15:23
  • Also, use `std::string` instead of `char[]` – klutt Jun 05 '19 at 15:24
  • 3
    @HansPassant So he should shoot his right foot then? ;) – dbush Jun 05 '19 at 15:26
  • @dbush: Or possibly worse... – Lightness Races in Orbit Jun 05 '19 at 15:27
  • 8
    You should not change question underneath people who are answering it. If you take a comment or answer in and it is still not working, you can ask another question, but this sort of editing, which replaces once question with another, is destructive – SergeyA Jun 05 '19 at 15:31
  • 3
    Don't use `std::memset`, use [std::fill](https://en.cppreference.com/w/cpp/algorithm/fill), its just as fast and safer. – Galik Jun 05 '19 at 16:04
  • Questions refering to given answers or containing an edit with something that belongs into an answer always look a bit odd imho. You can answer your can question, though – 463035818_is_not_an_ai Jun 05 '19 at 16:21
  • 4
    If you look closely at the page you cited and count a bit, you'll notice that `"geeksforgeeks"` has 13 characters, and that row of t's that represents the output has 14. So the example code produces extra output, too. As you can see from the answers, that's not unexpected -- the code is simply wrong. – Pete Becker Jun 06 '19 at 04:11
  • 1
    You really ought to avoid `using namespace std` - it is a bad habit to get into, and [can silently change the meaning of your program](/q/1452721) when you're not expecting it. Get used to using the namespace prefix (`std` is intentionally very short), or importing *just the names you need* into the *smallest reasonable scope*. Is it really so hard to write `std::memset`? – Toby Speight Jun 06 '19 at 08:10
  • @HansPassant Is shooting at the right foot any better? =P – Shamtam Jun 06 '19 at 12:47
  • 1
    @SergeyA I am not sure why this question is put on hold, it has complete MWE, addresses the question correctly and does not have vulgar words or anything bad comments. – BhishanPoudel Jun 07 '19 at 14:47
  • @astro123 original question had `"t"` in double quotes, and I voted to close it as a typo (clearly memset doesn't accept pointers as it's second argument). And than you edited the question, completely changing it's meaning - not it became a valid answerable question, but doing so, you invalidate previous answers, which is actually not that great. – SergeyA Jun 07 '19 at 15:26
  • IMO the only correct answer to this question is "You don't" - at least not while you are learning C++. Mybe when you are an expert. And even then, probably not. – Wouter van Ooijen Jun 11 '19 at 17:50

4 Answers

69

This declaration

char str[] = "geeksforgeeks";

declares a character array initialized with a string, that is, a sequence of characters that includes the terminating zero character '\0'.

You can think of the declaration as equivalent to the following

char str[] = 
{ 
    'g', 'e', 'e', 'k', 's', 'f', 'o', 'r', 'g', 'e', 'e', 'k', 's', '\0'
};

This call of the function memset

memset(str, 't', sizeof(str));

overwrites all characters of the array, including the terminating zero.

So the next statement

cout << str << endl;

results in undefined behavior, because it keeps reading characters until it encounters a terminating zero, and the array no longer contains one, so it reads past the end of the array.

You could write instead

#include <iostream>
#include <cstring>

int main()
{
    char str[] = "geeksforgeeks";

    std::memset( str, 't', sizeof( str ) - 1 );
    
    std::cout << str << '\n';
}

Or the following way

#include <iostream>
#include <cstring>

int main()
{
    char str[] = "geeksforgeeks";

    std::memset( str, 't', std::strlen( str ) );
    
    std::cout << str << '\n';
}

Both versions leave the terminating zero in the array unchanged.

If you want to overwrite all characters of the array including the terminating zero, then you should replace this statement

std::cout << str << '\n';

with this statement

std::cout.write( str, sizeof( str ) ) << '\n';

as shown in the program below, because the array no longer contains a string.

#include <iostream>
#include <cstring>

int main()
{
    char str[] = "geeksforgeeks";

    std::memset( str, 't', sizeof( str ) );
    
    std::cout.write( str, sizeof( str ) ) << '\n';
}

As for this call

memset(str, "t", sizeof(str));

the type of the second argument (const char[2], which decays to const char *) does not match the type of the second function parameter, which is int, and there is no implicit conversion between the two. See the declaration of the function

void * memset ( void * ptr, int value, size_t num );

Thus the compiler issues an error message.

Apart from character arrays (which are used very often even in C++), you can also use the standard class std::string (or std::basic_string) that simulates strings.

In this case there is no need to use the standard C function memset to fill a string with a single character. The simplest way to do this is the following

#include <iostream>
#include <string>

int main()
{
    std::string s( "geeksforgeeks" );
    
    s.assign( s.length(), 't' );
    
    std::cout << s << '\n';
}

Another way is to use the standard algorithm std::fill or std::fill_n declared in the header <algorithm>. For example

#include <iostream>
#include <string>
#include <iterator>
#include <algorithm>

int main()
{
    std::string s( "geeksforgeeks" );
    
    std::fill( std::begin( s ), std::end( s ), 't' );
    
    std::cout << s << '\n';
}

or

#include <iostream>
#include <string>
#include <iterator>
#include <algorithm>

int main()
{
    std::string s( "geeksforgeeks" );
    
    std::fill_n( std::begin( s ), s.length(), 't' );
    
    std::cout << s << '\n';
}

You can even use the replace member function of the class std::string in one of the following ways

#include <iostream>
#include <string>

int main()
{
    std::string s( "geeksforgeeks" );
    
    s.replace( 0, s.length(), s.length(), 't' );
    
    std::cout << s << '\n';
}

Or

#include <iostream>
#include <string>

int main()
{
    std::string s( "geeksforgeeks" );
    
    s.replace( std::begin( s ), std::end( s ), s.length(), 't' );
    
    std::cout << s << '\n';
}
Vlad from Moscow
  • 19
    The original post clearly indicates that the user is trying to learn C++. Please mention at least that none of this is relevant if you use `std::string`, which should be used here instead instead of using this complicated `C` stuff. (It might be relevant to know, though not at the beginning of a course) – JVApen Jun 05 '19 at 17:52
  • 8
    @JVApen The original post clearly indicates that the user is trying to know how to use memset with character arrays.:) – Vlad from Moscow Jun 06 '19 at 12:13
  • 1
    Good answer. If you want it to be better for the OP: note the difference in the type system. C++ has a static type system, where variables have a fixed static type. Python has a fully dynamic type system, where values have a type and variables do not. This is probably the source of his confusion involving `'t'` and `"t"`. – Yakk - Adam Nevraumont Jun 06 '19 at 15:45
  • What do you mean by "_simulates_ strings"? – Ruslan Jun 06 '19 at 21:30
  • @Ray You are mistaken. For starters a correct declaration will look like const char *str = "geeksforgeeks"; And in any case string literals in C and C++ are immutable. Any attempt to change a string literal results in undefined behaviour. – Vlad from Moscow Jun 07 '19 at 13:35
  • @VladfromMoscow You're right, of course. I wasn't paying close enough attention to exactly what was being done after the strlen call. I revise my suggestion to: You might want to also explain that if the string were declared as `const char *str = "geeksforgeeks";`, sizeof will no longer report the length of the string, but rather the size of the pointer. (Even if declaring it as a pointer to string literal in this particular example leads to further problems, I've seen enough people make the mistake of doing sizeof of a pointer to string that I think it's worth covering why that doesn't work.) – Ray Jun 07 '19 at 15:03
  • @Ray Thanks. But it'll be too broad answer to a simple question.:) – Vlad from Moscow Jun 07 '19 at 15:07
31

Error when using single quotes 't' This prints extra characters.

That's because you overwrote the null terminator.

The terminator is part of the array's size (an array is not magic), though it's not part of the logical string size.

So, I think you meant:

memset(str, 't', strlen(str));
//               ^^^^^^

Error when using "t" with double quotes

Completely different thing. You told the computer to set every character in the string to a string. Doesn't make sense; won't compile.


How to use memset in C++?

Don't.

Either use the type-safe std::fill, in combination with std::begin and std::end:

std::fill(std::begin(str), std::end(str)-1, 't');

(If you're worried about performance, don't be: this will just delegate to memset where possible via template specialisation, optimisation not required, without sacrificing type-safety; example here in libstdc++.)

Or just use a std::string to begin with.


I was learning the function memset in C++ from https://www.geeksforgeeks.org/memset-in-cpp/ where the example is given as below

Don't attempt to learn C++ from random websites. Get yourself a good book instead.

Lightness Races in Orbit
  • 4
    unfortunately it really is `sizeof` in the orginial example. A pity that such code is used to "teach" c++ :( – 463035818_is_not_an_ai Jun 05 '19 at 15:25
  • I am learning C++, and learning online from https://www.geeksforgeeks.org/memset-in-cpp/, The example is taken from there, nothing warnings were given there. Thanks for the usage info. – BhishanPoudel Jun 05 '19 at 15:25
  • Updated to address both comments. – Lightness Races in Orbit Jun 05 '19 at 15:26
  • Also, I am from Python background, where a single quote and double quotes are the same, so I got another error also. – BhishanPoudel Jun 05 '19 at 15:28
  • 3
    @astro123 Another reason to work from a good book instead. There are different kinds of literals in C++, which is completely different from Python. – Lightness Races in Orbit Jun 05 '19 at 15:28
  • 6
    This site (geeksforgeeks) should be forever banned. – SergeyA Jun 05 '19 at 15:32
  • 6
    @astro123: *learning online from geeksforgeeks.org/memset-in-cpp* There's your first problem. That tutorial has a serious bug in its tiny example. This is not rare on geeksforgeeks.org. There is *some* good stuff, but it's often mixed in with bad stuff, and *until you're already an expert* you won't know how to tell the difference. Unlike Stack Overflow, geeksforgeeks doesn't have a voting mechanism for people to review posts and indicate their quality, so you have no way of knowing which ones to trust. – Peter Cordes Jun 06 '19 at 00:23
  • 1
    @PeterCordes its a shame SO Documentation went the way it did... there's clearly a demand for voted-on, curated tutorials. I'm sure somebody will figure out the right design, eventually. – mbrig Jun 06 '19 at 18:18
  • Why "don't"? Isn't the implementation of memset often significantly faster? `fill` may not always be implemented to use an assembly directive (not even at -O4), while memset should always be using it if it's available. Also strlen should never be used. Just never. if you know the size of the string at compile time (and sizeof does) use it. If you don't know it at compile time, `strlen` is unsafe. – grovkin Jun 06 '19 at 20:01
  • @grovkin No, a mainstream implementation will delegate to `memset` via specialisation, i.e. when the template args suggest it's called for (optimisation level not required) - [e.g. libstdc++](https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/stl_algobase.h#L717). There's no need to try to "beat the toolchain", because it's better than us. On the other hand, you sacrifice your type safety and I _have_ seen long-standing hidden bugs when someone's changed a type and not scanned for all its uses, one of which was a naughty `memset` on what had previously been a C array. – Lightness Races in Orbit Jun 07 '19 at 10:21
  • @LightnessRacesinOrbit you would expect templates to be specialized for char*, but I have seen compilers used in production code (I think it was Sun's) being 10x slower when using `copy()` instead of `memcpy` on vectors. Just because there are mechanisms in the language to handle this, doesn't mean that the compiler you use handles it. And when the issue is of practical rather than theoretical importance, you have to use the tools you rather than the ones you think you should have. – grovkin Jun 07 '19 at 19:12
  • @grovkin Such a blatantly substandard implementation should not be used in the first place. – Lightness Races in Orbit Jun 08 '19 at 00:33
  • @LightnessRacesinOrbit again, when the issue is of practical rather than theoretical importance, you have to use the tools you have rather than the ones you think you should have. I see that you still have strlen in the answer, btw. That function should never be used. In any code.... ever. – grovkin Jun 08 '19 at 00:43
5

This is the declaration of memset:

void* memset( void* dest, int ch, std::size_t count );

Converts the value ch to unsigned char and copies it into each of the first count characters of the object pointed to by dest. If the object is a potentially-overlapping subobject or is not TriviallyCopyable (e.g., scalar, C-compatible struct, or an array of trivially copyable type), the behaviour is undefined. If count is greater than the size of the object pointed to by dest, the behaviour is undefined.

(source)

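For the first call, memset(str, 't', sizeof(str));, the compiler does not complain; the problem is that the count is one byte too large. It also overwrites the terminating '\0', so printing runs past the array and produces 18 characters: 14 t's followed by garbage (tttttttttttttt!R@`). I suggest trying sizeof(str) - 1 for a char array.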

For the second call, memset(str, "t", sizeof(str));, you are passing a string literal as the second argument, but that parameter is an int. This is why the compiler complains: error: invalid conversion from ‘const char*’ to ‘int’

Lightness Races in Orbit
Arun Kumar
  • *potentially-overlapping subobject* of what? It's not automatically UB to modify the object-representation of other objects in C++. For example, `uint32_t` has a fully defined object representation (except for the endian byte-order). So it's not clear what kind of overlap you're talking about, because `memset` only takes one pointer arg; the other args are by value. That phrasing makes sense for `memcpy` which forbids overlap, unlike `memmove`. – Peter Cordes Jun 06 '19 at 00:26
  • @PeterCordes To be fair, that phrase was plagiarised from cppreference.com. So if it's wrong, cppreference.com needs to be corrected. – Lightness Races in Orbit Jun 06 '19 at 09:48
  • @LightnessRacesinOrbit: on cppref, that phrase is a hyperlink to [a definition](https://en.cppreference.com/w/cpp/language/object#Subobjects) that makes sense. It's somewhat plausible for it to be UB if a memset might be modifying the bytes of another object as well (because the pointer is to a subojected of a struct that's declared with `[[no_unique_address]]` allowing a compiler to do whatever it wants, including create bitfields for narrow or bool types I guess). I'm less clear on the "base class subobject" part; possibly that's UB because it could overwrite a vtable pointer? – Peter Cordes Jun 06 '19 at 09:57
  • 1
    @PeterCordes - it's talking about [something like this](https://godbolt.org/z/tiYnfW). Here, `base` is trivially copyable, but it not safe for `memset` (or `memmove`) or because it is a potentially overlapping subobject. Note that `sizeof(base) == 8`, yet when it is used as a base of `derived` (which itself has a `char` member), `sizeof(derived) == 8`! So the members of derived are stored in the padding of `base`. Hence it is unsafe to overwrite an arbitrary `base&` with `memset` since you'd also clobber the derived member in this case. – BeeOnRope Jan 21 '20 at 18:44
  • Note also how this is reflected in the code generation on gcc for zeroing `base` in `b = base{}`: it does a `qword` and `byte` write, because it can't safely extend that to a single `qword` write because the padding may be reused. Then see `base2` and `derived2`: these are identical except that `base2` is `struct` not `class`. Then it becomes an aggregate and I guess overlapping is banned (note how the `b = base2{}` codegen changes). – BeeOnRope Jan 21 '20 at 18:53
  • @BeeOnRope: You mean `dword` + `byte` to zero `base` (you said qword twice). Interesting. The only difference between class and struct is that class defaults to `private:` while struct defaults to `public:`. It appears that putting derived members into the padding of the base depends only on visibility, and switches if you use those tags to have private members in the base2 struct and public members of the base class. https://godbolt.org/z/3VLeiS – Peter Cordes Jan 21 '20 at 21:22
  • @Peter Yes, it's to do with visibility although I wasn't sure why. As above I thought it was keying off whether base was an Aggregate or not (basically the most POD-like thing C++ offers). It is not keying off of standard layout, that I checked. – BeeOnRope Jan 22 '20 at 00:50
  • @BeeOnRope: I think it might be a design decision that could have gone either way. Or maybe not: according to https://itanium-cxx-abi.github.io/cxx-abi/abi.html#POD *This ABI uses the definition of POD only to decide **whether to allocate objects in the tail-padding of a base-class subobject**. While the standards have broadened the definition of POD over time, they have also forbidden the programmer from directly reading or writing the underlying bytes of a base-class subobject with, say, memcpy.* (x86-64 uses the same C++ ABI). With some stuff about how POD in C++ has evolved. – Peter Cordes Jan 22 '20 at 01:04
  • @PeterCordes - right, well the designed decision must be in the context of the platform ABI, not just at the compiler level, since everyone has to agree on this, right? Anyways, the only property that I found that wasn't contradicted by practice, regarding whether padding could be used by a derived class was "aggregate". See [here](https://godbolt.org/z/jBeRwD). `base` is POD, trivial, and standard layout, but it still not safe. It is not aggregate, however. Of course, this is not a proof :). – BeeOnRope Jan 22 '20 at 02:36
  • @BeeOnRope: Ah, I wasn't aware that "aggregate" had a specific technical meaning which included having no private / protected members. [What are Aggregates and PODs and how/why are they special?](//stackoverflow.com/q/4178175). I haven't checked this, but I think from the C++ ABI's notes on "POD" that (some draft of) ISO C++ must say that you can step on the padding of an aggregate, but not necessarily in general for any POD / trivially-copyable type. So you can put derived members in that padding when the base is not an aggregate. That's what this C++ ABI chooses to agree on. – Peter Cordes Jan 22 '20 at 02:49
  • @PeterCordes - yes, I just finished reading (skimming) that FAQ also :). I don't find the word aggregate in the Itanium ABI you linked. The ABI was written long along, before a lot of the changes in the C++ standard mentioned in the FAQ, and before some of the terms even existed. In particular, finer distinctions were introduced in later standard that the ABI doc wouldn't know about. 1/x – BeeOnRope Jan 22 '20 at 02:54
  • I didn't read the ABI, but based on searches I can't find language covering the case. It mentions "base class subobject" in the section you linked as one type "potentially-overlapping subobject" (the other one being data members with `no_unique_address`), but all further referces to "base class subobject" don't seem relevant (they are about vtables), and further refs to "p-o subobject" all seem to concern the data member case, not the base-class case. I'm making a specific question on the topic, will link it here. – BeeOnRope Jan 22 '20 at 02:55
  • @PeterCordes - [FYI](https://stackoverflow.com/questions/59852054/under-what-conditions-is-it-safe-to-use-stdmemcpy-to-copy-between-objects). – BeeOnRope Jan 22 '20 at 03:07
5

Vlad has helpfully answered the first part of your question, but I feel like the second part could be explained a little more intuitively:

As others have mentioned, 't' is a character whereas "t" is a string, and strings have a null terminator at the end. This makes "t" an array of not one but two characters - ['t', '\0']! This makes memset's error more intuitive - it can coerce a single char to an int easily enough, but it chokes when it's given an array of chars. Just like in Python, int(['t', '\0']) (or ord(['t', '\0'])) doesn't compute.

Valhalla
  • 2
    And to be even more precise, when passing "t", one passes the address of 't' in "t". So if it were converted to the `int` parameter in `memset`, it would be the pointer to 't' getting converted to `int`, rather than the value of the string getting converted to `int`. – grovkin Jun 06 '19 at 19:58