
I am currently debugging a sizable program. When I come to the following line of code:

value->binary_string = value_it->binary_string.substr(range->msb->value, range->size);

The program does not behave correctly. Here value is a pointer to a struct with a member named binary_string with type std::string. When I reach this line while debugging, I see that:

value_it->binary_string = "00000000000000000000000000000111"
range->msb->value = 0
range->size = 32

After this line of code executes, value->binary_string is empty! I have even changed the line to

value->binary_string = value_it->binary_string;

and it still fails!

When I reach this line while debugging, my program is using about 100 MB of memory, so I don't think it is a memory issue (though I am running Valgrind as we speak to verify this). I am using Ubuntu 11.10, g++-4.6 and libstdc++6.

Has anyone encountered something like this before? I have no idea why my strings aren't working!

Thanks,

Sam

EDIT1:

The type of value is NumberInst, defined below:

typedef std::string String;

struct NumberInst
{
    unsigned size;
    bool signed_;
    String binary_string;
    bool valid;
    unsigned value;

    NumberInst();
};

EDIT2:

It looks like I've narrowed down the search a little bit. While debugging, I tried a few print commands:

print value_it->binary_string
"00000000000000000000000000000111"
print value_it->binary_string[31]
'1'
print value_it->binary_string.substr(0, String::npos)
""
print value_it->binary_string.substr(0, 1)
""

It seems that substr is not working properly in this context. However, when I tested substr in my main function, it seemed to work okay.
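For reference, here is a minimal sketch of what substr should return for the value shown above, assuming the string really holds those characters and its size() matches. The helper name slice and the function check_expected_substr are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper mirroring the call in the question.
// substr throws std::out_of_range only if pos > bits.size().
std::string slice(const std::string& bits, std::size_t pos, std::size_t count) {
    return bits.substr(pos, count);
}

void check_expected_substr() {
    const std::string bits = "00000000000000000000000000000111";
    const std::size_t n = bits.size();

    assert(slice(bits, 0, n) == bits);                  // whole string
    assert(slice(bits, 0, std::string::npos) == bits);  // npos means "to the end"
    assert(slice(bits, 0, 1) == "0");                   // first character
    assert(slice(bits, n - 3, 3) == "111");             // last three characters
}
```

If a debugger shows the characters but substr returns "", it is worth printing size() as well, since substr works from the size of the string and not from what is in the buffer.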

Sam Hertz
  • Have you verified that `value` is allocated correctly (and hasn't been freed)? – Yaniro Mar 01 '12 at 21:35
  • @Yaniro I just checked. The new call is just before that line of code and when I step into the new call I reach the constructor for value. Also, I tested a call to push_back, which seems to work properly. – Sam Hertz Mar 01 '12 at 21:41
  • Check out this copy-on-write stuff: http://en.wikipedia.org/wiki/Copy-on-write... it could be that the two strings are using the same buffer until you change one of them, if I understand correctly... – Yaniro Mar 01 '12 at 21:47
  • @Yaniro Nice link. But even if they are sharing the same buffer, shouldn't printing the second string print the contents of the first? – Sam Hertz Mar 01 '12 at 21:50
  • What is the type of `value->binary_string`? Show us the declaration. – Robᵩ Mar 01 '12 at 21:51
  • @Rob I added the definition above – Sam Hertz Mar 01 '12 at 21:56
  • 1) Run valgrind (as you said you were). 2) Reduce your program to the smallest compilable program that still demonstrates the error, and then post that small program in your question. http://sscce.org – Robᵩ Mar 01 '12 at 22:00
  • @Rob Valgrind did not report any memory errors except for those that occur when the program crashes (and the program crashes because the string is never copied). My program is huge (> 12000 lines) but I will try to find a more trivial example of the error. – Sam Hertz Mar 01 '12 at 22:07
  • On the off chance - is this program multithreaded, with those variables shared across threads without proper mutex support? – Dan Nissenbaum Mar 01 '12 at 23:09

2 Answers


I've found that there are usually two common reasons when "strange" things like this happen:

  1. You made a simple mistake and are just overlooking it.
  2. Some sort of memory corruption.

To check the first reason, carefully read the offending code and make a conscious decision to read what the code is doing and not what you think it should be doing. It is very easy to overlook obvious errors by assuming the code does something it actually doesn't, especially in code you've been looking at for a while. For example, a few months ago I was debugging something and was having an issue with a variable "magically" changing its value all of a sudden. It turned out I was just printing the wrong variable (duh!), and I would have caught this sooner if I had been reading what the code actually said.

Memory corruption is a harder beast to find, as it could come from any piece of code that ran at any time before the issue shows up. Valgrind is not guaranteed to find all forms of corruption; see this question for an example. Running in debug mode, setting memory watchpoints (if you know where the corruption always occurs) and using other memory-related tools might help, as will reducing the problem to its minimal form: keep eliminating code a little at a time until the corruption stops happening.

uesp

The problem was caused by a very subtle bug. Somewhere in my project:

NumberInst* number = new NumberInst;
number->binary_string.reserve(size);
for (unsigned i = 0; i < size; i++)
    number->binary_string[i] = ...;

No std::out_of_range exception is thrown here, because operator[] does no bounds checking at all; only at() checks the index, and it checks against the size of the string, not its capacity. Writing past size() with operator[] is undefined behavior, but in this case the writes landed inside the reserved buffer, so nothing visibly broke. Calling print in a debugger appears to succeed, probably because it reads the buffer until it reaches a '\0' character. However

String str = number->binary_string;

will produce an empty string, because the standard library presumably copies only the characters in [0, size) and then adds a '\0' character. Since the size of the source string (value_it->binary_string in the question) is still 0, the copy is empty, as are the results of substr and other functions that rely on the size of the calling string.
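One way to make this kind of mistake fail loudly is to index with at() instead of operator[], since at() checks the index against size(). The sketch below is only an illustration; the names write_is_rejected and check_at_bounds are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: tries to write '1' at index i using the checked
// accessor and reports whether the write was rejected.
bool write_is_rejected(std::string& s, std::size_t i) {
    try {
        s.at(i) = '1';   // throws std::out_of_range when i >= s.size()
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

void check_at_bounds() {
    std::string s;
    s.reserve(32);                       // capacity grows, size() stays 0
    assert(write_is_rejected(s, 0));     // index 0 is out of range

    s.resize(32);                        // now size() == 32
    assert(!write_is_rejected(s, 31));   // last valid index
    assert(write_is_rejected(s, 32));    // one past the end
}
```

With at() in the original loop, the first iteration would have thrown instead of silently writing into the reserved buffer.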

In other words, the problem was caused by calling

str.reserve(size);

instead of

str.resize(size);
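The difference can be sketched as follows. This is a minimal illustration rather than the code from my project; the function names and the alternating '0'/'1' fill pattern are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: fills a string with `size` alternating '0'/'1' characters.
std::string make_bits_resize(std::size_t size) {
    std::string s;
    s.resize(size);                       // size() is now `size`
    for (std::size_t i = 0; i < size; ++i)
        s[i] = (i % 2 == 0) ? '0' : '1';  // in range, because i < s.size()
    return s;
}

// Alternative: reserve() is fine as long as characters are appended, not indexed.
std::string make_bits_push_back(std::size_t size) {
    std::string s;
    s.reserve(size);                      // capacity only; size() is still 0
    for (std::size_t i = 0; i < size; ++i)
        s.push_back((i % 2 == 0) ? '0' : '1');
    return s;
}

void check_reserve_vs_resize() {
    std::string s;
    s.reserve(32);
    assert(s.size() == 0);                // reserve() never changes size()
    assert(s.capacity() >= 32);
    assert(make_bits_resize(4) == "0101");
    assert(make_bits_push_back(4) == "0101");
}
```

Either version leaves size() equal to the number of characters written, so copies and substr see the full contents.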

Thanks for your help everyone!

Sam

Sam Hertz
  • `NumberInst* number = new NumberInst;` this code may not do what you think it does. I'd recommend always writing the parentheses after a constructor, unless you really know what you're doing. Of course, creating values is recommended whenever possible. – J.N. Mar 03 '12 at 10:39