Good afternoon. We are building a prototype of a deduper. We are using an array of STL strings to store the records to be deduped. The array looks like this:
std::string* StringArray = new std::string[NumberDedupeRecords];
The records are very large, up to 160,000,000 bytes each. When we store a std::string version of a record to be deduped in StringArray, std::string makes a deep copy and allocates a new buffer of at least 160,000,000 bytes. We quickly run out of heap memory and get a std::bad_alloc exception. Is there a workaround that avoids the deep copy and the std::bad_alloc? Perhaps we should use a different data structure for storing the records to be deduped, or maybe we should store std::auto_ptr objects instead.
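If the copy-assignment into the array is the step that exhausts the heap, the sketch below shows two ways to hand a string's existing buffer to an array slot without allocating a second one. It assumes a pre-sized std::vector<std::string> stands in for new std::string[NumberDedupeRecords]; StoreRecordByMove, StoreRecordBySwap and StoreRecordOwned are hypothetical helper names. Move assignment needs C++11; std::string::swap also works in older compilers. Neither avoids the first copy made when a std::string is constructed from the raw record buffer. The last helper shows std::unique_ptr, the C++11 replacement for std::auto_ptr (deprecated in C++11 and removed in C++17), in case records end up held by pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (C++11 and later): move-assign the record into its slot.
// The slot takes over the record's heap buffer, so no second buffer is
// allocated and no bytes are copied. `record` is left valid but unspecified.
void StoreRecordByMove(std::vector<std::string>& records, std::size_t index,
                       std::string& record) {
    assert(index < records.size());
    records[index] = std::move(record);
}

// Hypothetical helper (the swap itself also works before C++11):
// std::string::swap exchanges the two buffers in constant time.
// Afterwards `record` holds whatever the slot held before.
void StoreRecordBySwap(std::vector<std::string>& records, std::size_t index,
                       std::string& record) {
    assert(index < records.size());
    records[index].swap(record);
}

// Hypothetical helper: if records are held by pointer, std::unique_ptr
// replaces std::auto_ptr. Only the pointer is transferred, never the bytes.
void StoreRecordOwned(std::vector<std::unique_ptr<std::string>>& records,
                      std::unique_ptr<std::string> record) {
    records.push_back(std::move(record));
}
```

For example, after `std::string clara5(curr.getPtr());`, calling `StoreRecordByMove(records, StringArrayCount, clara5)` would store the record with only the single copy made when clara5 was constructed, so peak memory for that record stays at one buffer instead of two.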
Here is the relevant code snippet:
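std::string clara5(curr.getPtr());  // copies the record bytes up to the first NUL in curr's buffer
char* const maryptr = (curr.getPtr() + n - curr.low());
maryptr[54] = '\x0';  // writes a NUL into curr's buffer (clara5 was already constructed above)
StringArray[StringArrayCount] = clara5;  // copy-assigns clara5 into the array slot
curr.mPtr = (char*)StringArray[StringArrayCount].c_str();  // points curr at the stored string (casts away const)
std::multiset<Range>::iterator miter5 = ranges_type.lower_bound(Range(n));
(*miter5).mPtr = curr.mPtr;
StringArrayCount += 1;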
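Since the snippet already keeps raw pointers (curr.getPtr(), mPtr) into the record data, another option is to avoid owning copies entirely. The sketch below assumes C++17 and assumes every source buffer stays alive for the whole dedupe pass: each record is a non-owning std::string_view (pointer plus length) into the existing buffer, and std::unordered_set hashes and compares the bytes in place. RecordView and DedupeViews are hypothetical names, and how record boundaries are computed (for example from curr.low() and n) is left to the caller.

```cpp
#include <cassert>
#include <string_view>
#include <unordered_set>
#include <vector>

// Hypothetical record reference: a pointer and a length into a buffer the
// caller already owns. Creating or copying a view never copies record bytes.
// Assumption: every source buffer outlives all views that point into it.
using RecordView = std::string_view;

// Returns the records with duplicates removed, keeping the first occurrence
// of each distinct byte sequence. Hashing and comparison read the original
// buffers directly; only views (not record contents) are stored.
std::vector<RecordView> DedupeViews(const std::vector<RecordView>& records) {
    std::unordered_set<RecordView> seen;
    std::vector<RecordView> unique;
    for (RecordView r : records) {
        // insert(...).second is true only the first time these bytes are seen.
        if (seen.insert(r).second) {
            unique.push_back(r);
        }
    }
    return unique;
}
```

Each view costs only a pointer and a length regardless of record size, so storing NumberDedupeRecords of them cannot trigger a 160,000,000-byte allocation. If the mPtr bookkeeping is still needed, view.data() returns the pointer into the original buffer.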
Thank you.