
I've got a bit of a problem. Essentially, I need to store a large list of whitelisted entries inside my program, and I'd like to include such a list directly -- I don't want to have to distribute other libraries and such, and I don't want to embed the strings into a Win32 resource, for a bunch of reasons I don't want to go into right now.

I simply included my big whitelist in my .cpp file, and was presented with this error:

1>ServicesWhitelist.cpp(2807): fatal error C1091: compiler limit: string exceeds 65535 bytes in length

The string itself is about twice the limit VC++ allows. What's the best way to include such a large literal in a program?

EDIT:

I'm storing the string like this:

const std::wstring servicesWhitelist
(
 L".NETFRAMEWORK|"
 L"_IOMEGA_ACTIVE_DISK_SERVICE_|"
 L"{6080A529-897E-4629-A488-ABA0C29B635E}|"
 L"{834170A7-AF3B-4D34-A757-E05EB29EE96D}|"
 L"{85CCB53B-23D8-4E73-B1B7-9DDB71827D9B}|"
 L"{95808DC4-FA4A-4C74-92FE-5B863F82066B}|"
 L"{A7447300-8075-4B0D-83F1-3D75C8EBC623}|"
 L"{D31A0762-0CEB-444E-ACFF-B049A1F6FE91}|"
 L"{E2B953A6-195A-44F9-9BA3-3D5F4E32BB55}|"
 L"{EDA5F5D3-9E0F-4F4D-8A13-1D1CF469C9CC}|"
 L"2WIREPCP|"
//About 3800 more lines
);

EDIT2: It's used at runtime in a way similar to this:

static const boost::wregex servicesWhitelistRegex(servicesWhitelist);
std::wstring service;
//code to populate service
if (!boost::regex_match(service, servicesWhitelistRegex))
 //Do something to print service
Billy ONeal
  • How are you storing the string? Like, is it parsed and stored in a set? – GManNickG Mar 20 '10 at 05:07
  • @GMan: See question edit – Billy ONeal Mar 20 '10 at 05:12
  • Is there some reason that it must be stored in exactly this format? It looks to me like it might better be stored in a `list<>` or something. – greyfade Mar 20 '10 at 05:14
  • How do you look up values in that string, I mean. At run-time, do you parse it? – GManNickG Mar 20 '10 at 05:14
  • @GMan: Edited again. Also made a C-W. – Billy ONeal Mar 20 '10 at 05:19
  • @greyfade: The reason I do not (currently) have that is then I have the overhead of three gazillion calls to mycontainer::push_back which makes the binary huge. – Billy ONeal Mar 20 '10 at 05:23
  • Oh, good grief! You're using that whole massive string as a single regex? That must be *nightmarishly* slow. I'd look for a simpler algorithm, honestly. Build a trie from your whitelist and match `service` against it, for example. – greyfade Mar 20 '10 at 06:17
  • @greyfade: It is similar in speed to the hash table implementation I ended up using. The construction time was longer, but did not really matter in this application. In many ways the regex was faster than the hash table for longer services that were not in the whitelist because the finite state machine would fail faster. – Billy ONeal Apr 11 '10 at 21:00
  • @Carson Myers: Made it CW because I was dumb and checked the box :( – Billy ONeal Apr 11 '10 at 23:29
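greyfade's trie suggestion could look roughly like the sketch below. It is not code anyone in the thread posted; `TrieNode`, `Trie`, `insert` and `contains` are hypothetical names, and it does exact whole-string matching only (no regex semantics), failing as soon as a character has no matching branch.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical node type: one child per next wide character.
struct TrieNode {
    std::map<wchar_t, std::unique_ptr<TrieNode>> children;
    bool terminal = false;  // true if a whitelisted entry ends at this node
};

// Hypothetical minimal trie for exact lookup of whitelisted service names.
class Trie {
public:
    void insert(const std::wstring& word) {
        TrieNode* node = &root_;
        for (wchar_t ch : word) {
            std::unique_ptr<TrieNode>& child = node->children[ch];
            if (!child) child.reset(new TrieNode());
            node = child.get();
        }
        node->terminal = true;
    }

    bool contains(const std::wstring& word) const {
        const TrieNode* node = &root_;
        for (wchar_t ch : word) {
            auto it = node->children.find(ch);
            if (it == node->children.end()) return false;  // no entry has this prefix
            node = it->second.get();
        }
        return node->terminal;  // a mere prefix of an entry does not count
    }

private:
    TrieNode root_;
};
```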

5 Answers


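How about an array? (You don't need a comma after every entry: adjacent literals are concatenated, so you only need to insert commas often enough to keep each element under the length limit.)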

const std::wstring servicesWhitelist[] = {
 L".NETFRAMEWORK|",
 L"_IOMEGA_ACTIVE_DISK_SERVICE_|",
 L"{6080A529-897E-4629-A488-ABA0C29B635E}|",
 L"{834170A7-AF3B-4D34-A757-E05EB29EE96D}|",
 L"{85CCB53B-23D8-4E73-B1B7-9DDB71827D9B}|",
 L"{95808DC4-FA4A-4C74-92FE-5B863F82066B}|",
 L"{A7447300-8075-4B0D-83F1-3D75C8EBC623}|",
 L"{D31A0762-0CEB-444E-ACFF-B049A1F6FE91}|",
 L"{E2B953A6-195A-44F9-9BA3-3D5F4E32BB55}|",
 L"{EDA5F5D3-9E0F-4F4D-8A13-1D1CF469C9CC}|",
 L"2WIREPCP|",
...
};

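You could then use the following statement (std::accumulate is declared in <numeric>) to get the combined string. Note that the initial value must be a std::wstring; a plain "" would give the sum the wrong type and fail to compile:

std::accumulate(servicesWhitelist, servicesWhitelist + sizeof(servicesWhitelist) / sizeof(servicesWhitelist[0]), std::wstring())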
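A self-contained sketch of the array-plus-accumulate idea, assuming C++11 `<regex>` in place of Boost.Regex. It stores the fragments as `const wchar_t*` rather than `std::wstring`, which avoids running a constructor per element at startup; `whitelistParts` and `buildWhitelistPattern` are hypothetical names. Entries containing regex metacharacters such as `.` or `{` would need escaping in a real pattern, so the brace-delimited GUID entries are left out here.

```cpp
#include <iterator>
#include <numeric>
#include <regex>
#include <string>

// Each fragment stays well under the compiler's per-literal limit.
static const wchar_t* const whitelistParts[] = {
    L".NETFRAMEWORK|",
    L"_IOMEGA_ACTIVE_DISK_SERVICE_|",
    L"2WIREPCP",  // no trailing '|', so the pattern cannot match an empty name
};

// Hypothetical helper: joins the fragments into one pattern at runtime.
// Seeding with std::wstring() makes each step wstring + const wchar_t*.
std::wstring buildWhitelistPattern() {
    return std::accumulate(std::begin(whitelistParts), std::end(whitelistParts),
                           std::wstring());
}
```

The result can then be passed to `std::wregex` (or `boost::wregex`) exactly as in the question.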
Sameer

Let's assume you actually need to store a string longer than 64 KB (i.e., none of the "just don't do that" solutions above apply).

To make MSVC happy, instead of saying:

const char *foo = "abcd...";

You can convert your >64k character string to individual characters represented as integers:

const char foo[] = { 97, 98, 99, 100, ..., 0 };

where each character has been converted to its ASCII code (97 == 'a', etc.), and a NUL terminator has been added at the end.

MSVC 2010, at least, accepts this.
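Converting 100 KB of text by hand is impractical, so a small generator program could emit the initializer instead. The sketch below uses a hypothetical `toInitializer` function that turns a narrow string into a brace-enclosed list of character codes ending in 0, ready to paste after `const char foo[] =`. Each `char` is cast to `int` as-is, so the emitted values match what that platform's `char` would hold.

```cpp
#include <sstream>
#include <string>

// Hypothetical generator helper: "abc" -> "{ 97, 98, 99, 0 }".
std::string toInitializer(const std::string& text) {
    std::ostringstream out;
    out << "{ ";
    for (char ch : text) {
        out << static_cast<int>(ch) << ", ";  // numeric code instead of a character
    }
    out << "0 }";  // NUL terminator so the array works as a C string
    return out.str();
}
```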

Matt Pharr

If it's only about twice the limit, the obvious solution would seem to be to store two (or three) such strings. :) I'm sure your code that reads them at runtime can deal with that easily enough.

EDIT: Do you need to use a regex for some reason? Could you break up the big strings into a list of individual tokens and do a simple string comparison?
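One way to do that, sketched under the assumption that the whitelist stays pipe-delimited text: split it once at startup into a hash set and do exact comparisons afterwards. `parseWhitelist` is a hypothetical name.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical helper: splits L"A|B|C" into {L"A", L"B", L"C"}, skipping empty tokens
// (so a trailing '|' is harmless).
std::unordered_set<std::wstring> parseWhitelist(const std::wstring& text) {
    std::unordered_set<std::wstring> entries;
    std::wstring::size_type start = 0;
    while (start <= text.size()) {
        std::wstring::size_type end = text.find(L'|', start);
        if (end == std::wstring::npos) end = text.size();
        if (end > start) entries.insert(text.substr(start, end - start));
        start = end + 1;  // continue after the delimiter
    }
    return entries;
}
```

A lookup is then `whitelist.count(service) != 0`, with no regex involved.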

EMP

I claim no credit for this one:

https://social.msdn.microsoft.com/Forums/vstudio/en-US/c573db8b-c9cd-43d7-9f89-202ba9417296/fatal-error-c1091

Use the STL instead.


#include <sstream>

std::ostringstream oss;

oss << myString1 << myString2 << myString3 << myString4;

oss.str() now returns a std::string, and oss.str().c_str() returns a const char*. Because str() returns a temporary, store its result in a variable first if the pointer must outlive the statement.
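Since the question's strings are wide, the same approach would use `std::wostringstream`. In this sketch the fragments `part1` to `part3` and the function `joinParts` are hypothetical; the combined text is returned by value, so a caller can keep it in a named `std::wstring` and take `.c_str()` from that.

```cpp
#include <sstream>
#include <string>

// Hypothetical fragments, each kept under the compiler's per-literal limit.
static const wchar_t part1[] = L".NETFRAMEWORK|";
static const wchar_t part2[] = L"_IOMEGA_ACTIVE_DISK_SERVICE_|";
static const wchar_t part3[] = L"2WIREPCP";

// Streams the fragments together and returns the combined string by value.
std::wstring joinParts() {
    std::wostringstream oss;
    oss << part1 << part2 << part3;
    return oss.str();
}
```

For example, `const std::wstring whole = joinParts();` followed by `whole.c_str()` yields a pointer that stays valid as long as `whole` does.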

rstackhouse

Your problem could be stripped down to (in Python):

service = ".NETFRAMEWORK"
whitelist_services = {".NETFRAMEWORK", "_IOMEGA_ACTIVE_DISK_SERVICE_"}
if service in whitelist_services:
    print(service, "is a whitelisted service")

A direct translation to C++ would be:

// g++ *.cc -std=c++0x && ./a.out
#include <cstddef>
#include <iostream>
#include <string>
#include <unordered_set>

namespace {
  // Store std::wstring in the set, not const wchar_t*: an
  // unordered_set<const wchar_t*> hashes and compares pointer values,
  // so lookups would depend on literal pooling rather than string contents.
  typedef std::wstring str_t;
  const wchar_t* const servicesWhitelist[] = {
    L".NETFRAMEWORK",
    L"_IOMEGA_ACTIVE_DISK_SERVICE_",
  };
  const std::size_t N = sizeof(servicesWhitelist) / sizeof(*servicesWhitelist);

  // If you need to search for multiple services, a hash table gives
  // average O(1) lookups. Otherwise std::find() on the array might be
  // sufficient (O(N)), or std::binary_search() on a sorted array (O(log N)).
  const std::unordered_set<str_t> services
    (servicesWhitelist, servicesWhitelist + N);
}

int main() {
  str_t service = L".NETFRAMEWORK";
  if (services.find(service) != services.end())
    std::wcout << service << L" is a whitelisted service" << std::endl;
}
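The code comment above mentions `std::binary_search` on a sorted array. A sketch of that variant follows, assuming the entries are kept in the source already sorted by `wcscmp` order (code-unit order). A plain array of `const wchar_t*` needs no container construction at startup, which may also sidestep the concern about many `push_back` calls bloating the binary. `sortedWhitelist`, `WideLess` and `isWhitelisted` are hypothetical names.

```cpp
#include <algorithm>
#include <cwchar>
#include <iterator>

// Must stay sorted in std::wcscmp order for binary_search to be valid.
static const wchar_t* const sortedWhitelist[] = {
    L".NETFRAMEWORK",
    L"2WIREPCP",
    L"_IOMEGA_ACTIVE_DISK_SERVICE_",
    L"{6080A529-897E-4629-A488-ABA0C29B635E}",
};

// Orders C strings by content rather than by pointer value.
struct WideLess {
    bool operator()(const wchar_t* a, const wchar_t* b) const {
        return std::wcscmp(a, b) < 0;
    }
};

// Hypothetical helper: O(log N) exact lookup without building any container.
bool isWhitelisted(const wchar_t* service) {
    return std::binary_search(std::begin(sortedWhitelist), std::end(sortedWhitelist),
                              service, WideLess());
}
```

Checking the order once with `std::is_sorted(std::begin(sortedWhitelist), std::end(sortedWhitelist), WideLess())`, for instance in a debug assertion, guards against an entry being added out of place.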
jfs
  • 1. That's nice for Python, but Python is not my target language. Sorry. 2. This seems to be a copy of Sameer's answer.... – Billy ONeal Apr 11 '10 at 22:41
  • @Billy ONeal: 1. I've used Python as a pseudo-code (as a succinct illustration that shows you don't need regexs to solve your problem) 2. The essence of the answer is to drop regex and use one of the shown approaches. Sameer's answer is in the regex's root. – jfs Apr 11 '10 at 23:28