0

I am using sqlite3_exec() function for executing the queries. I am aware that as per below post, it's better to use _prepare, _bind, _step, _finalize combination for various benefits.

sqllite query with encrypted std::string (unrecognized token)

However, in my case I already have a library in place, which is tightly coupled around sqlite3_exec() and it will be difficult to change. Even for education purpose, I would like to know, how can I enter special tokens inside an sqlite3 database. Currently I am programatically reading this statement from a file using C++ and inserting into the DB using sqlite3_exec()

Currently, following statement is giving error:

INSERT INTO Convey(username, data, id, type)  
VALUES('12345', '^Z^Dfine* ^HºÉÙ<98>ö÷×^B2^@', 'abcd', 31);

The error comes at this symbol ^@ as SQL error: unrecognized token:. When I dismantled the statement and checked ascii for each characters, it showed 0 in place of ^@. The type of data, be it TEXT or BLOB doesn't make a difference.

Note that, the "data", I am trying to enter here is a Google protobuf message in a serialised form.

What is the correct way to escape such "nul" like characters in sqlite, when one has to use sqlite3_exec()?

[Note: 1 way is to replace 0 with some pre-defined pattern, which less likely to be found in binary strings and then restore them while reading it. I am open to this solution as well.]

JJJ
  • 32,902
  • 20
  • 89
  • 102
iammilind
  • 68,093
  • 33
  • 169
  • 336

2 Answers2

1

Seems that it's not possible to INSERT a data with blob nature, i.e. containing NUL ('\0') characters (or possibly new line '\n' characters).

Hence, now I am inserting in following way:

  • Before inserting, check if there is any such special character (i.e. '\0', '\n')
  • If yes, then within the string, append a special known token at the end such as '%'
  • List the positions separated by comma after that token; those positions can be replaced by temporary place holder such as '0'

i.e. '^Z^Dfine* ^HºÉÙ<98>ö÷×^B2^@' will look like '^Z^Dfine* ^HºÉÙ<98>ö÷×^B20#<position>'

While reading the string, the same has to be decoded. This is very helpful and working nicely so far!


For those who are interested, here is the code for replacement of '\0':

#define SQLa_STRING_END      '%'
#define SQLa_NUL_SEPARATOR   ','
#define SQLa_NUL_TOKEN       '0'

string
ToString (const string& s_,  // need to enclose with '' and add ' before '
          const bool isEncodingString = false)
{
  string s = s_;
  const char apostrophe = '\'';
  bool isStringEndPending = isEncodingString;
  for(size_t i = 0, length = s.length(); i < length; ++i)
    switch(s[i])
    {
    case apostrophe:
      s.insert(i++, 1, apostrophe);
      ++length;
    break;
    case 0:
      if(isEncodingString)
      {
        if(isStringEndPending)
        {
          s += SQLa_STRING_END;
          isStringEndPending = false;
        }
        s[i] = SQLa_NUL_TOKEN; ;
        s += std::to_string(i) + SQLa_NUL_SEPARATOR;
      }
    default:
    break;
    }
  s.insert(0, 1, '\'');
  return s += '\'';
}

string
FromString (const char value[])  // enclosing '' are not present
{
  string s = value;
  if(const auto posEnd = s.rfind(SQLa_STRING_END) + 1)
  {
    std::replace(s.begin() + posEnd, s.end(), SQLa_NUL_SEPARATOR, '\0');
    for(auto pos = posEnd, length = s.length() - 1;
        pos < length;
        s[::atoll(&s[pos])] = 0, pos = s.find(SQLa_NUL_SEPARATOR, pos) + 1);
    s.erase(posEnd - 1);
  }
  return s;
}
iammilind
  • 68,093
  • 33
  • 169
  • 336
0

The character with value zero is not a valid UTF-8 character, so it cannot be written directly in SQL.

Just for the record: you should use parameters. But if you want to do this in SQL, you have to write a blob literal:

... VALUES('12345', x'1A0466696E652A2008BAC9D998F6F7D7023200', 'abcd', 31);
CL.
  • 173,858
  • 17
  • 217
  • 259
  • Have you tested the blob literal for the above string? i.e. `x'^Z^Dfine* ^HºÉÙ<98>ö÷×^B2^@'`. For me, it gives the same "unrecognized token" error. Also how did you manage to get `1A0466696E652A2008BAC9D998F6F7D7023200`; is it related to the above string in Qn? – iammilind Dec 09 '17 at 04:01
  • "BLOB literals are string literals containing hexadecimal data and preceded by a single "x" or "X" character. – CL. Dec 09 '17 at 08:12
  • But that's not working with the above string literal. Do you mean that the string literal mentioned in the original question has to be converted to hex or base64 data and then store it with `x' ... '`? Is there any way without conversion? – iammilind Dec 09 '17 at 08:38
  • A blob literal contains the hexadecimal *representation* of the blob. E.g., the literal `x'123456'` contains three bytes. It is not possible to avoid conversion because the zero-valued character is not valid in a SQL command. However, the data in the database will be correct. – CL. Dec 09 '17 at 09:03
  • So that means, we have to convert the utf-8 based string to hexadecimal format. Isn't it? Do we have any standard c++ library function for the same? Additional Qn: Once converted to Hex, will it matter if we represent with Blob `x'...'` or simple string? – iammilind Dec 09 '17 at 09:06
  • The original string is *not* encoded in UTF-8, and it is not a valid string to begin with. A simple string represents a string, not a blob. – CL. Dec 09 '17 at 09:08
  • The string mentioned in your code `'1A0466696E652A2008BAC9D998F6F7D7023200'` is just an example? Or did you use some C++ function to convert `'^Z^Dfine* ^HºÉÙ<98>ö÷×^B2^@'` into `'1A0466696E652A2008BAC9D998F6F7D7023200'`? – iammilind Dec 09 '17 at 09:16
  • I don't know the encoding, so this example might be or might not be correct. (For how to get the hexadecimal representation of a bunch of bytes, see https://stackoverflow.com/q/10599068/11654.) – CL. Dec 09 '17 at 09:26