I am writing a simple serialization using the format L"79349 Dexter 03 05"

(Assume that the Dexter part will always be one word.)

This string is to be read into 3 ints and a wchar_t array.

I currently have the following code:

#include <iostream>
#include <stdio.h>
#include <string>

using namespace std;

int main()
{
    int id=-1,season=-1,episode=-1;
    wchar_t name[128];
    swscanf_s(L"79349 Dexter 03 05", L"%d %ls %d %d", &id, name, &season, &episode);

    wcout << "id is " << id << endl; 
    wcout << "name is " << wstring(name) << endl; //wprintf(L"name is %ls",name);
    wcout << "season is " << season << endl;
    wcout << "episode is " << episode << endl;
}

The code above compiles (in VS '13) without a problem; however, it crashes when executed. Under the debugger I get the message: Unhandled exception at 0xFEFEFEFE in test3.exe: 0xC0000005: Access violation executing location 0xFEFEFEFE.

By omitting some parts, I found that the problem occurs when reading into name.

For example, the following works just fine:

swscanf_s(L"79349 Dexter 03 05", L"%d %*ls %d %d", &id, &season, &episode);

What am I doing wrong?

My guess is that I am missing something simple and trivial, but I cannot find it on my own. Thanks in advance.

Sweeney Todd
  • The problem appears to be that the function doesn't know how to parse the string. Strings can include spaces and numbers as well as letters, and it's likely that the string argument is consuming the entire remaining input, which leaves no tokens left for season and episode. Consider using `wcstok_s`; since your string cannot include spaces, you could use a space as your delimiter. – Brett Wolfington May 18 '14 at 14:37
  • @BrettWolfington even if I use `swscanf_s(L".IHATECPP.",L".%ls.",name);`, the problem is still present – Sweeney Todd May 18 '14 at 14:41
  • The code in your comment runs without error on my machine. Note that you don't need the "l" prefix on the format specification field when using swscanf; the wide-character format is used by default when you specify `%s`. However, the issue with the original code remains. You are using scanf when you need to be using strtok. The string argument will consume everything to the end of the input string, leaving no tokens for season or episode. – Brett Wolfington May 18 '14 at 14:56
  • @BrettWolfington it does run but the program still crashes when you try to print the value of name. I am convinced to use strtok but I do not think the problem is because `the string argument consumes all the input`, since the second code in my question runs and assigns the values of season & episode correctly. – Sweeney Todd May 18 '14 at 15:41
  • It's because there is a clear separation between the numbers. Spaces cannot be numbers, but spaces *and* numbers can both be strings. – Brett Wolfington May 18 '14 at 15:46
  • Check that statement again: the string is still parsed with `%*ls`, it is just not assigned to `name`. Am I right? – Sweeney Todd May 18 '14 at 15:54

1 Answer

My reputation is currently too little to comment. As Brett says, you need to use wcstok_s. What you're trying to do is "tokenise" the long string into smaller token strings. This is what wcstok_s will do for you. On the other hand, swscanf_s will attempt to convert the whole string that you pass into the first format argument.

The other reason this isn't working for you is that you haven't specified the size of the destination buffer. The "_s" versions are more "secure" in that they protect against buffer overruns, which can corrupt memory and cause all sorts of problems. For %s and %ls they expect an extra argument, right after the buffer, giving its size in characters (not bytes). If you replace your:

swscanf_s(L".IHATECPP.",L".%ls.",name);

with

swscanf_s(L".IHATECPP.", L".%ls.", name, _countof(name));

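the result will be: IHATECPP. (with the trailing dot). The first "." (dot) is matched by the literal dot in the format and is not stored, while the last one is kept because %ls reads up to the next whitespace.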
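To tie this back to the original input, here is a minimal portable sketch of the same idea. It is not the code from the question: it swaps in the standard `swscanf` with an explicit field width (`%127ls`) instead of the MSVC-specific `swscanf_s` plus `_countof(name)`, so that it builds on other compilers too. `EpisodeRecord` and `parse_episode` are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cwchar>
#include <string>

// Hypothetical record type, introduced only for this illustration.
struct EpisodeRecord {
    int id;
    std::wstring name;
    int season;
    int episode;
};

// Parses "<id> <name> <season> <episode>" where <name> is a single word.
// Returns true only if all four fields were converted; 'out' is left
// untouched on failure.
bool parse_episode(const wchar_t* text, EpisodeRecord& out)
{
    assert(text != nullptr);  // precondition: caller passes a valid string

    int id = -1, season = -1, episode = -1;
    wchar_t name[128] = L"";

    // %127ls stores at most 127 characters plus the terminator, so the
    // 128-element buffer cannot overflow. With swscanf_s the same bound
    // would be passed as an extra argument right after 'name'.
    const int converted = std::swscanf(text, L"%d %127ls %d %d",
                                       &id, name, &season, &episode);
    if (converted != 4) {
        return false;
    }

    out = EpisodeRecord{id, name, season, episode};
    return true;
}
```

Calling `parse_episode(L"79349 Dexter 03 05", rec)` should fill `rec` with the four fields. Checking the return value of `swscanf` is the part that is easy to forget: it reports how many fields were actually converted.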

This question: "Split a string in C++?" might help you if you can use more C++-style routines instead of the older C-style ones. If you can't for whatever reason, then this one: "C++ Split Wide Char String" might give you some ideas instead, as it uses wcstok(). Once wcstok_s has split the original string into smaller substrings (tokens), you'll need to convert the ones you know are numbers to integers.

In general, you can search for "C++ tokenize" and you should find a lot of examples.
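As a rough illustration of the C++-style route, the sketch below splits on whitespace with `std::wistringstream` rather than `wcstok_s`, which is a swapped-in technique and not what the linked questions necessarily use. `split_words` is a hypothetical helper name.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits 'text' into whitespace-separated tokens.
std::vector<std::wstring> split_words(const std::wstring& text)
{
    std::wistringstream in(text);
    std::vector<std::wstring> tokens;
    std::wstring word;

    // operator>> skips leading whitespace and reads one word at a time.
    while (in >> word) {
        tokens.push_back(word);
    }

    // Sanity check: extraction from a string stream only stops once the
    // end of the input has been reached.
    assert(in.eof());
    return tokens;
}
```

The numeric tokens can then be converted with `std::stoi`, which has an overload taking `std::wstring`, for example `std::stoi(tokens[2])` for the season. Note that `std::stoi` throws if a token is not a number, so real code would want to handle that.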

djikay
  • Thank you for your answer. I will try the tokenising approach as you suggest, but I thought the purpose of the scanf functions was to handle such cases? – Sweeney Todd May 18 '14 at 15:52
  • The scanf functions take a string and convert it to one thing, depending on the format specification, e.g. %d for integer, %s for string, etc. So basically, one string goes in and another "something" comes out, one to one. The strtok functions will tokenize a string. – djikay May 18 '14 at 15:55