I'm concentrating on checking for error conditions in a parser design using Spirit X3. One of these is the character category checks such as isalpha or ispunct. According to the X3 documentation on Character Parsers, they should match what C++ provides as std::isalpha and std::ispunct. However, with the code demonstration shown below (which uses cntrl), I get different results.

#include <cstddef>
#include <cstdio>
#include <cstdint>
#include <cctype>
#include <iostream>
#include <boost/spirit/home/x3/version.hpp>
#include <boost/spirit/home/x3.hpp>

namespace client::parser
{
  namespace x3 = boost::spirit::x3;
  namespace ascii = boost::spirit::x3::ascii;

  using ascii::char_;
  using ascii::space;
  using x3::skip;

  x3::rule<class main_rule_id, char> const main_rule_ = "main_rule";
  const auto main_rule__def = ascii::cntrl;

  BOOST_SPIRIT_DEFINE( main_rule_ ) 
  const auto entry_point = skip(space) [ main_rule_ ];
}

int main()
{
  printf( "Spirit X3 version: %4.4x\n", SPIRIT_X3_VERSION );

  char output;

  bool r = false;
  bool r2 = false; // answer according to default "C" locale
  char input[2];
  input[1] = 0;

  printf( "ascii::cntrl\n" );

  uint8_t i = 0;
  next_char:  
    input[0] = (char)i;
    r = parse( (char*)input, input+1, client::parser::entry_point, output );
    r2 = (bool)std::iscntrl( (unsigned char)i );
    printf( "%2.2x:%d%d", i, r, r2 );
    if ( i == 0x7f ) { goto exit_loop; }
    ++i;
    if ( i % 8 ) { putchar( ' ' ); } else { putchar( '\n' ); }
    goto next_char;
  exit_loop:

  return 0;
}

The output is:

Spirit X3 version: 3004
ascii::cntrl
00:11 01:11 02:11 03:11 04:11 05:11 06:11 07:11
08:11 09:01 0a:01 0b:01 0c:01 0d:01 0e:11 0f:11
10:11 11:11 12:11 13:11 14:11 15:11 16:11 17:11
18:11 19:11 1a:11 1b:11 1c:11 1d:11 1e:11 1f:11
20:00 21:00 22:00 23:00 24:00 25:00 26:00 27:00
28:00 29:00 2a:00 2b:00 2c:00 2d:00 2e:00 2f:00
30:00 31:00 32:00 33:00 34:00 35:00 36:00 37:00
38:00 39:00 3a:00 3b:00 3c:00 3d:00 3e:00 3f:00
40:00 41:00 42:00 43:00 44:00 45:00 46:00 47:00
48:00 49:00 4a:00 4b:00 4c:00 4d:00 4e:00 4f:00
50:00 51:00 52:00 53:00 54:00 55:00 56:00 57:00
58:00 59:00 5a:00 5b:00 5c:00 5d:00 5e:00 5f:00
60:00 61:00 62:00 63:00 64:00 65:00 66:00 67:00
68:00 69:00 6a:00 6b:00 6c:00 6d:00 6e:00 6f:00
70:00 71:00 72:00 73:00 74:00 75:00 76:00 77:00
78:00 79:00 7a:00 7b:00 7c:00 7d:00 7e:00 7f:11

So the first digit after the colon is the answer according to X3 and the second digit is the answer according to C++. The mismatch happens on the characters that also fall into the isspace category. I have recently been looking through the library headers, but I still haven't found a part that explains this behavior.

Why the disparity? Have I missed something?

Oh yeah, I love my goto statements. And my retro C style. I hope you do too! Even for an X3 parser.

Zeyneb
  • Hah. Noticed your comments about style late :) Oh well. Let's just say we have different opinions. I do like something like https://www.boost.org/doc/libs/1_70_0/libs/format/doc/format.html or https://github.com/fmtlib/fmt but it's a bit out of scope for SO answering – sehe Jul 15 '19 at 21:41
  • I made an edit in my post. I removed a sentence: "Is this another bug in X3" while I was to blame for misusing the skipper facility. – Zeyneb Jul 15 '19 at 22:52
  • @sehe about the C++ tag, okay that you add it but as this is so Spirit X3 specific, why would it be relevant for people with a general interest in C++? – Zeyneb Jul 15 '19 at 23:20
  • It's just further defining an audience (tags can be language-agnostic). Also, it automatically causes syntax highlighting to become relevant to the language – sehe Jul 16 '19 at 00:22
  • If you want exactly `std::iscntrl` - you can use `x3::standard::cntrl` parser. The reason for the divergence is unknown to me, probably it is a good idea to open a bug report https://github.com/boostorg/spirit/issues/new. – Nikita Kniazev Jul 16 '19 at 15:40

1 Answer

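The skipper is the culprit: skip(space) eats any whitespace before ascii::cntrl gets a chance to parse it. The characters 0x09 through 0x0d are both whitespace and control characters, so they are skipped and the parser then has no input left to match.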

I simplified the parser by dropping the skipper, and now the results agree with std::iscntrl for every character.

As a note about style: there's no reason ever to

  • use C style casts (they're dangerous)
  • write a loop with goto (considered harmful)
  • use cryptic variable names (r, r2?)

#include <boost/spirit/home/x3/version.hpp>
#include <boost/spirit/home/x3.hpp>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <iomanip>

namespace client::parser {
    using namespace boost::spirit::x3;
    //const auto entry_point = skip(space)[ ascii::cntrl ];
    const auto entry_point = ascii::cntrl;
}

int main() {
    std::cout << std::boolalpha << std::hex << std::setfill('0');
    std::cout << "Spirit X3 version: " << SPIRIT_X3_VERSION << "\n";

    for (uint8_t i = 0; i <= 0x7f; ++i) {
        auto from_x3  = parse(&i, &i + 1, client::parser::entry_point);
        auto from_std = !!std::iscntrl(i);

        if (from_x3 != from_std) {
            std::cout << "0x" << std::setw(2) << static_cast<unsigned>(i) << "\tx3:" << from_x3 << "\tstd:" << from_std << '\n';
        }
    }

    std::cout << "Done\n";
}

Prints simply

Spirit X3 version: 3000
Done

With the commented-out skip(space) line enabled instead of the plain ascii::cntrl one:

Spirit X3 version: 3000
0x09    x3:false    std:true
0x0a    x3:false    std:true
0x0b    x3:false    std:true
0x0c    x3:false    std:true
0x0d    x3:false    std:true
Done
sehe
  • Is there certainty that for the 2nd argument to parse, &i + 1 is always a valid address? – Zeyneb Jul 15 '19 at 23:13
  • Yes. You cannot dereference it, but the address is valid for comparison. See https://stackoverflow.com/questions/9086372/how-to-compare-pointers/9086675#9086675 – sehe Jul 16 '19 at 00:26