
X3 newbie here. Two questions:

  1. Why does the result contain repeated "1, 1, 1"s, like so: <attributes>[[1, 9, 2, ., 1, 6, 8, ., 1, 1, 1, ., 1, 1, 1], [8, 0]]</attributes>, when I expect something like this: <attributes>[[1, 9, 2, ., 1, 6, 8, ., 1, ., 1], [8, 0]]</attributes>?
  2. What would be a less awkward way to make a single char be treated as a sequence in the dec_octet rule? I've used x3::repeat(1)[x3::digit], but this seems wrong and probably causes the problem in the first question. (I used x3::repeat(1)[x3::digit] because it seems I cannot just use x3::digit there, since that would fail rule collapsing?)
#include <iostream>
#include <string>

#define BOOST_SPIRIT_X3_DEBUG
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/include/adapt_struct.hpp>


namespace x3 = boost::spirit::x3;

namespace ast
{
    struct ip_port
    {
        std::string host;
        boost::optional<std::string> port;
    };

}
BOOST_FUSION_ADAPT_STRUCT(ast::ip_port, host, port)

namespace parser
{
    template <typename T> auto as = [](auto name, auto p) { return x3::rule<struct _, T> {name} = p; };
    
    const auto dec_octet = as<std::string>("dec_octet",
            (
              x3::char_('2') >> x3::char_('5') >> x3::char_('0', '5')
            | x3::char_('2') >> x3::char_('0', '4') >> x3::digit
            | x3::char_('1') >> x3::digit >> x3::digit          
            | x3::char_('1', '9') >> x3::digit
            | x3::repeat(1)[x3::digit] // awkward way to force sequence from single char, but can't use x3::digit
            )

    );

    const auto ipv4address = as<std::string>("ipv4address",
      dec_octet >> x3::char_('.') >> dec_octet >> x3::char_('.') >> dec_octet >> x3::char_('.') >> dec_octet
    );

    const auto ip = as<std::string>("host", ipv4address);
    const auto port = as<std::string>("port", +x3::digit);
    const auto ip_port = as<ast::ip_port>("ip_port",  ip >> -((':') >> port));
}


template <typename T, typename Parser>
bool parse(const std::string& in, const Parser& p)
{
    T parsed;
    auto iter = in.begin();
    auto end_iter = in.end();
    bool res = x3::parse(iter, end_iter, p, parsed);
    
    return res && (iter == end_iter);
}


int main()
{
    std::cerr << std::boolalpha << parse<ast::ip_port>(std::string{"192.168.1.1:80"}, parser::ip_port) << '\n';
    return EXIT_SUCCESS;    
}

Debug output:

<ip_port>
  <try>192.168.1.1:80</try>
  <host>
    <try>192.168.1.1:80</try>
    <ipv4address>
      <try>192.168.1.1:80</try>
      <dec_octet>
        <try>192.168.1.1:80</try>
        <success>.168.1.1:80</success>
        <attributes>[1, 9, 2]</attributes>
      </dec_octet>
      <dec_octet>
        <try>168.1.1:80</try>
        <success>.1.1:80</success>
        <attributes>[1, 6, 8]</attributes>
      </dec_octet>
      <dec_octet>
        <try>1.1:80</try>
        <success>.1:80</success>
        <attributes>[1, 1, 1]</attributes>
      </dec_octet>
      <dec_octet>
        <try>1:80</try>
        <success>:80</success>
        <attributes>[1, 1, 1]</attributes>
      </dec_octet>
      <success>:80</success>
      <attributes>[1, 9, 2, ., 1, 6, 8, ., 1, 1, 1, ., 1, 1, 1]</attributes>
    </ipv4address>
    <success>:80</success>
    <attributes>[1, 9, 2, ., 1, 6, 8, ., 1, 1, 1, ., 1, 1, 1]</attributes>
  </host>
  <port>
    <try>80</try>
    <success></success>
    <attributes>[8, 0]</attributes>
  </port>
  <success></success>
  <attributes>[[1, 9, 2, ., 1, 6, 8, ., 1, 1, 1, ., 1, 1, 1], [8, 0]]</attributes>
</ip_port>
true

Thanks.

psb

1 Answer


Q. 1. Why the result contains repeated "1,1,1"s, like so: [[1, 9, 2, ., 1, 6, 8, ., 1, 1, 1, ., 1, 1, 1], [8, 0]], when I expect something like this [[1, 9, 2, ., 1, 6, 8, ., 1, ., 1], [8, 0]]

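It's been 7 days since the last time people ran into this pitfall. It's the age-old "container attributes aren't atomic" pitfall: when an alternative fails partway through, the characters it already appended to the string attribute are not rolled back. For the input "1", the branches '1' >> digit >> digit and char_('1', '9') >> digit each push a "1" before failing, and the final branch pushes a third one.

You can paper over it using qi::hold (a Qi directive; X3 does not ship an equivalent). Or you can revise your strategy.

As in that earlier case, I'd advise using raw to get the underlying source sequence instead.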
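The mechanism behind this pitfall can be modelled without Spirit. The sketch below is a hand-written imitation of the dec_octet alternatives, not X3's actual implementation; try_seq and model_dec_octet are hypothetical helper names. Each alternative appends matched characters straight into the shared string, and nothing is removed again when an alternative fails partway.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// A "char parser" in this model is just a predicate on one character.
using CharPred = std::function<bool(char)>;

// Try a sequence of char predicates starting at in[pos].
// Every matched char is appended to attr immediately, and attr is NOT
// rolled back on failure (this models the non-atomic container attribute).
bool try_seq(const std::string& in, std::size_t& pos,
             const std::vector<CharPred>& seq, std::string& attr)
{
    std::size_t p = pos;
    for (const auto& pred : seq) {
        if (p >= in.size() || !pred(in[p]))
            return false;            // pos stays untouched, attr keeps its leftovers
        attr.push_back(in[p++]);
    }
    pos = p;                         // commit the input position only on success
    return true;
}

// The five alternatives of the question's dec_octet, tried in order.
bool model_dec_octet(const std::string& in, std::size_t& pos, std::string& attr)
{
    auto lit = [](char c) { return CharPred([c](char x) { return x == c; }); };
    auto range = [](char lo, char hi) {
        return CharPred([lo, hi](char x) { return lo <= x && x <= hi; });
    };
    CharPred digit = [](char x) {
        return std::isdigit(static_cast<unsigned char>(x)) != 0;
    };

    const std::vector<std::vector<CharPred>> alternatives = {
        { lit('2'), lit('5'), range('0', '5') },
        { lit('2'), range('0', '4'), digit },
        { lit('1'), digit, digit },
        { range('1', '9'), digit },
        { digit },
    };
    for (const auto& alt : alternatives)
        if (try_seq(in, pos, alt, attr))
            return true;
    return false;
}
```

Calling model_dec_octet on "1.1" with an empty attr consumes one character but leaves "111" in attr, which mirrors the [1, 1, 1] in the debug output; on "192." no earlier branch matches its first character, so attr ends up as the expected "192".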

Q. 2. [...] not so awkward [...]

The intermediate step would be

const auto dec_octet = x3::raw [ x3::uint_parser<uint8_t>{} ];

Boom. Use the fact that X3 is a high-level parser generator. Don't do the nitty-gritty, error-prone work. In fact, you could simply write

const x3::uint_parser<std::uint8_t> dec_octet{};

Which defers the "stringification" to the point where it's needed:

const x3::uint_parser<std::uint8_t> dec_octet{};
const x3::uint_parser<std::uint16_t> port{};

const auto ipv4address = x3::raw [
      dec_octet >> '.' >> dec_octet >> '.' >> dec_octet >> '.' >> dec_octet ];

const auto ip_port = as<ast::ip_port>("ip_port", ipv4address >> -(':' >> port));
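The appeal of a numeric parser here is that the 0..255 range check falls out of the target type instead of one hand-written alternative per digit pattern. A rough, Spirit-free model of that idea follows; model_octet is a hypothetical helper, and it only approximates what a uint8_t numeric parser does. Like the parser (as a comment further down points out), it accepts leading zeros such as "01".

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// Interpret token as a decimal octet.
// Returns std::nullopt if token is empty, contains a non-digit,
// or denotes a value above 255.
std::optional<std::uint8_t> model_octet(std::string_view token)
{
    if (token.empty())
        return std::nullopt;

    unsigned value = 0;
    for (char c : token) {
        if (c < '0' || c > '9')
            return std::nullopt;
        value = value * 10 + static_cast<unsigned>(c - '0');
        if (value > 255)             // would not fit into a uint8_t
            return std::nullopt;
    }
    return static_cast<std::uint8_t>(value);
}
```

With this model, "192" and "01" are accepted while "256" and "" are rejected. If the stricter RFC 3986 form matters, the leading-zero case needs an extra check on top.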

After Liposuction

Note the use of uint16_t for the port, x3::eoi to expect full parse, removal of explicit rule/conversions:


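#include <cstdint>     // uint8_t, uint16_t
#include <iostream>
#include <stdexcept>   // std::invalid_argument
#include <string>
#include <string_view>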

#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/io.hpp>
#include <boost/optional/optional_io.hpp>
#include <boost/spirit/home/x3.hpp>

namespace x3 = boost::spirit::x3;

namespace ast {
    struct ip_port {
        std::string host;
        boost::optional<uint16_t> port;
    };
    using boost::fusion::operator<<;
}

BOOST_FUSION_ADAPT_STRUCT(ast::ip_port, host, port)

namespace parser {
    const x3::uint_parser<uint8_t> dec_octet {};
    const x3::uint_parser<uint16_t> port {};

    const auto ipv4address = x3::raw[dec_octet >> '.' >> dec_octet >> '.'
        >> dec_octet >> '.' >> dec_octet];

    const auto ip_port = ipv4address >> -(':' >> port) >> x3::eoi;
}

template <typename Parser, typename Attr>
static inline bool parse(std::string_view in, Parser const& p, Attr& result)
{
    return x3::parse(in.begin(), in.end(), p, result);
}

auto parse_ipport(std::string_view in)
{
    ast::ip_port result;
    if (!parse(in, parser::ip_port, result))
        throw std::invalid_argument("ipv4address");

    return result;
}

int main()
{
    for (auto input : { "192.168.1.1:80", "1.1.1.1", ":" }) {
        std::cerr << parse_ipport(input) << std::endl;
    }
}

Prints

(192.168.1.1  80)
(1.1.1.1 --)
terminate called after throwing an instance of 'std::invalid_argument'
  what():  ipv4address
Aborted (core dumped)

Simplifying the code some more by removing the optional (so that a missing port shows up as 0) prints:

(192.168.1.1 80)
(1.1.1.1 0)
terminate called after throwing an instance of 'std::invalid_argument'
  what():  ipv4address

Out-Of-The-Box

Note: your grammar doesn't match all RFC-compliant IPv4 addresses. E.g.

  • 127.1 is valid for 127.0.0.1.
  • so is 0177.1 or 0x7f.1

Either fix it for real or don't reinvent the wheel: use boost::asio::ip::address_v4::from_string, or even boost::asio::ip::address::from_string, and get IPv6 support for free.
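To see the shorthand forms from the list above in action, here is a small sketch. Note the assumptions: it uses the POSIX function inet_aton rather than the Boost.Asio functions named above, so it only builds on POSIX systems, and normalize_ipv4 is a hypothetical helper name.

```cpp
#include <arpa/inet.h>    // inet_aton, inet_ntop (POSIX)
#include <netinet/in.h>   // in_addr, INET_ADDRSTRLEN
#include <sys/socket.h>   // AF_INET
#include <optional>
#include <string>

// Parse text with inet_aton and return the canonical dotted-quad form,
// or std::nullopt if inet_aton rejects the text.
std::optional<std::string> normalize_ipv4(const std::string& text)
{
    in_addr addr{};
    if (inet_aton(text.c_str(), &addr) == 0)
        return std::nullopt;

    char buf[INET_ADDRSTRLEN];
    if (inet_ntop(AF_INET, &addr, buf, sizeof buf) == nullptr)
        return std::nullopt;
    return std::string(buf);
}
```

Here normalize_ipv4("127.1"), normalize_ipv4("0177.1") and normalize_ipv4("0x7f.1") all yield "127.0.0.1", while ":" is rejected. Whether accepting these shorthand forms is desirable depends on the application; the grammar above only accepts four dotted decimal numbers.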

sehe
  • Thanks a lot for this super useful answer! I've probably somehow missed x3::uint_parser in the official documentation, but I still cannot see it there. Is there any other place I can look for X3 reference docs? – psb Feb 14 '21 at 09:21
  • Using const x3::uint_parser<uint8_t> dec_octet {}; is not as strict as the OP's attempt: it successfully parses "01.1.2.3", which according to RFC 3986 is not a valid IP. – sandwood Sep 18 '21 at 19:00