
Using the parser generator nom, how can I write a parser that distinguishes the two meanings of the minus sign in the terms 1-2 and 1*-2?

In the first example, I expect the tokens 1, - and 2. In the second, the minus sign marks the number as negative, so the expected tokens are 1, * and -2, not 1, *, - and 2.

How can I make nom stateful, with user-defined states such as expect_literal: bool?

Shepmaster
Matthias
  • This isn't a nom-specific problem; it's purely a conceptual parsing / grammar question. Moreover, it's a very common parsing problem; it doesn't appear that you've done the appropriate amount of research before asking this question. – Shepmaster Aug 20 '19 at 14:10
  • I'm 99% sure that state isn't required to address this. – Shepmaster Aug 20 '19 at 14:12
  • See also [How to parse an arithmetic string expression with negative numbers and minus signs?](https://stackoverflow.com/q/52996557); [Evaluate code, differentiating between minus and negative?](https://stackoverflow.com/q/33158960); [How to differentiate '-' operator from a negative number for a tokenizer](https://stackoverflow.com/q/26529711); [How Compiler distinguishes minus and negative number during parser process](https://stackoverflow.com/q/27478834) – Shepmaster Aug 20 '19 at 14:13
  • [And many more similar questions](https://www.google.com/search?q=site%3Astackoverflow.com+parser+negative+minus) – Shepmaster Aug 20 '19 at 14:14
  • @Shepmaster Thank you for posting answers which do not cover this question. They all solve a similar problem for someone writing a parser from scratch, but not the actual problem of doing it with nom. This question specifically targets the use of nom for such a standard use case. – Matthias Aug 20 '19 at 14:16
  • Please provide your best attempt at creating a parser in the form of a [Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example), with a specific implementation question. – Gardener Aug 20 '19 at 14:36
  • `nom` isn't a parser, or a parser generator. `nom` is a library for easily crafting parsers in a parser-combinator fashion. How you tokenize is entirely up to you, and the linked questions address exactly what you're asking about. – Zarenor Aug 20 '19 at 14:38
  • @Zarenor But `nom` provides some features that help you solve problems like this without reinventing the wheel, as you can see in the answer below. Sure, you could post-process things like that, but often this makes things unnecessarily complicated. And none of the linked questions addresses what I was asking about. – Matthias Aug 22 '19 at 09:51
  • @Matthias If your answer is satisfactory to you, very well, but you're using `nom_locate` (a crate which isn't `nom`, or made by the creator of `nom`) and making a stateful parser, which does things in an unorthodox manner. I wouldn't rate it a particularly idiomatic solution, or very extensible. If it's what you wanted, I'm happy, but it isn't the orthodox way to do things, and it may have different performance characteristics, or be harder to refactor later if you need to extend the code to do more. – Zarenor Aug 22 '19 at 20:16
  • @Zarenor `nom_locate` is a kind of helper crate for `nom`. You could write the needed lines of code yourself... but why reinvent the wheel? What do you mean by *unorthodox manner*? All the linked questions use either states or a look-behind technique (or `yacc`). Do you mean the look-behind approach would be more orthodox in `nom`? – Matthias Aug 23 '19 at 06:33
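For comparison, the stateless approach suggested in the comments keeps `-` as one kind of symbol and lets the grammar decide: a `-` where an operand is expected is unary negation, anywhere else it is subtraction. The sketch below is a plain std-only recursive-descent evaluator, not nom; `Parser` and `eval` are hypothetical names, and it scans characters directly instead of producing a token stream, to stay short.

```rust
// Hypothetical std-only recursive-descent evaluator (not nom) illustrating the
// stateless approach: '-' is always scanned the same way, and the grammar
// decides whether it is binary or unary.
//
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/' | '%') factor)*
//   factor := '-' factor | integer
//
// Integer overflow is not handled in this sketch.
use std::iter::Peekable;
use std::str::Chars;

pub struct Parser<'a> {
    chars: Peekable<Chars<'a>>,
}

impl<'a> Parser<'a> {
    pub fn new(input: &'a str) -> Self {
        Parser { chars: input.chars().peekable() }
    }

    // Skip whitespace and look at the next significant character.
    fn peek(&mut self) -> Option<char> {
        while matches!(self.chars.peek(), Some(c) if c.is_whitespace()) {
            self.chars.next();
        }
        self.chars.peek().copied()
    }

    pub fn expr(&mut self) -> Result<i64, String> {
        let mut acc = self.term()?;
        while let Some(op @ ('+' | '-')) = self.peek() {
            self.chars.next();
            let rhs = self.term()?;
            acc = if op == '+' { acc + rhs } else { acc - rhs };
        }
        Ok(acc)
    }

    fn term(&mut self) -> Result<i64, String> {
        let mut acc = self.factor()?;
        while let Some(op @ ('*' | '/' | '%')) = self.peek() {
            self.chars.next();
            let rhs = self.factor()?;
            if op != '*' && rhs == 0 {
                return Err("division by zero".to_string());
            }
            acc = match op {
                '*' => acc * rhs,
                '/' => acc / rhs,
                _ => acc % rhs,
            };
        }
        Ok(acc)
    }

    // A '-' where an operand is expected is unary minus, so "1*-2" parses
    // as 1 * (-2) without any lexer state.
    fn factor(&mut self) -> Result<i64, String> {
        match self.peek() {
            Some('-') => {
                self.chars.next();
                Ok(-self.factor()?)
            }
            Some(c) if c.is_ascii_digit() => {
                let mut value: i64 = 0;
                while let Some(d) = self.chars.peek().and_then(|c| c.to_digit(10)) {
                    value = value * 10 + i64::from(d);
                    self.chars.next();
                }
                Ok(value)
            }
            other => Err(format!("expected a number, found {:?}", other)),
        }
    }
}

pub fn eval(input: &str) -> Result<i64, String> {
    let mut parser = Parser::new(input);
    let value = parser.expr()?;
    match parser.peek() {
        None => Ok(value),
        Some(c) => Err(format!("unexpected {:?}", c)),
    }
}
```

Here `eval("1-2")` gives `Ok(-1)` and `eval("1*-2")` gives `Ok(-2)`. Since nom parsers are ordinary functions that may call each other recursively, the same grammar could also be expressed with nom combinators.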

1 Answer


The best solution I have found so far is to use nom_locate and carry the lexer state in the span's `extra` data, with a span defined as

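// Assumes nom 5.x and nom_locate 1.x (the version that provides LocatedSpanEx).
use nom::{
    branch::alt,
    bytes::complete::{tag, take_while},
    combinator::{map, opt},
    error::ErrorKind,
    IResult,
};
use nom_locate::LocatedSpanEx;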

#[derive(Clone, PartialEq, Debug)]
struct LexState {
    pub accept_literal: bool,
}

type Span<'a> = LocatedSpanEx<&'a str, LexState>;

Then you can update the state after a parser succeeds via

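// On success, overwrite the lexer state carried in the remaining input;
// on failure, return the error unchanged.
fn set_accept_literal(
    value: bool,
    code: IResult<Span, TokenPayload>,
) -> IResult<Span, TokenPayload> {
    match code {
        Ok((mut rest, payload)) => {
            rest.extra.accept_literal = value;
            Ok((rest, payload))
        }
        err => err,
    }
}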

where TokenPayload is an enum representing my token content.
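To show the mechanism without pulling in nom or nom_locate, here is a minimal std-only sketch. The `Span` struct, the `PResult` alias and the `multiply` parser are hypothetical stand-ins for `LocatedSpanEx<&str, LexState>`, nom's `IResult` and one branch of an operator parser; the point is only that the flag travels inside the input value, so every parser that hands back the remaining input also hands back the updated state.

```rust
// Hypothetical std-only stand-in for nom_locate's LocatedSpanEx<&str, LexState>:
// the lexer state is stored in the input value, next to the unconsumed text.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Span<'a> {
    pub fragment: &'a str,
    pub accept_literal: bool,
}

// Hypothetical result type with the same shape as nom's IResult:
// Ok((remaining_input, output)) on success, the failing input on error.
pub type PResult<'a, O> = Result<(Span<'a>, O), Span<'a>>;

// Same idea as set_accept_literal above: on success, overwrite the flag in the
// remaining input; on failure, pass the error through unchanged.
pub fn set_accept_literal<'a, O>(value: bool, res: PResult<'a, O>) -> PResult<'a, O> {
    res.map(|(mut rest, out)| {
        rest.accept_literal = value;
        (rest, out)
    })
}

// Hypothetical operator parser: consumes one '*' and allows a signed literal next.
pub fn multiply(input: Span<'_>) -> PResult<'_, char> {
    match input.fragment.strip_prefix('*') {
        Some(rest) => set_accept_literal(true, Ok((Span { fragment: rest, ..input }, '*'))),
        None => Err(input),
    }
}
```

Calling `multiply` on `Span { fragment: "*-2", accept_literal: false }` returns the remaining `"-2"` with `accept_literal` set to `true`, which is what lets the integer parser accept the sign on its next call.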

Now you can write the operator parser:

fn mathematical_operators(code: Span) -> IResult<Span, TokenPayload> {
    // After any operator, a signed literal such as the `-2` in `1*-2` may follow.
    set_accept_literal(
        true,
        alt((
            map(tag("*"), |_| TokenPayload::Multiply),
            map(tag("/"), |_| TokenPayload::Divide),
            map(tag("+"), |_| TokenPayload::Add),
            map(tag("-"), |_| TokenPayload::Subtract),
            map(tag("%"), |_| TokenPayload::Remainder),
        ))(code),
    )
}

And the integer parser as:

fn parse_integer(code: Span) -> IResult<Span, TokenPayload> {
    let chars = "1234567890";
    // Optional leading '-'
    let (code, sign) = opt(tag("-"))(code)?;
    let sign = sign.is_some();
    // A '-' directly after a literal is the subtraction operator, so reject it here.
    if sign && !code.extra.accept_literal {
        return Err(nom::Err::Error((code, ErrorKind::IsNot)));
    }
    let (code, slice) = take_while(move |c| chars.contains(c))(code)?;
    match slice.fragment.parse::<i32>() {
        // After a literal, the next '-' must be an operator.
        Ok(value) => set_accept_literal(
            false,
            Ok((code, TokenPayload::Int32(if sign { -value } else { value }))),
        ),
        // Empty or out-of-range digit sequence.
        Err(_) => Err(nom::Err::Error((code, ErrorKind::Tag))),
    }
}

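This might not win a beauty contest, but it works. The remaining pieces should be straightforward; just make sure the initial state has `accept_literal: true` (so that a leading `-2` is read as a literal) and that `parse_integer` is tried before `mathematical_operators`, otherwise every `-` would be lexed as `Subtract`.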
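For trying the idea without dependencies, the following hand-written tokenizer applies the same `accept_literal` rule in a single loop. It is a sketch, not nom code: `Token` and `tokenize` are hypothetical names standing in for `TokenPayload` and the parsers above, and the integer branch is tried before the operator branch, mirroring the order an `alt` over those two parsers would need.

```rust
// Std-only sketch of the accept_literal technique (hand-written, not nom).
// `Token` and `tokenize` are hypothetical names standing in for TokenPayload
// and the nom parsers above.
#[derive(Debug, PartialEq)]
pub enum Token {
    Int32(i32),
    Multiply,
    Divide,
    Add,
    Subtract,
    Remainder,
}

pub fn tokenize(input: &str) -> Result<Vec<Token>, String> {
    let bytes = input.as_bytes();
    let mut tokens = Vec::new();
    let mut pos = 0;
    // A signed literal is allowed at the very start of the input.
    let mut accept_literal = true;

    while pos < bytes.len() {
        let c = bytes[pos];
        if c.is_ascii_whitespace() {
            pos += 1;
            continue;
        }

        // 1. Try an integer first; a leading '-' only counts as a sign
        //    when the state says a literal is expected.
        let signed = c == b'-' && accept_literal;
        let digits_start = if signed { pos + 1 } else { pos };
        let mut end = digits_start;
        while end < bytes.len() && bytes[end].is_ascii_digit() {
            end += 1;
        }
        if end > digits_start {
            // The slice includes the '-' when signed, so parse handles the sign.
            let value = input[pos..end].parse::<i32>().map_err(|e| e.to_string())?;
            tokens.push(Token::Int32(value));
            accept_literal = false; // a '-' right after a literal is subtraction
            pos = end;
            continue;
        }

        // 2. Otherwise it must be an operator.
        let op = match c {
            b'*' => Token::Multiply,
            b'/' => Token::Divide,
            b'+' => Token::Add,
            b'-' => Token::Subtract,
            b'%' => Token::Remainder,
            _ => return Err(format!("unexpected character at byte {}", pos)),
        };
        tokens.push(op);
        accept_literal = true; // a signed literal may follow an operator
        pos += 1;
    }
    Ok(tokens)
}
```

With this rule, `1-2` yields `Int32(1), Subtract, Int32(2)` and `1*-2` yields `Int32(1), Multiply, Int32(-2)`, which are the token streams the question asks for.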
