I'm trying to generate a source map for a source file I'm parsing, and I want to get the range of each node. getSourcePos only gives the start position of a node (src:line:column). How can I get its end position?
-
You would need to look at the start position of the next node. – Bob Dalgleish Dec 19 '19 at 20:13
-
Yes, the key is to identify where the next node is. If there's a right sibling, that's easy. But what if the node has no right sibling? – sinoTrinity Dec 19 '19 at 20:39
-
You will have to traverse the tree, going up. – Bob Dalgleish Dec 19 '19 at 20:51
-
Yeah, that's why I want to see if there is a library that does this before implementing it myself. It seems like common functionality. – sinoTrinity Dec 19 '19 at 20:59
1 Answer
If you want to construct a source span like this for each lexeme:
data Span = Span SourcePos SourcePos
data Spanned a = Spanned Span a
you can just call getSourcePos twice, once at the beginning of a token and once at the end, before consuming any whitespace, assuming you’re at the lexing stage. I’ve used a structure like this in the past to make this more convenient:
-- Augment a parser with a source span.
spanned :: Parser (a, SourcePos) -> Parser (Spanned a)
spanned parser = do
  start <- getSourcePos
  (x, end) <- parser
  pure (Spanned (Span start end) x)

-- Consume whitespace following a lexeme, but record
-- its endpoint as being before the whitespace.
lexeme :: Parser a -> Parser (a, SourcePos)
lexeme parser = (,) <$> parser <*> (getSourcePos <* whitespace)
Bear in mind that getSourcePos is somewhat costly, per the documentation, and if I recall correctly its cost depends on the size of the source file.
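To show how spanned and lexeme fit together, here is a minimal self-contained sketch. It assumes the parser library is Megaparsec (which provides getSourcePos), a String input stream, and Megaparsec's space as the whitespace consumer. The names word, columns, and demo are hypothetical and exist only for illustration.

```haskell
import Data.Void (Void)
import Text.Megaparsec
  ( Parsec, SourcePos, getSourcePos, many, parseMaybe, some
  , sourceColumn, unPos )
import Text.Megaparsec.Char (letterChar, space)

-- Assumed parser type: no custom error component, String input.
type Parser = Parsec Void String

data Span = Span SourcePos SourcePos deriving (Show)
data Spanned a = Spanned Span a deriving (Show)

-- Augment a parser with a source span.
spanned :: Parser (a, SourcePos) -> Parser (Spanned a)
spanned parser = do
  start <- getSourcePos
  (x, end) <- parser
  pure (Spanned (Span start end) x)

-- Record the end position first, then skip trailing whitespace.
lexeme :: Parser a -> Parser (a, SourcePos)
lexeme parser = (,) <$> parser <*> (getSourcePos <* space)

-- Hypothetical token: a run of letters.
word :: Parser (Spanned String)
word = spanned (lexeme (some letterChar))

-- Extract (start column, end column) from a spanned value.
columns :: Spanned a -> (Int, Int)
columns (Spanned (Span start end) _) =
  (unPos (sourceColumn start), unPos (sourceColumn end))

-- Parse two words separated by two spaces and report their columns.
demo :: Maybe [(Int, Int)]
demo = map columns <$> parseMaybe (many word) "foo  bar"
```

Loaded in GHCi, demo should evaluate to Just [(1,4),(6,9)]. The end column is exclusive (one past the last character of the token), and the whitespace between the two words belongs to neither span.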
If an AST is annotated with spans, you can compute the span of any part of the tree by folding over it with a monoid instance for Span that takes their union (or more specifically their bounding box), i.e. a <> b is a span from (beginRow a, beginCol a) `min` (beginRow b, beginCol b) to (endRow a, endCol a) `max` (endRow b, endCol b).
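The bounding-box combination described above can be sketched as follows. To keep the example free of dependencies, it uses a hypothetical (row, column) tuple in place of SourcePos. SourcePos has an Ord instance, so the same min/max approach should carry over. The sketch defines only a Semigroup, because a span has no obvious empty value. The Expr type and exprSpan are illustrative assumptions, not part of any library.

```haskell
-- Hypothetical stand-in for a source position: (row, column).
-- Tuples compare lexicographically, so row is compared before column.
type Pos = (Int, Int)

data Span = Span Pos Pos deriving (Eq, Show)

-- Combining two spans yields their bounding box:
-- the earliest start and the latest end.
instance Semigroup Span where
  Span s1 e1 <> Span s2 e2 = Span (min s1 s2) (max e1 e2)

-- Hypothetical toy AST in which only the leaves carry spans.
data Expr
  = Lit Span Int
  | Add Expr Expr
  deriving (Show)

-- The span of a node is the bounding box of its children's spans.
exprSpan :: Expr -> Span
exprSpan (Lit s _) = s
exprSpan (Add l r) = exprSpan l <> exprSpan r

-- An expression that spans two lines.
example :: Expr
example =
  Add
    (Lit (Span (1, 1) (1, 2)) 1)
    (Add (Lit (Span (1, 5) (1, 6)) 2) (Lit (Span (2, 3) (2, 5)) 30))
```

Here exprSpan example evaluates to Span (1,1) (2,5), covering everything from the first literal to the end of the last one.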

-
This solution can work, but it requires me to change lots of code that uses lexeme. I ended up saving the SourcePos before the whitespace: https://www.reddit.com/r/haskell/comments/ecy3oa/how_to_get_source_range_of_ast_nodes_using/fc0zsb1?utm_source=share&utm_medium=web2x – sinoTrinity Dec 26 '19 at 02:50