
I'm using Clang in a code editor extension to implement folding/collapsing and I've run into an issue.

I'm parsing only a single file at a time, as opposed to the entire translation unit. This is partly for speed and simplicity, but also because you can open a header file by itself and not know which .cpp file to parse anyway.

The issue is that as soon as clang encounters an unknown symbol it just gives up, and entire chunks of the file are missing from the AST. Consider the following snippet:

```cpp
void Foo1()
{
    while (PeekMessageA())
    {
        switch (0)
        {
        }
    }
}
```

The AST for this is just:

```
|-FunctionDecl 0x1fde8f67240 <Folding Test.cpp:1:1, line:9:1> line:1:6 Foo1 'void ()'
| `-CompoundStmt 0x1fde8f673f8 <line:2:1, line:9:1>
```

There's no information for the while block or the switch block. Obviously it's impossible for clang to know whether PeekMessageA() is a function call, an object construction, etc., and I'm fine with that. I don't need it to be perfect, but I do want to be able to get to the while and switch blocks.

Is there a way to get clang to provide more information in this case? I'm currently using libclang, but I'm open to using LibTooling as well.

Adam
  • That code doesn't compile with clang++. Please change the code so that it can be used with clang++ to verify your problem. Edit: to state it clearly, if I adjust your code I do get a while statement. – user6556709 Apr 28 '19 at 08:11
  • C++ code can only be parsed as a whole translation unit. – user7860670 Apr 28 '19 at 08:26
  • A single file can be parsed in isolation. It can't be parsed into a correct program, but it can obviously be parsed. Try opening a random header file in VS and you'll have code folding/outlining available. – Adam Apr 28 '19 at 08:39
  • @Adam clang++ complains about the missing declaration of PeekMessageA. If I add a dummy declaration "bool PeekMessageA();" I get -FunctionDecl 0x2601780 line:3:6 Foo1 'void ()' `-CompoundStmt 0x26019a0 `-WhileStmt 0x2601980 [some calls and even the dummy switch-statement]. So please write the complete code which produces the AST you showed. – user6556709 Apr 28 '19 at 08:45
  • @Adam VS doesn't parse the file for folding, it just identifies braces and other "block markers". – molbdnilo Apr 28 '19 at 08:45
  • Getting OK-ish folding / outlining is possible even without properly parsing, and that is pretty far from building an AST. – user7860670 Apr 28 '19 at 08:45
  • Possible duplicate of [Clang for fuzzy parsing C++](https://stackoverflow.com/questions/11133557/clang-for-fuzzy-parsing-c) – Mike Apr 28 '19 at 09:02
  • Responses here indicate Clang is not very good at parsing chunks of C++ code. For an alternative that is, see https://stackoverflow.com/a/17393852/120163 – Ira Baxter May 15 '20 at 14:05
  • I ended up analyzing tokens to do folding instead of the AST. Simpler, more reliable, and trivial with clang. – Adam May 16 '20 at 17:22
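To illustrate the token/brace approach from the last comments (folding only needs block markers, not a full AST), here is a minimal standalone sketch that pairs `{` and `}` by line while skipping comments, string literals and character literals. It does not use libclang, and `FoldRange` and `findFoldRanges` are hypothetical names. With libclang, the same pairing could presumably be done over the tokens returned by `clang_tokenize`, since tokenizing is a lexical step that should not depend on symbols resolving. Known gaps under this sketch's assumptions: raw string literals, digit separators such as `1'000`, and braces coming from macros or inactive `#if` branches are not handled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A fold range: 1-based line of an opening '{' and of its matching '}'.
struct FoldRange {
    int startLine;
    int endLine;
};

// Hypothetical helper: pairs braces in C++ source text, ignoring braces
// inside comments, string literals and character literals.
// Not handled: raw strings, digit separators, macros, #if branches.
std::vector<FoldRange> findFoldRanges(const std::string& src) {
    std::vector<FoldRange> ranges;
    std::vector<int> openLines;  // stack of lines holding an unmatched '{'
    int line = 1;
    std::size_t i = 0;
    const std::size_t n = src.size();

    while (i < n) {
        const char c = src[i];
        if (c == '\n') {
            ++line;
            ++i;
        } else if (c == '/' && i + 1 < n && src[i + 1] == '/') {
            // Line comment: skip up to (not past) the newline.
            while (i < n && src[i] != '\n') ++i;
        } else if (c == '/' && i + 1 < n && src[i + 1] == '*') {
            // Block comment: skip to "*/", still counting newlines.
            i += 2;
            while (i < n && !(src[i] == '*' && i + 1 < n && src[i + 1] == '/')) {
                if (src[i] == '\n') ++line;
                ++i;
            }
            i += 2;  // step over "*/" (ends the scan if unterminated)
        } else if (c == '"' || c == '\'') {
            // String or character literal: skip to the closing quote,
            // honouring backslash escapes; stop at a newline if unterminated.
            const char quote = c;
            ++i;
            while (i < n && src[i] != quote && src[i] != '\n') {
                if (src[i] == '\\' && i + 1 < n && src[i + 1] != '\n') ++i;
                ++i;
            }
            if (i < n && src[i] == quote) ++i;  // consume the closing quote
        } else if (c == '{') {
            openLines.push_back(line);
            ++i;
        } else if (c == '}') {
            if (!openLines.empty()) {
                const int start = openLines.back();
                openLines.pop_back();
                // Only blocks spanning several lines are worth folding.
                if (line > start) ranges.push_back({start, line});
            }
            ++i;
        } else {
            ++i;
        }
    }
    return ranges;
}
```

Run on the question's `Foo1` snippet, this should report the switch block (lines 6 to 7), the while block (4 to 8) and the function body (2 to 9), innermost first, whether or not `PeekMessageA` is declared anywhere.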

1 Answer

That's a complex problem with no perfect solution, but for your needs, which seem limited, you may be able to find a workable one. The problem arises because C++ has ambiguities that can only be resolved with semantic information, unlike other languages (such as C#) that have at most syntactic ambiguities. Therefore clang may not be able to produce a complete AST when a file does not compile.

This happens at least because of the ambiguity between expressions and declarations: a C++ parser may have to do some symbol resolution to tell which of the two it is looking at. When that resolution fails, for example when a symbol can be resolved neither as a type nor as a variable, clang throws away entire parts of the AST because it is simply not able to build them.
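To make the expression/declaration ambiguity concrete, here is a small illustrative sketch reusing the question's identifier. The statement `PeekMessageA(x);` is spelled identically in both namespaces, yet it declares a variable when `PeekMessageA` names a class and calls a function when it names a function. The namespaces `as_type` and `as_function`, the `value` member and the `calls` counter are invented for the demo. In a file where `PeekMessageA` is not declared at all, the parser cannot pick either reading.

```cpp
namespace as_type {
// Demo assumption: PeekMessageA names a class type here.
struct PeekMessageA {
    int value = 7;
};

int run() {
    PeekMessageA(x);  // declaration: defines a local x of type PeekMessageA
    return x.value;
}
}  // namespace as_type

namespace as_function {
int x = 0;
int calls = 0;

// Demo assumption: PeekMessageA names a function here.
void PeekMessageA(int) { ++calls; }

int run() {
    PeekMessageA(x);  // expression statement: calls PeekMessageA(as_function::x)
    return calls;
}
}  // namespace as_function
```

`as_type::run()` returns 7 (the default value of the newly declared `x`), while the first call to `as_function::run()` returns 1 (the function ran once). Only the meaning of the name decides which tree the parser has to build.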

I had the same problem years ago and, if I remember correctly, when an error occurred in an if condition the entire if statement was thrown away. So if you want to keep the if statement in that case, you have to hack into the clang parser and create error nodes in place of the condition expression. The same problem occurs in every other statement that contains an expression or a declaration.

Hope this helps.

Dan Grahn
P. PICARD