
I'm new to this subject, so there's a good chance I've gotten some key terms wrong. I'd like to parse a TypeScript file into its component symbols. As a very crude example of the output I have in mind, this file:

// some ts file
export function yell(output: string) {
  alert(output + "!!");
}

would create something like this:

{
  symbols: [
    { type: "comment", text: "// some ts file" },
    "\n",
    { type: "module", text: "export" },
    " ",
    { type: "function", symbols: [
      { type: "name", text: "yell" },
      ... it goes on
    ]}
  ]
}

From what I can glean from this issue (https://github.com/Microsoft/language-server-protocol/issues/33), I'm fairly certain this symbolization/tokenization happens during the initialize phase of a language server.

But I read through the docs on initialize (https://microsoft.github.io/language-server-protocol/specification#initialize) and I couldn't find (ctrl-f) anything about symbols or tokens being returned.

A while back I worked with Monaco, and I know that the point of language servers is largely to standardize the tokenization and linking/navigating of code, so I'm pretty sure this is the right tool for it. But the docs are pretty dense, and seem to be far more focused on code interactions than code parsing.

How can I parse TS to symbols using a Language Server Protocol?

EDIT: Worth mentioning, in case this is a project unto itself: I'm not looking for the full code for this or anything. Just a rough overview of what goes on, and maybe a few links/excerpts to relevant docs.

EDIT 2: I found a really similar question here (TypeScript: get syntax tree), but it makes no mention of Language Servers, and appears to have come from a time before them.

EDIT 3: It appears the proper term I was looking for is AST. Found a really cool tool online for TypeScript (https://ts-ast-viewer.com/)

Seph Reed

1 Answer


As it turns out, language servers do not expose the AST (Abstract Syntax Tree).

I found this issue, with the quote:

I can see how and AST can help here but currently there are no plans to expose an AST via the LSP. The whole idea of the LSP is to not do this since it makes standardizing things across languages and tools very hard

https://github.com/Microsoft/language-server-protocol/issues/258
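
To make concrete what the protocol does carry instead of an AST, here is a minimal sketch of a position-based query: a `textDocument/hover` request wrapped in the `Content-Length` framing that LSP's base protocol uses. The helper names (`buildHoverRequest`, `frameMessage`) and the file URI are my own placeholders, not part of any library.

```typescript
// JSON-RPC request shape for the LSP "textDocument/hover" method.
interface HoverRequest {
  jsonrpc: "2.0";
  id: number;
  method: "textDocument/hover";
  params: {
    textDocument: { uri: string };
    position: { line: number; character: number }; // both zero-based
  };
}

// Hypothetical helper: asks "what is at this position?" for a single document.
function buildHoverRequest(
  id: number,
  uri: string,
  line: number,
  character: number
): HoverRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "textDocument/hover",
    params: { textDocument: { uri }, position: { line, character } },
  };
}

// Hypothetical helper: prefixes the JSON body with the Content-Length header
// (the length is counted in bytes, not characters).
function frameMessage(message: object): string {
  const body = JSON.stringify(message);
  const byteLength = new TextEncoder().encode(body).length;
  return `Content-Length: ${byteLength}\r\n\r\n${body}`;
}

// Placeholder URI; position points at "yell" on the second line of the example file.
const framedHover = frameMessage(
  buildHoverRequest(1, "file:///project/yell.ts", 1, 16)
);
```

The server answers a request like this with a small, editor-oriented result (hover text and optionally a range), never with the syntax tree it built internally to compute that result.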

Fortunately, TypeScript itself ships with a Compiler API that can do this (https://github.com/microsoft/TypeScript/wiki/Using-the-Compiler-API#using-the-type-checker).
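
As a rough sketch of that route (not a finished solution), the snippet below parses source text with `ts.createSourceFile` and walks the tree with `ts.forEachChild`, collecting each node's kind and text. It assumes the `typescript` npm package is installed; `SymbolNode`, `toSymbolNode`, and `parseToSymbols` are names I made up to mirror the shape in the question.

```typescript
import * as ts from "typescript";

// Hypothetical output shape, loosely modeled on the structure in the question.
interface SymbolNode {
  type: string; // name of the node's ts.SyntaxKind
  text: string; // source text covered by the node, without leading trivia
  symbols: SymbolNode[]; // child nodes, in source order
}

// Recursively converts a compiler AST node into the simplified shape above.
function toSymbolNode(node: ts.Node, sourceFile: ts.SourceFile): SymbolNode {
  const symbols: SymbolNode[] = [];
  // The callback must not return a truthy value, or forEachChild stops early,
  // hence the block body instead of returning the result of push().
  ts.forEachChild(node, (child) => {
    symbols.push(toSymbolNode(child, sourceFile));
  });
  return {
    type: ts.SyntaxKind[node.kind],
    text: node.getText(sourceFile),
    symbols,
  };
}

// Parses a single file's text; no type checker or project setup is involved.
export function parseToSymbols(code: string, fileName = "example.ts"): SymbolNode {
  const sourceFile = ts.createSourceFile(
    fileName,
    code,
    ts.ScriptTarget.Latest,
    /* setParentNodes */ true
  );
  return toSymbolNode(sourceFile, sourceFile);
}
```

Calling `parseToSymbols` on the `yell` example from the question gives a root node for the source file with the function declaration nested beneath it. Note that comments and whitespace are treated as trivia rather than AST nodes, so they would not show up as entries the way the question's mock-up suggests; the API has separate helpers such as `ts.getLeadingCommentRanges` for those.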

I'll update once I've figured out this alternate way.

Seph Reed
  • This is correct. You'll need to use the compiler API to get this information. – David Sherret Jul 10 '19 at 14:47
  • Was my assumption that Language Servers are responsible for giving syntactic information on hover of certain text incorrect? Because (if they do that) it seems like you could get something very close to an AST with that. – Seph Reed Jul 10 '19 at 18:50
  • The editor communicates with tsserver, which uses the language service. The language service has a document registry (ASTs) and a type checker that it uses to get the information for tsserver. So you can't get the ASTs by using the Language Server Protocol, because that information is never passed over. Instead it only passes a small amount of specific information that the editor needs (ex. "what type is it at this position?" Response: "It's the string type"). Does that make sense? – David Sherret Jul 10 '19 at 18:58
  • I suppose I don't understand. If it can pass over that specific information, what would stop you from traversing a document, asking the LS about every part, and building an AST? Also, if the LS is able to tell what type is at some position, then does it have an AST somewhere in its backend? My interest in this would be the ability to make a non-language-specific docs engine. – Seph Reed Jul 10 '19 at 21:36
  • That would lead to a very complex LSP. The LSP only exposes what is necessary to give specific pre-defined feedback in the editor. Yes, the language service keeps a collection of all the ASTs (the document registry); it just doesn't expose them in its API. – David Sherret Jul 10 '19 at 21:51
  • Returning only what the client asks for reminds me of GraphQL's design. LSP is just a protocol that predefines certain ways for clients (expected to be IDEs or editors) and servers (various implementations, one per language) to communicate, so the server holds the ASTs. So I am wondering whether we could extend the protocol and add a new message that simply returns the AST? – K. Symbol Oct 23 '20 at 02:46