
I'm currently researching how the V8 engine processes JavaScript source code, and something has caught my attention. The Medium articles I have read all explain it as follows:

source code -> parser -> abstract syntax tree (AST) -> interpreter -> bytecode

They say that when bytecode is generated, the code is executed at that stage. Is there no baseline compiler? I know that there is an optimizing compiler that collects profiling data, etc., but why is there no mention of a baseline compiler?

I came across an article called 'Sparkplug' on this website: https://v8.dev/blog/sparkplug. Is Sparkplug the baseline compiler?

To summarize:

  • Does the V8 engine have a baseline compiler? If not, how is bytecode executed? Doesn't it need to be translated into machine code?
  • Why is bytecode needed?

I am curious whether there is a baseline compiler or not.

emreakdas
  • "*they explain it as follows*" - something is wrong with that explanation. An interpreter does not output byte code. It just executes the code. – Bergi Jun 04 '23 at 22:04
  • "*how is bytecode executed? Doesn't it need to be translated into machine code?*" - no, it doesn't. Machine code is only needed if you want a CPU (the machine) to interpret the code. But that doesn't happen when software (the interpreter) interprets the code. – Bergi Jun 04 '23 at 22:05
  • Thank you for your comment. I also came across articles on Medium that mentioned writing an interpreter for Ignition, which converts the Ignition code into bytecode. As far as I know, an interpreter is a general term. If the bytecode is not translated into machine code, how is the code understood by the computer? That's the part I'm curious about. @Bergi – emreakdas Jun 04 '23 at 22:11
  • Isn't the task of an interpreter to translate the code line by line into machine code? – emreakdas Jun 04 '23 at 22:13
  • "*Isn't the task of an interpreter to translate the code line by line into machine code? As far as I know, an interpreter is a general term.*" - No. A **compiler** translates code from one language **into code** of a different (or rarely, the same) language. An **interpreter** executes code. A CPU is an interpreter for machine code, but so is software virtualisation. See https://stackoverflow.com/questions/2377273/how-does-an-interpreter-compiler-work or https://cs.stackexchange.com/questions/84970/the-difference-between-compiler-and-interpreter. – Bergi Jun 04 '23 at 22:21
  • "*how is the [bytecode] understood by the computer?*" - well it's not understood by the CPU, which is only executing the [machine] code of the interpreter program. It is understood however by the interpreter. – Bergi Jun 04 '23 at 22:22
  • Does this answer your question? https://stackoverflow.com/questions/73521517/how-is-javascript-code-transformed-into-machine-code-or-why-is-it-not – Bergi Jun 04 '23 at 22:26
  • I had read these articles. So are the corresponding code blocks executed by the CPU within the interpreter? When you say 'assign 1 to variable a' in bytecode, is the corresponding machine code executed within the interpreter? – emreakdas Jun 04 '23 at 22:57
  • In essence, how does the computer understand bytecode without converting it into machine code? I'm having trouble comprehending it. I read the Stack Overflow topics you provided, and there is something written there: 'the CPU executes the interpreter, the interpreter executes the JavaScript' Shouldn't the CPU ultimately produce a machine code output while executing the interpreter? What do we mean by 'executing' in this context? @Bergi – emreakdas Jun 04 '23 at 23:04
  • There is no corresponding machine code to "assign 1 to variable a". The interpreter has some byte values representing the number value 1, and it has some memory allocated for representing the variable a, and it has some code to achieve what an "assignment" does (typically meaning to copy the byte values into the relevant memory - but possibly including other things like refcounting, garbage collection, debugging and whatnot). – Bergi Jun 04 '23 at 23:04
  • No, the CPU does not produce machine code output. The CPU just [runs its instruction cycle](https://en.wikipedia.org/wiki/Instruction_cycle) forever (until its power is cut) on the machine code stored in memory, thereby executing it. "*What do we mean by 'executing'?*" - running the code so that it produces the result the author had intended, often involving I/O to interact with things beyond the computer. – Bergi Jun 04 '23 at 23:11
  • "*In all these medium articles*" - In all of which Medium articles? You forgot to link to the guides you are referencing. – Quentin Jun 05 '23 at 10:19

1 Answer


Yes, Sparkplug is a "baseline" or non-optimizing compiler.

The V8 architecture is changing all the time, and I'm sure this answer will soon be out of date. I believe the addition of a non-optimizing compiler is a recent development, so this might be why you have heard V8 didn't have one. But this is my understanding of the current architecture.

The V8 engine has multiple ways to execute code. It has an interpreter, a baseline compiler (Sparkplug), and an optimizing compiler. The reason for the multiple strategies is a tradeoff between how long it takes to compile code and how long it takes to execute code. A slower compiler can generate more optimized code. But if a block of code is only executed once or a few times, an optimizing compiler might actually be a lot slower than an interpreter. So the engine shifts strategy based on how many times the code is executed.
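As a rough illustration of "shifting strategy based on how many times the code is executed", here is a minimal sketch of a call counter that promotes a function through tiers. The thresholds, the `FunctionState` class, and `record_call` are all hypothetical names invented for this example; V8's real heuristics are more involved and change between versions.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
BASELINE_THRESHOLD = 3
OPTIMIZE_THRESHOLD = 10


@dataclass
class FunctionState:
    name: str
    call_count: int = 0
    tier: str = "interpreter"


def record_call(fn: FunctionState) -> str:
    """Count one call and return the tier that would run it."""
    fn.call_count += 1
    if fn.call_count >= OPTIMIZE_THRESHOLD:
        fn.tier = "optimizing compiler"
    elif fn.call_count >= BASELINE_THRESHOLD:
        fn.tier = "baseline compiler"
    return fn.tier


# Simulate calling the same function 12 times.
hot = FunctionState("hot_loop_body")
tiers = [record_call(hot) for _ in range(12)]
```

In this toy model a function that is called once or twice never leaves the interpreter, so no compilation time is spent on it; only code that keeps getting called pays for the more expensive tiers.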

But in all cases, the source code is first compiled into bytecode. The interpreter and the compilers all use this bytecode as input.

So the flow is more accurately illustrated as:

source code -> parser -> AST -> bytecode -|-> interpreter
                                          |-> baseline compiler -> machine code
                                          |-> optimizing compiler -> machine code

The term "compiler" is used both for the part transforming the source code into bytecode, and the parts transforming bytecode into machine code.

Bytecode is an intermediate format between source code and machine code. It is much lower level than source code, but not CPU specific like machine code. Bytecode is used because parsing source code is expensive, so there is no reason to do it more than once.
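To make the "AST -> bytecode" step concrete, here is a minimal sketch that flattens a tiny expression tree into a list of instructions. The tuple-based AST, the opcode names, and `compile_expr` are assumptions made up for this example, and it targets a simple stack machine for brevity, whereas V8's actual bytecode is register-based.

```python
# Tiny expression AST, as nested tuples:
#   ("num", value), ("var", name), ("add", left, right), ("mul", left, right)

def compile_expr(node, code=None):
    """Flatten an AST into a list of (opcode, operand) instructions."""
    if code is None:
        code = []
    kind = node[0]
    if kind == "num":
        code.append(("PUSH_CONST", node[1]))
    elif kind == "var":
        code.append(("LOAD_VAR", node[1]))
    elif kind in ("add", "mul"):
        # Emit code for both operands first, then the operator itself.
        compile_expr(node[1], code)
        compile_expr(node[2], code)
        code.append((kind.upper(), None))
    else:
        raise ValueError(f"unknown node kind: {kind}")
    return code


# AST for the expression: a * 2 + 1
tree = ("add", ("mul", ("var", "a"), ("num", 2)), ("num", 1))
bytecode = compile_expr(tree)
```

The resulting flat list can be stored and handed to any later stage as often as needed without parsing the source text again.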

The interpreter does not compile the bytecode to machine code. It reads the bytecode instructions one at a time and executes the corresponding code. This has some overhead in reading and decoding each bytecode instruction, but the advantage is there is no separate compilation pass necessary, so it will be extremely fast to get started.
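A minimal sketch of such an interpreter loop may help. The opcodes and the `interpret` function are hypothetical and far simpler than V8's real interpreter; the point is only that each bytecode instruction selects a branch of code that already exists inside the interpreter, and no new machine code is generated.

```python
def interpret(bytecode, variables):
    """Execute (opcode, operand) instructions one at a time on a stack."""
    stack = []
    for opcode, operand in bytecode:
        # Decode the instruction and run the matching, pre-existing branch.
        if opcode == "PUSH_CONST":
            stack.append(operand)
        elif opcode == "LOAD_VAR":
            stack.append(variables[operand])
        elif opcode == "STORE_VAR":
            variables[operand] = stack.pop()
        elif opcode == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return variables


# Bytecode for:  a = 1; b = a + 41
program = [
    ("PUSH_CONST", 1),
    ("STORE_VAR", "a"),
    ("LOAD_VAR", "a"),
    ("PUSH_CONST", 41),
    ("ADD", None),
    ("STORE_VAR", "b"),
]
result = interpret(program, {})
```

Here "assign 1 to variable a" is just the `STORE_VAR` branch writing into a dictionary. The CPU only ever runs the interpreter's own code, which is where the per-instruction decoding overhead comes from.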

JacquesB
  • IIRC, there was (at some point in the past of the changing architecture of V8) a compiler or interpreter that did *not* use bytecode as input but did start fresh from the AST – Bergi Jun 05 '23 at 13:00
  • Thanks for the reply. "*The interpreter does not compile the bytecode to machine code. It reads the bytecode commands one by one and executes the corresponding code.*" - Is the "corresponding code" in that part machine code? Also, as I understand it, there are three execution paths that take bytecode as input? If the bytecode interpreter is executing the corresponding code, why is there a baseline compiler? I can't make sense of it. @JacquesB; @bergi; – emreakdas Jun 05 '23 at 19:43
  • @emreakdas See the third paragraph ("*The V8 engine has multiple…*") of the answer. Having multiple ways to execute available does not mean that it uses all of them at once. – Bergi Jun 05 '23 at 19:56
  • @Bergi Is the corresponding code in the mentioned part machine code? – emreakdas Jun 05 '23 at 20:13
  • @emreakdas It is the code of the interpreter. It may be machine code, it may be byte code, it may be javascript code, it doesn't matter. – Bergi Jun 05 '23 at 20:16
  • @Bergi I wrote an answer above, can you take a look? – emreakdas Jun 05 '23 at 20:39
  • @emreakdas: Yes, the interpreter is a program which itself is compiled to machine code, so the executed code is always machine code. – JacquesB Jun 06 '23 at 06:08
  • In that source, they describe the process as follows: https://mathiasbynens.be/notes/shapes-ics (source code -> parser -> ast -> interpreter -> bytecode) They mention that bytecode is executed, and in some sources, they also mention that the interpreter generates bytecode. Which one is correct? In a previous response, you mentioned that the compiler generates bytecode, and the interpreter takes it as input. Everyone seems to say different things about these topics. Could you clarify, along with separate answers? Thank you in advance :) – emreakdas Jun 10 '23 at 23:21
  • @emreakdas: The interpreter does not generate bytecode. I think the linked article is just simplifying too much. Bytecode generation is a distinct step happening before the interpreter executes the bytecode. It would be highly inefficient to have an interpreter generate bytecode, since you only need to generate bytecode once for a function, but the interpreter might execute the function multiple times. – JacquesB Jun 11 '23 at 16:03
  • @JacquesB So, is JIT compilation performed when executing the generated bytecode? As far as I understand, after the bytecode is generated, the interpreter interprets the code and executes the corresponding instructions. This interpreter is a program itself, and it gets compiled into machine code. You also mentioned three execution models. What are the differences between an interpreter and a baseline compiler? Both of them don't seem to optimize. What criteria determine when each of them is used? – emreakdas Jun 11 '23 at 17:01
  • "*I'm sure this answer will soon be out of date*" - V8 now has 2 optimizing compilers (Maglev and TurboFan), so you were right, out of date pretty quickly :D (this doesn't invalidate the answer though) – Dada Jun 27 '23 at 14:21