Taking a step back, it looks like you're trying to write a code interpreter. The basic steps of doing this are:
- Convert the code into a sequence of tokens (i.e., lexical analysis with a "lexer")
- Consume the tokens and convert them into some kind of structured format so that they can be executed (e.g., an abstract syntax tree, using a parser)
You can write these yourself, but you may want to explore existing parsing libraries or parser generators, as they can do a lot of the hard work for you. The learning curve might be a bit steep, though.
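If it helps to see those two stages (plus execution) in miniature, here's a purely illustrative toy, nothing to do with your file format:
// Toy example: lex "1 + 2" into tokens, parse into a tiny tree, then evaluate it.
const tokens = "1 + 2".split(" ")   // ["1", "+", "2"]
const tree = { op: tokens[1], left: Number(tokens[0]), right: Number(tokens[2]) }
// "Executing" is then just walking the tree
const result = tree.op === "+" ? tree.left + tree.right : NaN
console.log(result)                 // 3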
For a simple language, you might be able to do something a bit less formal. A quick glance at your example seems to show that the code is broken down into lines, so newline characters are important. It also looks like important keywords are rather helpfully prefixed with the # symbol. Given the above, I'd probably start by doing something like the below:
const data = `
#DATA1 1000
#DATA2 1000
#DATA3 2000
#VER B 2 20190403 "Text" 20190413
{
#TRANS 3001 {1 "TEXT"} -14000 "" "" 0
#TRANS 2611 {1 "TEXT"} -3500 "" "" 0
#TRANS 1510 {1 "LIU"} 17500 "" "" 0
}
#VER C 1 20190426 "TEXT" 20190426
{
#TRANS 1930 {} 1875 "" "" 0
#TRANS 1510 {} -1875 "" "" 0
}
`
// Get the data as individual lines
let dataLines = data.split("\n")
// Remove empty lines
dataLines = dataLines.filter(line => line !== "")
// Convert to tokens
const tokenisedData = dataLines.map(line => {
let tokenName = "UNKNOWN";
if (line.match(/^#VER .+/)) {
tokenName = "VER_TOKEN"
} else if (line.match(/^#TRANS .+/)) {
tokenName = "TRANS_TOKEN"
} else if (line.match(/^#(DATA1|DATA2|DATA3) .+/)) {
tokenName = "DATA_TOKEN"
} else if (line === "{") {
tokenName = "OPEN_BLOCK"
} else if (line === "}") {
tokenName = "CLOSE_BLOCK"
}
return {
token: tokenName,
rawText: line
}
})
// Contextual parsing based on known token sequences can now begin
const parsedData = [];
while (tokenisedData.length > 0) {
// Consume the first token
const currentToken = tokenisedData.shift();
switch (currentToken.token) {
// Convert known sequence VER_TOKEN, OPEN_BLOCK, <<nested commands>> , CLOSE_BLOCK
case "VER_TOKEN":
// Set up an object to contain the VER command and the nested block
const verCommand = {
token: "VER_COMMAND",
// TODO - presumably need to parse the rawText here and populate in this verCommand object
rawText: currentToken.rawText,
nestedCommands: []
}
// We now expect an OPEN_BLOCK. Throw if not.
let nextToken = tokenisedData.shift();
if (nextToken.token !== "OPEN_BLOCK") {
throw "Parse error: expected OPEN_BLOCK for VER command but instead got " + nextToken.token
}
nextToken = tokenisedData.shift();
// Add the nested commands into the VER nestedCommands array
while (nextToken && nextToken.token !== "CLOSE_BLOCK") {
verCommand.nestedCommands.push(nextToken)
// Get the next token
nextToken = tokenisedData.shift();
}
// We now must have a CLOSE_BLOCK token
if (nextToken.token !== "CLOSE_BLOCK") {
throw "Parse error: expected CLOSE_BLOCK for VER command but instead got " + nextToken.token
}
// Add the parsed VER command to the resulting parsed data
parsedData.push(verCommand);
break;
// Nothing special to do with this token - keep it as it is
default:
parsedData.push(currentToken);
break;
}
}
console.log(parsedData)
That example grew a bit more than I initially planned. :)
However, it could very well be a reasonable starting point for a basic lexer and parser for the language you're interpreting.
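One thing the example deliberately leaves open is the TODO in the VER_TOKEN case: the rawText still needs to be split into individual fields. A rough, untested sketch of that might look like the below, assuming fields are space-separated, strings are double-quoted and {...} groups belong together (the destructured names are pure guesses on my part):
// Split a raw line into fields, keeping "quoted strings" and {...} groups intact.
// This is a guess at the format's rules - adjust the regex as needed.
function splitFields(rawText) {
  const fields = rawText.match(/"[^"]*"|\{[^}]*\}|\S+/g) || []
  return fields.map(field => field.replace(/^"|"$/g, ""))  // strip surrounding quotes
}

// Hypothetical field names, purely for illustration:
const [keyword, series, number, date, text] = splitFields('#VER B 2 20190403 "Text" 20190413')
console.log(keyword, series, number, date, text)  // #VER B 2 20190403 Text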
Running the lexer and parser code over the data converts the text file into the following structured format:
[
{
"rawText":"#DATA1 1000",
"token":"DATA_TOKEN"
},
{
"rawText":"#DATA2 1000",
"token":"DATA_TOKEN"
},
{
"rawText":"#DATA3 2000",
"token":"DATA_TOKEN"
},
{
"nestedCommands":[
{
"rawText":"#TRANS 3001 {1 \\"TEXT\\"} -14000 \\"\\" \\"\\" 0",
"token":"TRANS_TOKEN"
},
{
"rawText":"#TRANS 2611 {1 \\"TEXT\\"} -3500 \\"\\" \\"\\" 0",
"token":"TRANS_TOKEN"
},
{
"rawText":"#TRANS 1510 {1 \\"LIU\\"} 17500 \\"\\" \\"\\" 0",
"token":"TRANS_TOKEN"
}
],
"rawText":"#VER B 2 20190403 \\"Text\\" 20190413",
"token":"VER_COMMAND"
},
{
"nestedCommands":[
{
"rawText":"#TRANS 1930 {} 1875 \\"\\" \\"\\" 0",
"token":"TRANS_TOKEN"
},
{
"rawText":"#TRANS 1510 {} -1875 \\"\\" \\"\\" 0",
"token":"TRANS_TOKEN"
}
],
"rawText":"#VER C 1 20190426 \\"TEXT\\" 20190426",
"token":"VER"
}
]
Notably, each VER command, along with its open and close brackets and all nested commands, has been consumed and is now contained in a single VER_COMMAND object.
This type of formal structuring of the data makes it much easier to handle in the code, as you can now just iterate over the program and execute the parts of the program that you want.
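For example, a minimal "execution" pass could just loop over parsedData and switch on the token type; what each command actually does is up to you, so the handler bodies below are only placeholders:
// A minimal, hypothetical execution pass over the parsedData array built above
for (const command of parsedData) {
  switch (command.token) {
    case "DATA_TOKEN":
      // e.g. read a header/metadata value here
      console.log("Handling data line:", command.rawText)
      break
    case "VER_COMMAND":
      console.log("Handling VER with", command.nestedCommands.length, "nested TRANS lines")
      for (const trans of command.nestedCommands) {
        // e.g. apply each transaction here
        console.log("  ", trans.rawText)
      }
      break
    default:
      console.log("Skipping line:", command.rawText)
  }
}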