I have a text file with the following structure:

#DATA1 1000
#DATA2 1000
#DATA3 2000

#VER B 2 20190403 "Text" 20190413
{
#TRANS 3001 {1 "TEXT"} -14000 "" "" 0
#TRANS 2611 {1 "TEXT"} -3500 "" "" 0
#TRANS 1510 {1 "LIU"} 17500 "" "" 0
}
#VER C 1 20190426 "TEXT" 20190426
{
#TRANS 1930 {} 1875 "" "" 0
#TRANS 1510 {} -1875 "" "" 0
}

I am trying to find a way to:

  1. Split the text file into segments, each running from a line starting with #VER up to the line before the next line starting with #VER
  2. Run further code on each line of a segment (not part of this question)

Any suggestions on how to get started? I have been testing with this fiddle, but with no success so far.

https://jsfiddle.net/236pbzqf/2/

Kevin Lindmark
  • Use the available [`String`](//developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String#instance_methods) and [`RegExp`](//developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/RegExp#instance_methods) methods. See [Reference - What does this regex mean?](/q/22937618/4642212) and the [regex tag wiki](/tags/regex/info) and use regex debuggers like [RegEx101](//regex101.com/). – Sebastian Simon Oct 29 '21 at 11:59
  • The static and instance methods of [`Object`](//developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Object#Static_methods) and [`Array`](//developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Array#Static_methods) may also help. – Sebastian Simon Oct 29 '21 at 12:00
  • Are you using Node.js? You can use `fs`: https://nodejs.org/api/fs.html#fsreadfilepath-options-callback. Searching for "read file in nodejs" will turn up other tutorials. – Katherine R Oct 29 '21 at 12:02

2 Answers

Basic parsing. I would match the lines that start with # and ignore the ones that contain only { or }. If the braces really matter, then you will need to loop over every line and handle them explicitly.

But assuming that the { and } are not really needed, you can do something like this.

var txt = `#DATA1 1000
#DATA2 1000
#DATA3 2000

#VER B 2 20190403 "Text" 20190413
{
#TRANS 3001 {1 "TEXT"} -14000 "" "" 0
#TRANS 2611 {1 "TEXT"} -3500 "" "" 0
#TRANS 1510 {1 "LIU"} 17500 "" "" 0
}
#VER C 1 20190426 "TEXT" 20190426
{
#TRANS 1930 {} 1875 "" "" 0
#TRANS 1510 {} -1875 "" "" 0
}
`;

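// parse out the commands (every line that starts with #);
// fall back to an empty array if nothing matches
const commands = txt.match(/^#.*$/gm) || []

// loop over
const results = commands.reduce((acc, command) => {
  // break it up into its parts: the keyword after # and the rest of the line
  const [, type, params] = command.match(/^#(\S+)\s*(.*)$/)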
  // if we find a ver, add new object to push to
  // if we find trans, push to the last object
  // else, assume it is data fields
  if (type === "VER") {
    acc.vers.push({
      data: params,
      trans: []
    });
  } else if (type === "TRANS") {
    acc.vers[acc.vers.length - 1].trans.push(params);
  } else {
    acc.data[type] = params;
  }
  return acc;
}, {
  data: {},
  vers: []
});

console.log(results);
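
If only the segmentation from the question is needed, with every line kept (braces included), a plain line-by-line loop may be enough. The following is a minimal sketch rather than a definitive solution: the function name `segmentByVer` is made up for illustration, and it assumes that lines before the first #VER are header lines that belong to no segment.

```javascript
// Hypothetical helper: split the text into segments, each starting at a
// line that begins with "#VER" and running up to the next such line.
function segmentByVer(text) {
  const header = [];   // lines before the first #VER (assumed to belong to no segment)
  const segments = [];
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue;             // skip blank lines
    if (/^#VER\b/.test(line)) {
      segments.push([line]);               // start a new segment
    } else if (segments.length > 0) {
      segments[segments.length - 1].push(line);
    } else {
      header.push(line);
    }
  }
  return { header, segments };
}

const sample = `#DATA1 1000

#VER B 2 20190403 "Text" 20190413
{
#TRANS 3001 {1 "TEXT"} -14000 "" "" 0
}
#VER C 1 20190426 "TEXT" 20190426
{
#TRANS 1930 {} 1875 "" "" 0
}`;

const { header, segments } = segmentByVer(sample);
console.log(header.length, segments.length);
```

Each entry of `segments` is an array of lines, so the per-line processing from step 2 of the question can run in a nested loop over it.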
epascarello

Taking a step back, it looks like you're trying to write a code interpreter. The basic steps of doing this are:

  • Convert the code into a sequence of tokens (i.e., lexical analysis with a "lexer")
  • Consume the tokens and convert them into a structured format that can be executed (e.g., an abstract syntax tree, built by a parser)

You can write these yourself, but you may want to explore existing parser generators and parsing libraries, as they can do a lot of the hard work for you. The learning curve might be a bit steep, though.

For a simple language, you might be able to do something a bit less formal. A quick glance at your example seems to show that the code is broken down into lines, so newline characters are important. It also looks like important keywords are rather helpfully prefixed with the # symbol. Given the above, I'd probably start by doing something like the below:

const data = `
#DATA1 1000
#DATA2 1000
#DATA3 2000

#VER B 2 20190403 "Text" 20190413
{
#TRANS 3001 {1 "TEXT"} -14000 "" "" 0
#TRANS 2611 {1 "TEXT"} -3500 "" "" 0
#TRANS 1510 {1 "LIU"} 17500 "" "" 0
}
#VER C 1 20190426 "TEXT" 20190426
{
#TRANS 1930 {} 1875 "" "" 0
#TRANS 1510 {} -1875 "" "" 0
}
`

// Get the data as individual lines
let dataLines = data.split("\n")

// Remove empty (or whitespace-only) lines
dataLines = dataLines.filter(line => line.trim() !== "")

// Convert to tokens
const tokenisedData = dataLines.map(line => {
  let tokenName = "UNKNOWN";
  if (line.match(/^#VER .+/)) {
    tokenName = "VER_TOKEN"
  } else if (line.match(/^#TRANS .+/)) {
    tokenName = "TRANS_TOKEN"
  } else if (line.match(/^#(DATA1|DATA2|DATA3) .+/)) {
    tokenName = "DATA_TOKEN"
  } else if (line === "{") {
    tokenName = "OPEN_BLOCK"
  } else if (line === "}") {
    tokenName = "CLOSE_BLOCK"
  }
  return {
    token: tokenName,
    rawText: line
  }
})

// Contextual parsing based on known token sequences may begin
const parsedData = [];

while (tokenisedData.length > 0) {
  // Consume the first token
  const currentToken = tokenisedData.shift();
  switch (currentToken.token) {

    // Convert known sequence VER_TOKEN, OPEN_BLOCK, <<nested commands>> , CLOSE_BLOCK
    case "VER_TOKEN": {
      // Set up an object to contain the VER command and the nested block
      const verCommand = {
        token: "VER_COMMAND",
        // TODO - presumably need to parse the rawText here and populate in this verCommand object
        rawText: currentToken.rawText,
        nestedCommands: []
      }
      // We now expect an OPEN_BLOCK. Throw if not (or if the input has ended).
      let nextToken = tokenisedData.shift();
      if (!nextToken || nextToken.token !== "OPEN_BLOCK") {
        throw new Error("Parse error: expected OPEN_BLOCK for VER command but instead got " + (nextToken ? nextToken.token : "end of input"))
      }
      nextToken = tokenisedData.shift();
      // Add the nested commands into the VER nestedCommands array
      while (nextToken && nextToken.token !== "CLOSE_BLOCK") {
        verCommand.nestedCommands.push(nextToken)
        // Get the next token
        nextToken = tokenisedData.shift();
      }
      // We now must have a CLOSE_BLOCK token (nextToken is undefined if the input ended first)
      if (!nextToken || nextToken.token !== "CLOSE_BLOCK") {
        throw new Error("Parse error: expected CLOSE_BLOCK for VER command but instead got " + (nextToken ? nextToken.token : "end of input"))
      }
      // Add the parsed VER command to the resulting parsed data
      parsedData.push(verCommand);
      break;
    }

    // Nothing special to do with this token - keep it as it is
    default:
      parsedData.push(currentToken);
      break;
  }
}

console.log(parsedData)

That example grew a bit more than I initially planned. :)

However, it could very well be a reasonable starting point for a basic lexer and parser for the language you're interpreting.

The code above converts the text file into the following structured format:

[
   {
      "rawText":"#DATA1 1000",
      "token":"DATA_TOKEN"
   },
   {
      "rawText":"#DATA2 1000",
      "token":"DATA_TOKEN"
   },
   {
      "rawText":"#DATA3 2000",
      "token":"DATA_TOKEN"
   },
   {
      "nestedCommands":[
         {
            "rawText":"#TRANS 3001 {1 \"TEXT\"} -14000 \"\" \"\" 0",
            "token":"TRANS_TOKEN"
         },
         {
            "rawText":"#TRANS 2611 {1 \"TEXT\"} -3500 \"\" \"\" 0",
            "token":"TRANS_TOKEN"
         },
         {
            "rawText":"#TRANS 1510 {1 \"LIU\"} 17500 \"\" \"\" 0",
            "token":"TRANS_TOKEN"
         }
      ],
      "rawText":"#VER B 2 20190403 \"Text\" 20190413",
      "token":"VER_COMMAND"
   },
   {
      "nestedCommands":[
         {
            "rawText":"#TRANS 1930 {} 1875 \"\" \"\" 0",
            "token":"TRANS_TOKEN"
         },
         {
            "rawText":"#TRANS 1510 {} -1875 \"\" \"\" 0",
            "token":"TRANS_TOKEN"
         }
      ],
      "rawText":"#VER C 1 20190426 \"TEXT\" 20190426",
      "token":"VER_COMMAND"
   }
]

Notably, each VER command, along with its opening and closing braces and all nested commands, has been consumed and is now contained in a single VER_COMMAND object.

This kind of formal structuring makes the data much easier to handle in code, as you can now iterate over the program and execute just the parts of it that you want.
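
The TODO in the parser (splitting each rawText into its individual fields) could be approached with a small field splitter. This is a hedged sketch rather than a full grammar: `splitFields` is a hypothetical helper, and it assumes that quoted strings contain no escaped quotes and that brace groups do not nest. That holds for the sample data but may not hold for real files.

```javascript
// Hypothetical helper: split a command line into its keyword and fields,
// keeping "quoted strings" and {brace groups} together as single fields.
function splitFields(rawText) {
  // Alternatives are tried in order: quoted string, brace group, bare word.
  const parts = rawText.match(/"[^"]*"|\{[^}]*\}|\S+/g) || [];
  const [keyword = "", ...fields] = parts;
  return {
    // Drop the leading # from the keyword
    keyword: keyword.replace(/^#/, ""),
    // Strip the surrounding quotes from quoted fields; leave the rest as-is
    fields: fields.map(f => (/^".*"$/.test(f) ? f.slice(1, -1) : f))
  };
}

// keyword "VER", fields: B, 2, 20190403, Text, 20190413
console.log(splitFields('#VER B 2 20190403 "Text" 20190413'));

// keyword "TRANS"; the brace group {1 "TEXT"} stays together as one field
console.log(splitFields('#TRANS 3001 {1 "TEXT"} -14000 "" "" 0'));
```

The same helper could be applied to each entry in `nestedCommands` while iterating over `parsedData`. All fields come back as strings, so converting amounts or dates to other types is left to the caller.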

William Forty