I have a collection of files written in a language 'A' that need to be translated into corresponding files of a language 'B'. I want to create a program/parser that can automate this task (more likely a toolchain than a single program). However, I am struggling to find suitable tools for this toolchain.
Language A is embedded software code, i.e. a low-level language. It is 90% standard C code and 10% "custom" code, i.e. the files also contain small segments that a standard C compiler cannot understand. The 90% C code does not use arbitrary C constructs (which would be hard to handle semantically), but follows certain recurring expressions, actions and patterns, always in more or less the same way. It mostly performs write operations to memory and does not include complex constructs such as C structs or enums.
Example of regular low-level C-Code in language A:
#define MYMACRO 0x123
uint32_t regAddr;
regAddr = MYMACRO;
*(uint32_t*)(regAddr) = 0xdeadbeef;
Example for "custom code" in language A:
custom_printf("hello world! Cpu number: %d \n", cpu_nr);
Language B is a 100% custom language. This transformation is necessary in order to work with the files in another tool for debugging purposes. The translation of the regular C example above would look roughly like this:
definemacro MYMACRO 0x123
define_local_int regAddr
localint.set regAddr = MYMACRO
data.write regAddr 0xdeadbeef
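To illustrate how regular these patterns are, here is a minimal Python sketch that translates the four example lines with one regular expression per pattern. It is only a sketch under assumptions: the rule table and the function names `translate_line` and `translate` are made up for this example, each statement is assumed to sit on its own line, and anything that matches no rule (such as the `custom_printf` line) is simply rejected.

```python
import re

# One (pattern, template) pair per recurring language-A construct.
# The rules are assumptions derived only from the example above.
RULES = [
    # #define NAME VALUE
    (re.compile(r"#define\s+(\w+)\s+(\S+)"), r"definemacro \1 \2"),
    # uint32_t name;
    (re.compile(r"uint32_t\s+(\w+)\s*;"), r"define_local_int \1"),
    # *(uint32_t*)(addr) = value;
    (re.compile(r"\*\(uint32_t\s*\*\)\s*\((\w+)\)\s*=\s*(\w+)\s*;"),
     r"data.write \1 \2"),
    # name = value;
    (re.compile(r"(\w+)\s*=\s*(\w+)\s*;"), r"localint.set \1 = \2"),
]


def translate_line(line):
    """Translate one language-A statement into one language-B statement."""
    stripped = line.strip()
    for pattern, template in RULES:
        match = pattern.fullmatch(stripped)
        if match:
            return match.expand(template)
    # Unknown constructs (e.g. the custom code) are reported, not guessed.
    raise ValueError(f"no rule matches: {stripped!r}")


def translate(source):
    """Translate a whole snippet, skipping blank lines."""
    return [translate_line(line) for line in source.splitlines() if line.strip()]


SAMPLE_A = """
#define MYMACRO 0x123
uint32_t regAddr;
regAddr = MYMACRO;
*(uint32_t*)(regAddr) = 0xdeadbeef;
"""

if __name__ == "__main__":
    print("\n".join(translate(SAMPLE_A)))
```

A line-based approach like this breaks down as soon as statements span several lines or nest, which is why I am looking at real parsers instead.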
Note: I am well aware that Stack Overflow is not meant to be a site for open discussions about "which tool do you prefer?". But I think this question is rather one like "I need at least ONE meaningful toolset that gets the job done", i.e. there are probably not so many sensible options for discussion anyway.
These were my considerations and approaches so far:
- Performance is NOT relevant for my toolchain. It only needs to be easy to implement and to adapt to changes.
- First approach: As language A is mostly C code, I first thought of pycparser, a Python library that provides a C parser which parses C code into an AST (Abstract Syntax Tree). My plan was to read in the language-A files, and then write a Python program that creates language-B files out of the AST. However, I found it difficult to adapt/teach pycparser to fully support the 10% custom parts of language A.
- Second approach: Using general-purpose parser generators such as Yacc/Bison or ANTLR. Here, however, I am not sure which of the tools suits my needs (Yacc/Bison with an LALR parser or ANTLR with an LL parser), or how to set up an appropriate toolchain that includes such a parser and then processes (e.g. with Python) the data structure that the generated parser produces in order to create the custom language B. It would also be helpful if the parser generator of choice provided an existing C language definition that can easily be adapted for the 10% custom part. I should also mention that I have never worked with parser generators before.
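Whichever parser ends up in front, the second half of the toolchain (walking the parser's data structure and writing language B) could look roughly like the sketch below. The node classes `MacroDef`, `LocalDecl`, `Assign` and `MemWrite` are hypothetical stand-ins that I made up for this example; they are not the node types of pycparser, ANTLR or Bison, and a real AST would first have to be mapped onto something like them.

```python
from dataclasses import dataclass


# Hypothetical, simplified AST node types (not from any real parser library).
@dataclass
class MacroDef:
    name: str
    value: str


@dataclass
class LocalDecl:
    name: str


@dataclass
class Assign:
    target: str
    value: str


@dataclass
class MemWrite:
    address: str
    value: str


def emit(node):
    """Return the language-B statement for a single AST node."""
    if isinstance(node, MacroDef):
        return f"definemacro {node.name} {node.value}"
    if isinstance(node, LocalDecl):
        return f"define_local_int {node.name}"
    if isinstance(node, Assign):
        return f"localint.set {node.target} = {node.value}"
    if isinstance(node, MemWrite):
        return f"data.write {node.address} {node.value}"
    raise TypeError(f"unsupported node type: {type(node).__name__}")


def generate(nodes):
    """Emit one language-B line per node, in source order."""
    return "\n".join(emit(node) for node in nodes)


# The language-A example from above, as a parser might hand it over.
EXAMPLE_AST = [
    MacroDef("MYMACRO", "0x123"),
    LocalDecl("regAddr"),
    Assign("regAddr", "MYMACRO"),
    MemWrite("regAddr", "0xdeadbeef"),
]

if __name__ == "__main__":
    print(generate(EXAMPLE_AST))
```

Keeping the emitter separate from the parser would mean that the 10% custom constructs only need one extra node type and one extra branch in `emit`, independent of which parser generator is chosen.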
Could anybody please give me some advice about a meaningful set of tools for this task?
Edit: I apologize if this seems like a vague question; I tried to put it as precisely as I could. I added an example for languages A and B to make the composition of the languages clearer, and to show that language A follows certain recurring patterns whose semantics are easy to understand.
If this edit does not make the question clear and narrow enough, I will repost on Programmers, as was suggested.
Edit2: Alright, as the topic clearly still seems to be out of place here, I hereby withdraw the question. I already received some valuable input from the first few posters, which encouraged me to experiment further with general-purpose parser generators.