Converting "c-like language" to "custom language" with parser

Question

I have a collection of files written in a language 'A' that need to be translated into corresponding files of a language 'B'. I want to create a program/parser that can automate this task (probably rather a toolchain than a single program). However, I am struggling to find a suitable choice for the programs of my toolchain.

Illustration of desired toolchain

Language A is embedded software code, i.e. a low-level language. It is 90% standard C code and 10% "custom" code, i.e. the files also contain small segments that a standard C compiler cannot understand. The 90% C code does not use arbitrary C constructs (which would be hard to analyze semantically), but follows certain recurring expressions, actions and patterns, and it follows them in more or less the same way every time. It mostly performs write operations to memory and does not include complex structures such as C structs or enums.

Example of regular low-level C-Code in language A:

#define MYMACRO 0x123
uint32_t regAddr;
regAddr = MYMACRO;
*(uint32_t*)(regAddr) = 0xdeadbeef;

Example for "custom code" in language A:

custom_printf("hello world! Cpu number: %d \n", cpu_nr);

Language B is a 100% custom language. This transformation is necessary, in order to work with the file in another tool for debugging purposes. Translation of the example above would look roughly like this:

definemacro MYMACRO 0x123
define_local_int regAddr
localint.set regAddr = MYMACRO
data.write regAddr 0xdeadbeef
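For illustration, the mapping above can be sketched as a small rule table in Python. This is only a sketch under assumptions: the names `RULES`, `translate_line` and `translate` are hypothetical, and the regular expressions cover just the four example lines, not language A in general. Anything that matches no rule (such as the `custom_printf` call) is rejected rather than guessed at.

```python
import re

# Hypothetical rule table: each entry pairs a regex for one recurring
# language-A pattern with a template for the matching language-B line.
# Only the four statements from the example are covered.
RULES = [
    # #define NAME VALUE
    (re.compile(r'#define\s+(\w+)\s+(\S+)'), r'definemacro \1 \2'),
    # uint32_t name;
    (re.compile(r'uint32_t\s+(\w+)\s*;'), r'define_local_int \1'),
    # *(uint32_t*)(addr) = value;
    (re.compile(r'\*\s*\(\s*uint32_t\s*\*\s*\)\s*\(\s*(\w+)\s*\)\s*=\s*(\w+)\s*;'),
     r'data.write \1 \2'),
    # name = value;
    (re.compile(r'(\w+)\s*=\s*(\w+)\s*;'), r'localint.set \1 = \2'),
]


def translate_line(line):
    """Translate one language-A statement into one language-B line."""
    stripped = line.strip()
    for pattern, template in RULES:
        match = pattern.fullmatch(stripped)
        if match:
            return match.expand(template)
    # Unknown constructs are reported instead of being silently dropped.
    raise ValueError(f"no rule for: {line!r}")


def translate(source):
    """Translate a whole language-A snippet, skipping blank lines."""
    return [translate_line(line) for line in source.splitlines() if line.strip()]
```

Calling `translate` on the four language-A lines above yields the four language-B lines shown. A line-by-line approach like this only works while every statement fits on one line and follows a known pattern; a real grammar is needed once that stops being true.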

Note: I am well aware that Stackoverflow is not meant to be a site for open discussions about "which tool do you prefer?". But I think this question is rather one like "I need at least ONE meaningful toolset that gets the job done", i.e. there are probably not so many sensible options for discussion anyway.

These were my considerations and approaches so far:

Performance is NOT relevant for my toolchain. It only should be easy to implement and adapt to changes.
First approach: As language A is mostly C code, I first thought of pycparser, a Python package that parses C code into an AST (Abstract Syntax Tree). My plan was to read in the language-A files and then write a Python program that creates language-B files from the AST. However, I found it difficult to adapt pycparser to fully support the 10% custom parts of language A.
Second approach: Using general-purpose parser generators such as Yacc/Bison or ANTLR. Here, however, I am not sure which tool suits my needs (Yacc/Bison with an LALR parser or ANTLR with an LL parser), or how to set up a toolchain that includes such a parser and then processes (e.g. with Python) the data structure the generated parser produces in order to emit the custom language B. It would also help if the parser generator of choice came with an existing C grammar that can easily be adapted for the 10% custom part. I should also mention that I have never worked with parser generators before.
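Whichever generator is chosen, the first stage is a lexer that splits the input into tokens before any grammar rules apply. A minimal hand-written sketch of that stage in Python, assuming a token set that covers only the constructs in the examples above (the names `Token`, `TOKEN_SPEC` and `tokenize` are hypothetical, not part of any of the tools mentioned):

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical token set: just enough for the example snippets.
# Order matters: NUMBER must be tried before IDENT so that hex
# literals such as 0xdeadbeef are not split.
TOKEN_SPEC = [
    ("DIRECTIVE", r"#\w+"),                 # e.g. #define
    ("STRING", r'"(?:\\.|[^"\\])*"'),       # double-quoted, with escapes
    ("NUMBER", r"0[xX][0-9a-fA-F]+|\d+"),   # hex or decimal integer
    ("IDENT", r"[A-Za-z_]\w*"),             # names, including custom_printf
    ("PUNCT", r"[*()=;,]"),                 # single-character punctuation
    ("SKIP", r"\s+"),                       # whitespace, discarded
    ("MISMATCH", r"."),                     # anything else is an error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> Iterator[Token]:
    """Yield tokens for a language-A snippet, raising on unknown characters."""
    for match in MASTER.finditer(source):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {match.group()!r}")
        yield Token(kind, match.group())
```

At this level the custom call is just an `IDENT` followed by punctuation and a `STRING`, so the lexer needs no special case for it; deciding what such a call means in language B is left to the later parsing and emitting stages.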

Could anybody please give me some advice about a meaningful set of tools for this task?

Edit: I apologize if this seems like a vague question; I tried to put it as precisely as I could. I added an example for languages A and B to make the composition of the languages clearer, and to show that language A follows certain recurring patterns whose semantics are easy to understand.

If this edit does not make the question clearer and less broad, I will repost on Programmers as was suggested.

Edit2: Alright, as the topic clearly still seems to be misplaced here, I hereby withdraw the question. I already received some valuable input from the first few posters, which encouraged me to experiment further with general-purpose parser generators.

Most parser generators have example C grammars available, often several of them. Some include just the grammar file, while others build trees or go even further. Most of the grammars are simple to modify, though (if you have some experience with the tool). The problem I see here is that you say "90% *standard* C" but don't mention *which* standard. And there's really no way we can say which tool or set of tools will fit you and your project best; you really have to do some experimenting yourself to find the "right" tool. — Some programmer dude, Nov 16 '15 at 09:24
C definitely is a hard language to transform, as it is a very loose language (pointers/arrays, data types, and expressions/statements). That means using a full C grammar and doing more complex transformations. I vaguely remember a project trying to generate Java from C the way a programmer would write it. That would need some preprocessing of your non-C code. I can't give sensible advice either. "Depends" — Joop Eggen, Nov 16 '15 at 09:32
This question would be better asked on [programmers](http://programmers.stackexchange.com/), and you really should explain what your DSL is doing, i.e. the 10% custom part: what is it for? Giving some code examples could be helpful. Interesting question, but too broad and off-topic here. Note also that [semantics](https://en.wikipedia.org/wiki/Semantics_%28computer_science%29) matters much more than syntax — Basile Starynkevitch, Nov 16 '15 at 09:39
If the language A is fed into some translator (emitting C99 code) which you cannot improve, you might consider processing that emitted C code with [GCC MELT](http://gcc-melt.org/) if you use a recent [GCC](http://gcc.gnu.org/) compiler — Basile Starynkevitch, Nov 16 '15 at 09:43
C is a small language, easy to parse, but unfortunately that makes figuring out the semantics and the intent behind it that much harder. Why would you even want to do this? People usually convert other languages to C for performance. Automatically turning low-level C with various macro hacks into another language and its idioms seems like a futile effort. — dtech, Nov 16 '15 at 09:46
*"There either too many possible answers, or good answers would be too long for this format."* As a reason to close this question, this is simply nonsense. — Ira Baxter, Nov 16 '15 at 09:59
@IraBaxter - my call was "off-topic", since it asks for a software recommendation. — dtech, Nov 16 '15 at 10:06
@ddriver: That may be, but collectively you and your fellow closers did not provide constructive advice to OP as to how to handle this. — Ira Baxter, Nov 16 '15 at 10:08
@IraBaxter - I don't make the rules. Rules say this is not the place for such questions. The OP should repost it where it belongs, where it has better chances of getting attention and answers. And at any rate, you managed to sneak your answer in, so it is all good ;) — dtech, Nov 16 '15 at 10:14
@ddriver: You *do* make the rules, partly by enforcing them. And my objection to the close reason is that it does not tell the OP how to constructively ask the question. (Even with the "objection" that this is a software recommendation question, there usually isn't any hint that there is specifically a SO place to ask such questions, e.g., SR). — Ira Baxter, Nov 16 '15 at 10:17
@IraBaxter When a question is put on hold, such hints to the OP are generated automatically. There is no need to manually post that for every single post close-voted. This question is both too broad and a tool recommendation, as it happened "too broad" close reasons won, and therefore the OP was given the message "There are either too many possible answers, or good answers would be too long for this format. ..." and so on. I don't think there is a way to recover this question to make it suitable for SO. — Lundin, Nov 16 '15 at 10:22
@IraBaxter - contrary to how it may look, I consider library recommendations a useful thing. I have no idea why those are no longer welcome, since for a lot of time it was OK. — dtech, Nov 16 '15 at 10:23
@Lundin: *read* the hold message. It does not convey the objection you claim is intended. If I were new and read this, I'd go away in disgust. Nor does it say anything (nor does the provided link) about where one can post such questions. — Ira Baxter, Nov 16 '15 at 10:26
@IraBaxter - but since library recommendations are no longer in the scope of SO, when I make such I do it in the form of a comment, since as I already mentioned, an answer to an off-topic question is off-topic on its own. A comment can be as helpful as an answer in this regard. — dtech, Nov 16 '15 at 10:32
@IraBaxter If you think the message should be changed for whatever reason, go ahead and raise a meta thread about it. And I don't believe there exists any Stack Exchange site where you could ask very broad tool recommendation questions, so how could there be a link? Tool recommendations are [off-topic](http://programmers.stackexchange.com/help/on-topic) on the Programmers site so the question can't be dumped there either. — Lundin, Nov 16 '15 at 10:38
@Lundin: "Dumped"? That seems pretty negative; OP has a legitimate question. It seems like a lot of folks (especially closers) don't know about SR: http://softwarerecs.stackexchange.com/ I think its very existence contradicts SO's stance that software recommendations are "off topic" (not OK at SO, but OK at another web page at SE). But I don't make the rules here either. — Ira Baxter, Nov 16 '15 at 10:43
@IraBaxter Well there you go then. Raise a meta thread about how the "off-topic because of tool recommendation" should be updated to point at that site. — Lundin, Nov 16 '15 at 10:45
@IraBaxter: Its existence doesn't "contradict" off-topicness of software recommendations on SO. Its existence reinforces it. Software recommendation questions go there, not here. That is very clear in the Help Centre, no matter how "not very receptive" you think it is that you cannot use such questions here to advertise your product. — Lightness Races in Orbit, Nov 16 '15 at 10:47
@LightnessRacesinOrbit: If you wrote that text, would it be an ad? Is it irrelevant to the question [wonder what OP thinks]? Does it contain false or misleading information? Why is it an ad if I write it? It follows the SO approved rules for labeling by the accepted phrase "Our". — Ira Baxter, Nov 16 '15 at 10:58
Your `custom_printf` is *not* a "new" language. It is simply a new function from your library. Then your whole question might be "how to translate your C code using a custom library into another custom language B", which you did not define or characterize at all (is B some dialect of Lua?) — Basile Starynkevitch, Nov 16 '15 at 11:19
@IraBaxter: I'm not getting into this again. I recall it being well covered with you on meta. All that matters here today is that this question is off-topic; end of story. — Lightness Races in Orbit, Nov 16 '15 at 11:32
@ddriver: I disagree, mostly because writing an "answer" in comments then voting to close the question only encourages posting more off-topic questions. It's nice that you want to help but I think you're sending the wrong signals in your attempt to do so! :) — Lightness Races in Orbit, Nov 16 '15 at 11:33
When C++ first arrived, compiling .cpp code consisted of a C++ pre-processor converting the .cpp code into .c and then compiling the .c file. Errors/warnings referenced the .c file. Your .A files could likewise fit into the standard make process by replacing the pre-processor with your "new" pre-processor. — chux - Reinstate Monica, Nov 16 '15 at 13:10
@LightnessRacesinOrbit - still on it? Thanks anyway, knowing Patrick Hofman's opinion on the subject makes all the difference. — dtech, Dec 04 '15 at 14:54
@ddriver: I came across it and decided that in my free time I would try to demonstrate to you the overwhelming community consensus. Oh well. Nice talking with you. — Lightness Races in Orbit, Dec 04 '15 at 15:03
