
Recently I've been busy with some PHP framework - completely off-topic by the way.

Anyhow, I have some HTML template files that I would like to parse with C++ (don't ask me why; I simply want to write it in C++). Besides, it might actually be the first useful thing I ever write in C++.

Anyway, to get back to the problem, imagine I have a file like the following:

<table>
    <tr>
        <th>ID</th>
        <th>Title</th>
        <th>Actions</th>
    </tr>
    {foreach from="$pages => $page"}
    <tr>
        <td>{$page.Id()}</td>
        <td>{$page.Title()}</td>
        <td><a href="page/edit/{$page.Id()}/">Edit</a> | <a href="page/delete/{$page.Id()}/">Delete</a></td>
    </tr>
    {foreachelse}
    <tr>
        <td colspan="3">There are no pages to be displayed</td>
    </tr>
    {/foreach}
</table>

And the output should be:

<table>
    <tr>
        <th>ID</th>
        <th>Title</th>
        <th>Actions</th>
    </tr>
    <?php if(count($pages) > 0): ?>
    <?php foreach($pages as $page): ?>
    <tr>
        <td><?php echo $page->getId(); ?></td>
        <td><?php echo $page->getTitle(); ?></td>
        <td><a href="page/edit/<?php echo $page->getId(); ?>/">Edit</a> | <a href="page/delete/<?php echo $page->getId(); ?>/">Delete</a></td>
    </tr>
    <?php endforeach; ?>
    <?php else: ?>
    <tr>
        <td colspan="3">There are no pages to be displayed</td>
    </tr>
    <?php endif; ?>
</table>

Why I am doing this might not be entirely clear to you, but the problem stands on its own and applies elsewhere too.

Anyhow, translating the template requires some forward and backward lookups and modifications in the output. What is the right approach to this problem?

Machiel
  • Why not just write the site in plain PHP? – Puppy Dec 08 '10 at 20:34
  • That was not what I was going for ;). It's because I want to write C++. You see, PHP is getting dull, C++ is quite a bit harder, and writing a parser like this is harder than just writing the site in plain PHP. – Machiel Dec 08 '10 at 20:42
  • I always admire somebody who enjoys a good challenge ;). – andand Dec 08 '10 at 20:51

3 Answers


You can write a handcrafted parser, which may be nontrivial depending on your actual requirements. Your next best bet is a BNF-like C++ parser library, e.g. boost::spirit, so you don't have to implement the parsing rules by hand. You will still need to write correct semantic actions to convert { ... } to PHP.

Gene Bushuyev
  • I will look into the Boost Spirit library; it looks promising. I am still wondering, however: if I go for the handcrafted parser, what would be a good way to solve this? With a queue or a stack, for example? – Machiel Dec 08 '10 at 21:06
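
On the stack-versus-queue question: block tags nest, and the most recently opened block is always the first one closed, so a stack is the usual fit. Below is a minimal sketch of that idea, not a full template parser. The function name convert_blocks is hypothetical, it handles only the three foreach tags from the question, and it assumes the exact from="$list => $item" spelling shown there. It also drops the indentation of the generated lines.

```cpp
#include <cstddef>
#include <stack>
#include <stdexcept>
#include <string>

// Hypothetical helper: translates {foreach ...}, {foreachelse} and {/foreach}
// into PHP and copies every other character through unchanged.
std::string convert_blocks(const std::string& tpl) {
    const std::string open_prefix = "{foreach from=\"$";
    const std::string else_tag = "{foreachelse}";
    const std::string close_tag = "{/foreach}";
    const std::string arrow_text = " => $";

    std::string out;
    std::stack<bool> else_seen;  // one entry per open {foreach}: has it hit {foreachelse}?
    std::size_t pos = 0;

    while (pos < tpl.size()) {
        if (tpl.compare(pos, open_prefix.size(), open_prefix) == 0) {
            // Assumed tag shape: {foreach from="$list => $item"}
            const std::size_t list_begin = pos + open_prefix.size();
            const std::size_t arrow = tpl.find(arrow_text, list_begin);
            const std::size_t close = tpl.find("\"}", list_begin);
            if (arrow == std::string::npos || close == std::string::npos || arrow > close) {
                throw std::runtime_error("malformed {foreach} tag");
            }
            const std::size_t item_begin = arrow + arrow_text.size();
            const std::string list = tpl.substr(list_begin, arrow - list_begin);
            const std::string item = tpl.substr(item_begin, close - item_begin);
            // Always emit the count() guard, so no lookahead for {foreachelse} is needed.
            out += "<?php if(count($" + list + ") > 0): ?>\n";
            out += "<?php foreach($" + list + " as $" + item + "): ?>";
            else_seen.push(false);
            pos = close + 2;  // skip past the closing "}
        } else if (tpl.compare(pos, else_tag.size(), else_tag) == 0) {
            if (else_seen.empty() || else_seen.top()) {
                throw std::runtime_error("unexpected {foreachelse}");
            }
            else_seen.top() = true;
            out += "<?php endforeach; ?>\n<?php else: ?>";
            pos += else_tag.size();
        } else if (tpl.compare(pos, close_tag.size(), close_tag) == 0) {
            if (else_seen.empty()) {
                throw std::runtime_error("unexpected {/foreach}");
            }
            const bool had_else = else_seen.top();
            else_seen.pop();
            // Without an else branch, the loop itself still has to be closed here.
            out += had_else ? "<?php endif; ?>" : "<?php endforeach; ?>\n<?php endif; ?>";
            pos += close_tag.size();
        } else {
            out += tpl[pos++];
        }
    }
    if (!else_seen.empty()) {
        throw std::runtime_error("unclosed {foreach}");
    }
    return out;
}
```

Because the count() guard is emitted for every loop, the sketch never has to look ahead for a {foreachelse} or go back and patch earlier output; the stack entry only records which closing sequence {/foreach} should produce.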

The right approach, in my view, would be not to reinvent the wheel (i.e. write your own parser) but rather to use an existing library, which will make the job easier and less time-consuming. One such C++ library could be wxHTMLParser or wxHTML.

Android Eve

For this type of problem I tend to lean towards regular expressions, using boost::regex, the GNU regex classes, or any other library. Identifying those markers and converting them is mostly a regex search-and-replace job (with captures for variable names, values, etc.), and you don't have to write code that actually parses the complete HTML and the special inserts.
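
As a rough illustration of the search-and-replace idea, here is a sketch for the {$page.Id()} markers from the question. It uses std::regex from the C++11 standard library rather than boost::regex, the function name convert_variables is made up for this example, and the Id() to getId() mapping is inferred from the question's expected output.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rewrites {$var.Method()} as <?php echo $var->getMethod(); ?>.
std::string convert_variables(const std::string& input) {
    // Capture 1 is the variable name, capture 2 the method name.
    static const std::regex var_call(R"re(\{\$(\w+)\.(\w+)\(\)\})re");
    // In the replacement string, "$$" is a literal dollar sign and
    // "$1" / "$2" refer to the captures.
    return std::regex_replace(input, var_call, "<?php echo $$$1->get$2(); ?>");
}
```

Called on the line `<td>{$page.Id()}</td>`, this yields `<td><?php echo $page->getId(); ?></td>`. Flat markers like these map well to a single pattern; the paired {foreach} ... {/foreach} tags need some extra state on top, since a pattern alone does not track which block is currently open.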

Diego Sevilla