2

I want to apply various operations to data files : algebra of sets, statistics, reporting, changes. But the format of the files is far from code examples and a bit weird. There are differents sorts of items, items type, and some of them are put together as a collection. There is a simplistic example below.
I'm new in boost::spirit and I have tried coding to split the items and get basic informations (name, version, date) required for most of treatments. Eventually it seems tricky for me. Is the problem my lack of skills or boost::spirit is not suitable to this format?
Studying boost::spirit is not a waste of time, I am sure to use it later. But I didn't find examples of code like mine, I may not go the right way.

>>>process_type_A
//name(typeA_1)
//version(A.1.99)
//date(2016.01.01)
//property1 "pA11"
//property2 "pA12"
//etc_A_1 (thousand of lines - a lot are "multiline" and/or mulitline sub-records)
<<<process_type_A
>>>process_type_A
//name(typeA_2)
//version(A.2.99)
//date(2016.01.02)
//property1 "pA21"
//property2 "pA22"
//etc_A_2 (hundred or thousand of lines)
<<<process_type_A
>>>process_type_B
//name(typeB_1)
//version(B.1.99)
//date(2016.02.01)
//property1 "pB11"
//property2 "pB12"
//etc_B_1 (hundred or thousand of lines)
<<<process_type_B
>>>paramset_type_C
//>>paramlist
////name(typeC_1)
////version(C.1.99)
////date(2016.03.01)
////property1 "pC11"
////property2 "pC12"
////etc_C_1 (hundred or thousand of lines)
//<<paramlist
//>>paramlist
////name(typeC_2)
////version(C.2.99)
////date(2016.04.01)
////property1 "pC21"
////property2 "pC22"
////etc_C_2 (hundred or thousand of lines)
//<<paramlist
<<<paramset_type_C

Code::Blocks
Boost 1.60.0
GCC Compiler on Windows and Linux

Steven M.
  • 23
  • 3

2 Answers2

2

I think @Orient is right: regex w/captures is enough here.

However, Spirit has the upside of coming without a linker dependency. Here's some approaches (using seek[] and raw[]) for inspiration:

Note that Spirit X3 (still experimental) also has a seek[] directive and it will compiler much faster.

Community
  • 1
  • 1
sehe
  • 374,641
  • 47
  • 450
  • 633
  • @Orient and your advice sounds wise. I will study those links. I didn't find them when I looked for information on stackoverflow. Usual problem of key word choice when searching. – Steven M. Jan 23 '16 at 17:09
1

The main advice I would give about Qi is that it is a very powerful and flexible tool for parsing. You can define quite complicated, possibly recursive structures, using boost::variant, boost::optional, etc., and associate these types with qi rules and it seemingly magically does the right thing, giving you a nice AST for your data.

The biggest sources of difficulty in my (limited) experience are when you try to make it do more than that and also process the data. It's sometimes tempting to try to "eagerly" do some processing at the same time that you are parsing the data, often in a semantic action or something. Don't do it! It usually makes things harder to read in the end, a bit harder to debug, and sometimes you can be surprised what will happen if the grammar has to backtrack across your semantic action which it already executed.

qi should work great if you can write a nice grammar for your data. If you can't write an unambiguous grammar, you might be able to use qi::eps to make it parseable but you don't want to have to do that too often IMO. I don't think "hundreds or thousands" of items will pose any particular problem.

Right now the question is rather opinion-oriented -- if you can post a more complete description of the data format you have, or better, a complete code example which is failing, it might make it easier to give precise answers.

Chris Beck
  • 15,614
  • 4
  • 51
  • 87
  • "Eagerly": you're right, it's a bad habit for coders.
    Anyway Spirit will be useful to check/parse/scan some data subsets from those file type.
    – Steven M. Jan 23 '16 at 17:19