
I have some data to feed to a C/C++ program, and I could easily convert it to CSV format. However, I would need a couple of extensions to the CSV standard, or at least to the parts of it I know.

The data are heterogeneous: there are parameters of different sizes, which may be scalars, vectors, or multidimensional arrays. My ideal format would look like this:

--+ Size1
2
--+ Size2
4
--+Table1
1;2;3;4
5;6;7;8
--+Table2
1;2

"--+" is some sort of separator. I have two 1-valued parameters named symbolically Size1 and Size2 and two other multidimensional parameters Table1 and Table2. In this case the dimensions of Table1 and Table2 are given by the other two parameters. Also rows and columns could be named, i.e. there could be a table like

--+Table3
A;B
X;1;2
Y;4;5

Where element ("A","X") is 1 and ("B","X") is 2 and so forth.

In other words, it's like a series of concatenated CSV files with names for tables, rows, and columns.

The parser should be able to exploit the structure of the file, allowing me to write code like this:

parse(my_parser, "Size1", &foo->S1);        // read Size1 and store it in foo->S1
parse(my_parser, "Size2", &foo->S2);        // read Size2 and store it in foo->S2
foo->T2 = malloc(sizeof(int) * foo->S1);    // Table2 has Size1 elements
parse(my_parser, "Table2", foo->T2);        // read Table2 into the allocated buffer

It would be a bonus if it could also store row and column names.

I don't think it would take much to write such a library, but I have more important things to do ATM.

Is there an already defined format like this one? With open-source libraries for C++? Do you have other suggestions for my problem?

Thanks in advance.

A.

user881430

3 Answers


I would use JSON, which Boost can readily handle. A scalar is just the simple case of a one-element array: [ 2 ]

A vector is easy: [ 1, 2 ]

Multidimensional arrays nest: [ [1,2,3,4], [5,6,7,8] ]

It's been a while since I've done this sort of thing, so I'm not sure exactly how the code will break down for you, but by extending this scheme you could certainly add row/column names. The code will be quite clean; perhaps not as brainless as in Python, but it should be simple.

Here's a link describing the JSON format: http://json.org. Here's a Stack Overflow question about reading JSON with Boost: Reading json file with boost.


A good option could be YAML.

It's a well-known, human-friendly data serialization standard for programming languages.

It fits your needs quite well: YAML syntax is designed to map easily onto the data types common to most high-level languages: vectors, associative arrays, and scalars:

Size1: 123
---
Table1: [[1.0,2.0,3.0,4.0], [5.0,6.0,7.0,8.0]]

There are good libraries for C, C++ and many other languages. To get a feel for how it can be used see the C++ tutorial.
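For the row/column-name bonus, a named table maps naturally onto nested YAML mappings. A sketch, reusing Table3 and the cell values from the question (the nesting order, rows outside and columns inside, is my own choice):

```yaml
Table3:
  X: {A: 1, B: 2}
  Y: {A: 4, B: 5}
```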

For interoperability you could also consider the way OpenCV uses the YAML format:

%YAML:1.0
frameCount: 5
calibrationDate: "Fri Jun 17 14:09:29 2011\n"
cameraMatrix: !!opencv-matrix
   rows: 3
   cols: 3
   dt: d
   data: [ 1000., 0., 320., 0., 1000., 240., 0., 0., 1. ]

Since JSON and YAML have many similarities, you could also take a look at: What is the difference between YAML and JSON? When to prefer one over the other

manlio

Thanks everyone for the suggestions.

The data is primarily numeric, with many dimensions, and given its size it could be slow to parse with those text formats. For now I found that the quickest and cleanest way is to use a database.

I still think it may be overkill, but IMHO there are no clearly better alternatives right now.

user881430