3

Scenario: I'm working on a Rails app that will take data entry in the form of uploaded text-based files. I need to parse these files before importing the data. I can choose the file type uploaded to the app; the software used by those uploading (Microsoft Access) has several export options for file type.

While it may be insignificant, I was wondering if there is a specific file type that is most efficiently parsed. This question can be viewed as language-independent, I believe.

(While XML is commonly parsed, it is not a feasible file type for the sake of this project.)

anxiety
  • 3
    This question is way too open-ended. What kind of data are you importing? Would CSV work? Would YAML work? – Mitch Dempsey May 06 '10 at 21:46
  • One that is sufficient to the task and you have an existing tool to parse? One the submitter has a tool to emit? One simply enough for humans to reliably write without a tool? Details are important here... – dmckee --- ex-moderator kitten May 06 '10 at 21:49
  • 1
    And it's really hard to give a good answer unless we know why XML is not appropriate. The complexity of the data the file needs to describe will also mean a lot. Perhaps consider something like CSV, or the Ini file format? – Svend May 06 '10 at 21:50
  • The available file formats are any export type available to Microsoft Access. The uploaded files to my application are exported from Access apps. The data is roughly 20 fields; strings and integer values. – anxiety May 06 '10 at 21:52
  • You probably should edit the question (and maybe the title and tags) to say that Microsoft Access must produce the file. It might also help if you listed briefly what Access can export. – David Thornley May 06 '10 at 22:00

4 Answers

2

You might want to take a look at JSON. It's a lightweight format, and in contrast to XML it's really easy and clean to parse without requiring a huge library on the backend.

It can represent types like strings, numbers, associative arrays (objects), and lists of such.
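As a rough illustration of how little code the parsing side needs, here is a minimal Ruby sketch using the `json` library from the standard distribution. The record and its field names are made up for the example; they are not taken from the question.

```ruby
require "json"

# Hypothetical record; the field names are invented for illustration.
text = '{"name": "Widget", "quantity": 3, "tags": ["a", "b"], "dims": {"w": 2, "h": 5}}'

# JSON.parse maps JSON objects to Hashes and JSON arrays to Arrays.
record = JSON.parse(text)

puts record["name"]        # a String
puts record["quantity"]    # an Integer
puts record["tags"].length # number of elements in the Array
puts record["dims"]["w"]   # a value from the nested Hash
```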

LukeN
  • If I'm not mistaken, JSON isn't a format that Microsoft Access can export. I apologize for not mentioning that the files to be uploaded to my app are Access exports. – anxiety May 06 '10 at 21:57
2

If it is something exported by Access, the easiest would be CSV, particularly since Ruby includes a CSV parser in its standard library. You will have to do some work determining the dialect of CSV (which delimiter it uses, how it handles quotes). I don't know how robust the Ruby parser is with those issues, but you should also have some control over them from Microsoft Access.
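A minimal sketch of what handling the dialect could look like with Ruby's `csv` library. The sample data is an assumption (semicolon-delimited, double-quoted text, header row first), not an actual Access export; adjust `col_sep` and `quote_char` to match whatever the export settings produce.

```ruby
require "csv"

# Hypothetical export: semicolon delimiter, double-quoted text, header row first.
export = <<~DATA
  id;name;notes
  1;"Smith; John";"said ""hello"""
  2;Jones;
DATA

# col_sep and quote_char describe the dialect; headers: true uses the first row as keys.
rows = CSV.parse(export, col_sep: ";", quote_char: '"', headers: true)

rows.each do |row|
  puts "#{row["id"]}: #{row["name"]} (#{row["notes"].inspect})"
end
```

The quoted field keeps its embedded delimiter, doubled quotes collapse to a single quote, and the empty trailing field comes back as `nil`.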

Kathy Van Stone
0

I would suggest n-SV (where n is some character) for data that does not include n. That will make lexing the files a matter of a split.
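For instance, a sketch in Ruby, assuming a pipe-delimited line whose fields are guaranteed never to contain `|` (the sample line is hypothetical):

```ruby
# Hypothetical pipe-delimited line; assumes "|" never appears inside a field.
line = "42|Acme Corp|Springfield|\n"

# The -1 limit keeps trailing empty fields, which split drops by default.
fields = line.chomp.split("|", -1)

puts fields.inspect
```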

If you have more flexible data, I would suggest JSON.

Paul Nathan
  • CSV (or n-SV) is very hard to parse yourself, since you have to account for including the delimiters themselves – JoelFan May 06 '10 at 22:10
  • I assume CSV would then be the best format to use given the conditions: 1. The files uploaded to my app are ms-access exports 2. I will be parsing in ruby – anxiety May 06 '10 at 22:18
  • @anxiety: you should review the condition that JoelFan brought up. If you have CSV and it has a string in it that has `..., "blah, foo",...`, you will have all sorts of *fun* parsing it. If you are accepting European numbers, commas will be found from time to time. Plus there is the 1,000,000 human-readable number format. My point is, "get a CSV engine if the data is complicated". – Paul Nathan May 06 '10 at 22:56
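To make the pitfall from these comments concrete, here is a small Ruby sketch comparing a plain `split` with the standard library's `CSV.parse_line` on a made-up line containing quoted commas:

```ruby
require "csv"

# Hypothetical line: the second and third fields contain commas inside quotes.
line = '7,"blah, foo","1,000,000"'

naive  = line.split(",")       # breaks quoted fields apart at every comma
parsed = CSV.parse_line(line)  # respects the quoting

puts naive.length   # 6 pieces
puts parsed.length  # 3 fields
puts parsed.inspect
```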
0

If you HAVE to roll your own parser, I would suggest CSV or some other delimiter-separated format.

If you are able to use other libraries, there are plenty of options. JSON looks quite fascinating.

Robb
  • CSV (or n-SV) is very hard to parse yourself, since you have to account for including the delimiters themselves – JoelFan May 06 '10 at 21:55
  • Hard, but doable. Here are Java based examples: [parseCsv](http://stackoverflow.com/questions/2241915/regarding-java-string-manipulation/2241950#2241950) and [writeCsv](http://stackoverflow.com/questions/477886/jsp-generating-excel-spreadsheet-xls-to-download/2154226#2154226). – BalusC May 06 '10 at 22:14
  • Really? I would think something pretty simple could be written up that probably wouldn't be too flexible but at least would solve his problems. – Robb May 06 '10 at 22:59