
Is there any way to parse CSV with the C++ Boost tokenizer when embedded double-quote characters are represented by a pair of double-quote characters?

The Wikipedia article on comma-separated values says that each embedded double-quote character must be represented by a pair of double-quote characters, and gives the following example of a CSV file:

    Year,Make,Model,Description,Price
    1997,Ford,E350,"ac, abs, moon",3000.00
    1999,Chevy,"Venture ""Extended Edition""","",4900.00
    1999,Chevy,"Venture ""Extended Edition, Very Large""",,5000.00
    1996,Jeep,Grand Cherokee,"MUST SELL!
    air, moon roof, loaded",4799.00

The default Boost tokenizer, `typedef tokenizer< escaped_list_separator<char> > tokenizer;`, strips the embedded quotes, but it works fine if I use \" instead of "".

sehe
Daniil
  • You could replace two quotes with a backslash and a quote before tokenizing the string? – Anon Mail Oct 06 '17 at 17:57
  • This is how it's defined in the [RFC](https://tools.ietf.org/html/rfc4180) so you'll have to use a parser that can deal with that. Look at [other answers](https://stackoverflow.com/questions/1120140/how-can-i-read-and-parse-csv-files-in-c) for inspiration here. Writing your own tool should be a last-ditch effort, as these file formats are slippery at best. – tadman Oct 06 '17 at 18:02
  • I don't think that a simple substitution of "" for \" would work, because "" would match an empty token. Some sort of regular expression might fit. I agree in principle writing your own tokenizer ought to be a last resort, but my experience is that I end up falling back on that last resort more often than not. I often find it necessary to tweak the tokenizing rules to deal with odd cases that crop up. – Kevin Boone Oct 06 '17 at 18:07
  • For my task it is not an issue that the tokenizer removes the double quotes. It is just an academic question, to make sure this is not a bug but a feature of the tokenizer, i.e. that the tokenizer does not implement the RFC. – Daniil Oct 06 '17 at 18:19
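
For readers who do need the doubled-quote rule, the comments above point toward a dedicated parser rather than a pre-substitution of "" with \". The following is only a minimal sketch of what such a parser might look like in standard C++, not a replacement endorsed by Boost. The name `parse_csv` is hypothetical, and the sketch assumes the whole input is already in memory, that the separator is a comma, and that lenient handling of malformed input (for example, an unterminated quote) is acceptable.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost): splits CSV text into records and
// fields, treating "" inside a quoted field as one literal double quote.
std::vector<std::vector<std::string>> parse_csv(const std::string& text) {
    std::vector<std::vector<std::string>> records;
    std::vector<std::string> record;
    std::string field;
    bool in_quotes = false;  // true while inside a quoted field
    bool pending = false;    // true if the current record has unflushed content

    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (in_quotes) {
            if (c == '"') {
                if (i + 1 < text.size() && text[i + 1] == '"') {
                    field += '"';  // doubled quote: keep one, skip the other
                    ++i;
                } else {
                    in_quotes = false;  // closing quote of the field
                }
            } else {
                field += c;  // commas and line breaks are literal here
            }
        } else if (c == '"') {
            in_quotes = true;  // opening quote is not part of the value
            pending = true;
        } else if (c == ',') {
            record.push_back(field);  // end of field
            field.clear();
            pending = true;
        } else if (c == '\n') {
            record.push_back(field);  // end of field and of record
            field.clear();
            records.push_back(record);
            record.clear();
            pending = false;
        } else if (c == '\r') {
            // Assumption: a bare CR outside quotes is dropped so CRLF works.
        } else {
            field += c;
            pending = true;
        }
    }
    if (pending) {  // last record without a trailing line break
        record.push_back(field);
        records.push_back(record);
    }
    return records;
}
```

Called on the third line of the example file, `parse_csv` should return one record of five fields, with the third field holding `Venture "Extended Edition"` and the fourth field empty. Because quoted line breaks are kept, the "MUST SELL!" row is returned as a single record even though it spans two physical lines, which is why the sketch takes the whole text rather than one line at a time.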
