To convert a CSV string into a list of elements, you could write a program that keeps track of state (in quotes or out of quotes) as it processes the string one character at a time, and emits the elements it finds. The quoting rules in CSV are quirky (a quoted field may contain commas, and a literal quote is written as two quotes), so you'll want plenty of test data.
The state machine could go like this:
1. Scan until a quote (go to 2) or a comma (go to 3).
2. If the next character is a quote, add only one of the two quotes to the field and return to 1. Otherwise, go to 4 (or report an error if the quote isn't the first character in the field).
3. Emit the field, go to 1.
4. Scan until a quote (go to 5).
5. If the next character is a quote, add only one of the two quotes to the field and return to 4. Otherwise, emit the field, scan for a comma, and go to 1.
This should correctly scan stuff like:
- hello, world, 123, 456
- "hello world", 123, 456
- "He said ""Hello, world!""", "and I said hi"
- ""17.5179C,"" (correctly reports an error, since there should be a separator between the first quoted string "" and the second field 17.5179C).
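Here is a rough Python sketch of that kind of scanner, in case it helps to see it spelled out. It is one possible reading of the steps above, not a reference implementation. The function name `parse_csv_line` is made up for illustration. Two assumptions are mine: whitespace around unquoted fields and before an opening quote is dropped (so the examples with a space after each comma come out clean), and a quote at the very start of a field always opens a quoted field, which is what makes the last example an error.

```python
def parse_csv_line(line: str) -> list[str]:
    """Split one CSV record into fields with an explicit state machine."""
    UNQUOTED, QUOTED, AFTER_QUOTED = range(3)
    fields = []
    buf = []          # characters of the field being built
    state = UNQUOTED
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        if state == UNQUOTED:
            if ch == ',':
                # Assumption: surrounding whitespace of unquoted fields is dropped.
                fields.append(''.join(buf).strip())
                buf = []
            elif ch == '"':
                if not ''.join(buf).strip():
                    # Quote at the start of the field: begin a quoted field.
                    buf = []
                    state = QUOTED
                elif i + 1 < n and line[i + 1] == '"':
                    # Doubled quote inside an unquoted field: keep one quote.
                    buf.append('"')
                    i += 1
                else:
                    raise ValueError(f"unexpected quote at position {i}")
            else:
                buf.append(ch)
        elif state == QUOTED:
            if ch == '"':
                if i + 1 < n and line[i + 1] == '"':
                    # Doubled quote inside a quoted field: keep one quote.
                    buf.append('"')
                    i += 1
                else:
                    state = AFTER_QUOTED  # closing quote
            else:
                buf.append(ch)
        else:  # AFTER_QUOTED: only whitespace or a comma may follow
            if ch == ',':
                fields.append(''.join(buf))
                buf = []
                state = UNQUOTED
            elif not ch.isspace():
                raise ValueError(f"expected ',' at position {i}, found {ch!r}")
        i += 1
    if state == QUOTED:
        raise ValueError("unterminated quoted field")
    last = ''.join(buf)
    fields.append(last if state == AFTER_QUOTED else last.strip())
    return fields
```

With this sketch, the first three examples above split into the fields you would expect, and `parse_csv_line('""17.5179C,""')` raises `ValueError`.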
Another way would be to find some existing library that does it well. Surely, CSV is common enough that such a thing must exist?
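For instance, if Python happens to be an option, its standard library ships a `csv` module. The snippet below is only a sketch of how it might be applied to a single line. The wrapper name `split_with_csv_module` is hypothetical, while `skipinitialspace` and `strict` are real options of `csv.reader`: the first ignores a space right after each comma, the second makes malformed quoting raise `csv.Error` instead of being silently accepted.

```python
import csv


def split_with_csv_module(line: str) -> list[str]:
    """Parse one CSV record using the standard library reader."""
    # csv.reader accepts any iterable of strings, so a one-item list is enough.
    reader = csv.reader([line], skipinitialspace=True, strict=True)
    # An empty input line yields an empty row.
    return next(reader, [])
```

Called on the quoted examples above, this returns the unescaped fields, and the malformed `""17.5179C,""` line raises `csv.Error` because of `strict=True`.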
edit:
You mention that speed is vital, so I wanted to point out that, as long as quoted strings aren't allowed to contain line breaks, each line is a complete record and can be processed independently, which means lines can be parsed in parallel.
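A minimal sketch of that idea in Python, assuming the input has already been split into lines and that no quoted field spans a line break. The names `parse_line` and `parse_lines_parallel` are hypothetical, and the per-line parser here is the standard `csv` module; any per-line parser would do. Whether this is actually faster depends on the data, since handing work to other processes has its own cost, so measure before relying on it.

```python
import csv
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Iterable, Optional


def parse_line(line: str) -> list[str]:
    """Parse a single record; needs no state from any other line."""
    return next(csv.reader([line], skipinitialspace=True), [])


def parse_lines_parallel(
    lines: Iterable[str],
    executor: Optional[Executor] = None,
    chunksize: int = 1000,
) -> list[list[str]]:
    """Parse lines independently; results come back in input order."""
    if executor is not None:
        # Caller supplies and manages the pool.
        return list(executor.map(parse_line, lines, chunksize=chunksize))
    # Default: one worker process per CPU, lines shipped in batches.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(parse_line, lines, chunksize=chunksize))
```

On platforms that start worker processes by spawning a fresh interpreter, the call to `parse_lines_parallel` should sit under an `if __name__ == "__main__":` guard. The `chunksize` batching matters because sending lines to workers one at a time would likely cost more than the parsing itself.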