I have a csv file and I want to read it line by line. The problem is that some cell values are in quotes containing line breaks.
Here is an example CSV:
Product,Description,Price
Product A,This is Product A,20
Product B,"This is much better
than Product A",200
The standard getLines() function does not handle that.
Source.fromFile(inputFile).getLines() // will split at every line break, regardless if quoted or not
The getLines gets something like:
Array("Product", "Description", "Price")
Array("Product A", "this is Product A", "20")
Array("Product A", "\"This is much better")
Array("than Product A\"", "20")
But it should be like:
Array("Product", "Description", "Price")
Array("Product A", "this is Product A", "20")
Array("Product A", "\"This is much better\nthan Product A\"", "20")
I tried it to read the file fully and split is with a RegEx similar to this post https://stackoverflow.com/a/31193505
file.mkString.split("""\n(?=(?:[^"]*"[^"]*")*[^"]*$)""")
The regex works fine, but I'm getting a stack overflow exception because the file is too big to handle it fully out of memory. I tried it with a smaller version of the file and it worked.
As stated in the post, a foldLeft() could help for bigger files. But I'm not sure how it should work, when iterating over every Char of the string, to pass all at once...
- the Char of your current iteration
- the Line what you are building
- and the List of already created lines
Maybe it works to write an own tail recursive version of the getLines, but I'm not sure if there isn't a more practical solution instead of handling it char by char.
Do you see any other functional-style solution to this problem?
Tanks and regards, Felix