I have a CSV file that contains line breaks inside quoted fields. I would like to get rid of them (replacing each with a space) so that I can run CSV.parse line by line.
My original string contains:
"a","b",c,"d
e",f,g,"h
i",j
k,"l","m
n","o"
and I would like to end up parsing a string that contains:
"a","b",c,"d e",f,g,"h i",j
k,"l","m n","o"
How can I do that in Ruby?
An effective and down-to-earth solution, thanks to user @sln:
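fichier = File.open("baz.csv")
# Matches a string in which every quotation mark is paired
matchesBalancedLinesFromUser_sln = /^[^"]*(?:"[^"]*"[^"]*)*$/
mem = ""
fichier.each_line do |ligne|
  # As long as the quotation marks are not balanced, keep concatenating lines.
  # Note that the lines are joined without any separator here.
  mem += ligne.delete("\n")
  if mem =~ matchesBalancedLinesFromUser_sln
    ligneReplaced = mem + "\n"
    # doWhatYouWill is a placeholder for your own processing
    # (a sample definition is given in the second snippet further down)
    doWhatYouWill(ligneReplaced)
    mem = ""
  end
end
fichier.rewind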
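To see how the regex acts as a balance test, it can help to try it on fragments of the sample data. This is only a small sketch: `balanced?` is a hypothetical helper name introduced for the illustration, and the pattern is copied unchanged from the snippet above.

```ruby
# Same pattern as in the answer: unquoted text interleaved with complete "..." pairs.
# balanced? is a hypothetical helper name used only for this illustration.
def balanced?(text)
  /^[^"]*(?:"[^"]*"[^"]*)*$/.match?(text)
end

puts balanced?('"a","b",c,"d')               # first physical line only
puts balanced?('"a","b",c,"de",f,g,"h')      # first two physical lines joined
puts balanced?('"a","b",c,"de",f,g,"hi",j')  # first three physical lines joined
```

Each pass through the group consumes exactly two quotation marks, so the pattern can only match a string that holds an even number of them.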
Another way to do it without a regex, by just counting the quotation marks:
fichier = File.open("baz.csv")
# Placeholder: replace the body with your own per-line processing
def doWhatYouWill(string)
  puts string
end
mem = ""
fichier.each_line do |ligne|
  # As long as the quotation marks are not balanced, keep concatenating lines,
  # separated by a space
  mem += ligne.strip + " "
  if mem.count('"').even? # mem holds an even number of quotation marks
    ligneReplaced = mem.rstrip + "\n" # drop the trailing separator space
    doWhatYouWill(ligneReplaced)
    mem = ""
  end
end
fichier.rewind
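Since the question starts from a string rather than a file, the quote-counting idea can also be sketched directly on a string and fed to CSV.parse_line. This is a minimal sketch, not a definitive implementation: `each_logical_line` is a hypothetical helper name, and the sample text is the one from the question.

```ruby
require "csv"

# Hypothetical helper: joins physical lines until the number of double
# quotes is even, then yields the rebuilt logical line.
def each_logical_line(text)
  return enum_for(:each_logical_line, text) unless block_given?

  buffer = +""
  text.each_line do |line|
    buffer << line.chomp
    if buffer.count('"').even?
      yield buffer
      buffer = +""
    else
      buffer << " " # the embedded line break becomes a space
    end
  end
end

sample = <<~TEXT
  "a","b",c,"d
  e",f,g,"h
  i",j
  k,"l","m
  n","o"
TEXT

rows = each_logical_line(sample).map { |line| CSV.parse_line(line) }
rows.each { |row| p row }
```

The space is only appended while a quoted field is still open, so no trailing space ends up after the last field. Quotes escaped by doubling ("") add two to the count and therefore do not disturb the parity check.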
Note: the solutions above assume that the quotation marks in the CSV file are balanced. If that is not the case, see this comment by user @sln.
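One simple safeguard, which is not the approach from the comment mentioned above, is to check whether anything is left in the buffer once the input is exhausted. The sketch below assumes the same quote-counting strategy; `logical_lines!` is a hypothetical helper name.

```ruby
# Hypothetical helper: returns the rebuilt logical lines, or raises if the
# input ends while a quoted field is still open.
def logical_lines!(text)
  lines = []
  buffer = +""
  text.each_line do |line|
    buffer << line.chomp
    if buffer.count('"').even?
      lines << buffer
      buffer = +""
    else
      buffer << " "
    end
  end
  unless buffer.empty?
    raise ArgumentError, "unbalanced quotation marks near: #{buffer[0, 40]}"
  end
  lines
end

begin
  logical_lines!(%(a,"b\nc,d\n)) # the quote opened before b is never closed
rescue ArgumentError => e
  puts e.message
end
```

This only catches input whose total number of quotation marks is odd. A stray quote in the middle of a file would instead swallow the following lines until the next quote appears, so it is a sanity check rather than a full validation.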