
I have a CSV file with translation pairs. It has the following format:

text language 1;text language 2
text language 1;text language 2
text language 1;text language 2

and so on. The problem is that sometimes the text is very long, contains \n, or even contains multiple quotation marks, like this:

"Very long long long long long long long long long long long long long long long long long long long text";"Very long long long long long long long long long long text2"
text;text2

My problem is that I can't figure out the right regex pattern to split the word or sentence pairs correctly, especially when a long block contains \n or even \r\n. In these cases, however, each half of the sentence pair is enclosed in quotation marks, if that's any help. Similar to this:

"Long text with lines\r\nmore lines\nand another line\nAnd yet another";"Long text with lines\r\nmorelines\nand another line\nAnd yet another"\r\n
word1;word2

So I assume I need to split the pairs wherever there is either a "\r\n, a \r\n", or a ;? Sadly, I'm not experienced with regular expressions.

I uploaded the CSV here: http://s000.tinyupload.com/?file_id=11646241007071639575

Moonpaw
  • Will you have escaped quotation marks within the string, such as `"x\"y";"z"`? – ClickRick Dec 14 '15 at 16:27
  • In my experience, parsing a CSV file is complicated, and you don't want to write your own parser. There are a number of special cases that make the code or regex get out of hand fast. You may want to take a look at these answers for more help: https://stackoverflow.com/questions/3268622/regex-to-split-line-csv-file You may also want to consider using an existing library, such as the one suggested in this answer: http://stackoverflow.com/a/5587666/892536 – tehDorf Dec 14 '15 at 20:29
  • I just figured that if Excel can parse it correctly, I should be able to as well. In Excel it is all cleanly divided into 2 columns, with each string of a pair in its own cell. Try it for yourself in Excel if you like; I gave a link to the CSV. Another option would be splitting the file with your own fail-proof (if that's possible) code. At first I tried StreamReader.ReadLine(), not knowing better, until I figured out that it only works for word pairs that don't span multiple lines. – Moonpaw Dec 15 '15 at 07:48
  • OK, I finally solved my problem using the TextFieldParser class (.NET Framework 2.0 and higher, Microsoft.VisualBasic.FileIO namespace): `using (TextFieldParser fParser = new TextFieldParser(file, enc)) { fParser.SetDelimiters(new string[] { ";" }); ... }` – Moonpaw Dec 15 '15 at 10:32

1 Answer


OK, I finally solved my problem using the TextFieldParser class (.NET Framework 2.0 and higher, Microsoft.VisualBasic.FileIO namespace):

using (TextFieldParser fParser = new TextFieldParser(file, enc)) { fParser.SetDelimiters(new string[] { ";" }); ... } – Moonpaw
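For readers who want to see the elided part filled in, here is a minimal sketch of how the loop around TextFieldParser might look. It is not the original poster's code: the class name TranslationCsv, the method ReadPairs, and the sample string are made up for illustration, and it reads from a TextReader instead of a file so it can be tried without the CSV. It assumes a reference to the Microsoft.VisualBasic assembly, and that quotes inside a quoted field are doubled ("") as in Excel's CSV export.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper, for illustration only.
public static class TranslationCsv
{
    // Reads all records from the reader; each record is returned as its array of fields.
    public static List<string[]> ReadPairs(TextReader reader)
    {
        var pairs = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(";");
            // Quoted fields may contain the delimiter and line breaks.
            parser.HasFieldsEnclosedInQuotes = true;
            // Keep leading and trailing whitespace inside fields as-is.
            parser.TrimWhiteSpace = false;

            while (!parser.EndOfData)
            {
                // ReadFields consumes one logical record, which can span
                // several physical lines when a quoted field contains \r\n.
                string[] fields = parser.ReadFields();
                pairs.Add(fields);
            }
        }
        return pairs;
    }

    public static void Main()
    {
        // Made-up sample: one quoted multi-line pair, then a plain pair.
        string sample =
            "\"Long text\r\nmore lines\";\"Long text 2\r\nmore lines\"\r\n" +
            "word1;word2\r\n";

        foreach (string[] pair in ReadPairs(new StringReader(sample)))
        {
            Console.WriteLine("{0} fields; first field: {1}", pair.Length, pair[0]);
        }
    }
}
```

To read the actual file with a specific encoding, the constructor used in the answer, new TextFieldParser(file, enc), can be swapped in for the TextReader one. ReadFields throws MalformedLineException on a record it cannot parse, so wrapping the loop in a try/catch may be worthwhile for hand-edited files.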

Armali