I have a CSV file that I import and then parse, but some cells in the file get split across multiple rows, which causes problems for me. Here's an example of the text:

1931 Minerva AL Convertible Sedan by Rollston,,1931-Minerva-AL-Convertible-Sedan-.jpg (https://v5.airtableusercontent.com/v1/13/13/1671638400000/s8UUuz_XkjB-zTY-UuP8oQ/9jJRwxNAJsWKAezn2JJYe851pOwokxYQjVTcFoJXngT1LGqMQFQLLu-Byxnc0qJg2-Df-5LGwGZqcZC2blEYvQ/eebBewgk-zkUopRwKkb3Cch9dGclwgWpv9vDmGC_0Wc),Minerva,AL Convertible Sedan by Rollston,,1931,"$800,000 ",,,,,,,,Unnamed record,,,,,,Giorgia Bertoni,14.7.2021 1:29pm,
1932 Alfa Romeo 8C 2300 Short Chassis Zagato,,zagato-bodied-at-pebble-beach-concours-12.jpg (https://v5.airtableusercontent.com/v1/13/13/1671638400000/36XwITJaDsx7_RJr6qLcyw/ldiMut7-WxlTUsqin_cUK0FouKAX7m1-gw_dvyWsCmsV3wy3WfrtkHd58d4aK_4RUjsY9jWm1EnEGUspbps0YGCioSJ05N6im_gmu4khDYk/-uIq6Bp3TUhYdVIENI94nzWvSBJvH1G6vIDgjhCI5R0),Alfa Romeo,8C 2300 Short Chassis Zagato,,1932,"$15,000,000 ",,Yes,,,,,"The 8C is one of the oldest Zagatos in existence. It was the successor to Alfa Romeo’s 6C 1750 and debuted in 1931 with a 2.3-liter straight-eight engine. It was a race car to be used exclusively by Alfa Romeo but was later sold to private owners as a rolling chassis. A number of coachbuilders penned bodies for the chassis, one of which was Zagato.

Some of the Zagato-bodied examples were favored by Enzo Ferrari, who used them in the early days of Scuderia Ferrari. He selected Zagato as a technical partner because of its specialization in creating light and aerodynamic racing bodies, inspired by aeronautics. And the 8C 2300 Zagato, in different versions dominated the most important races of the period (among them the Mille Miglia, 24 Hours of Le Mans, Targa Florio and the 24 Hours of Spa).","Best of Show, Best in Class, Best in Class, Best of Show",,,,,Martin Halusa,Giorgia Bertoni,14.7.2021 1:29pm,
1932 Lincoln KB Murphy Roadster,,fbdd6e82fc9942b4a51996c7ccb5d5b0[1].jpg (https://v5.airtableusercontent.com/v1/13/13/1671638400000/hRpw9Z3hpSEYtUEKLP_TVw/19q9HpqOKN0Kx198Zv5W5i5B2W5OkIlrnE3JxAlQOmvtVA54Z_JI4U72GxWSnAHpeUOkPjEhJcCmRGrKsIVZJKE3njW2C4t53BaeBTCJvIYG3764l-Tj78LRGLaXonip/8jVGjQYR3opI_uUqXqDX9Jn1xtVR_ffdamzdB5xwHyg),Lincoln,KB Murphy Roadster,,1932,"$250,000 ",,,,,,,,,,,,,,,,

The point where the parser breaks onto a new line is "Some of the Zagato..."

How can I write a regex that goes through the document and edits it the way I need, so that I end up with a correctly parsed CSV file?

The trick is that every row ends with a comma, which is a delimiter.

This is the main thing I tried, and I expected it to edit the text the way I wanted:

    var fileContent = File.ReadAllText(filePath);
    var pattern = @"(.*)(\r\n(?![\s\S]*?,))";
    // Intent: keep the line's text ($1) and drop the line break that follows it
    var result = Regex.Replace(fileContent, pattern, "$1");
  • https://softwareengineering.stackexchange.com/questions/166454/can-the-csv-format-be-defined-by-a-regex – Selvin Feb 10 '23 at 16:07
  • Do not try to write your own CSV parser. The format is more complicated than you realize. There are already [many working, debugged CSV parsers available](https://stackoverflow.com/questions/1941392/) for you to drop into your app. Use them. – Dour High Arch Feb 10 '23 at 16:13
  • @Fildor - Actually I don't think the input is broken. The newline characters are inside quoted strings, so I believe a proper CSV reader will parse them correctly as part of a single cell. – dbc Feb 10 '23 at 16:15
  • @dbc I think you may be right. I guess I didn't scroll long enough the first time. – Fildor Feb 10 '23 at 16:16
  • Your CSV contains CR or LF characters included in cell values. This is allowed per [RFC 4180](https://www.rfc-editor.org/rfc/rfc4180#section-2) as long as the cells are enclosed in the **escape character "**: *`escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE`*, which they are in your case. Thus, rather than reading line by line with a regex, you should use a proper CSV parser, either the built-in `Microsoft.VisualBasic.FileIO.TextFieldParser` or the popular third-party package CsvHelper. Demo using TextFieldParser here: https://dotnetfiddle.net/I41zZz. – dbc Feb 10 '23 at 17:33
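
For reference, the `TextFieldParser` suggestion from the comments could be sketched roughly as follows. This is a minimal sketch and not the code from the linked fiddle: the `CsvRows` class and its `Parse` method are hypothetical names introduced here for illustration, and it assumes the file is comma-delimited with double-quoted cells, as in the sample above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper (name invented for this sketch).
public static class CsvRows
{
    // Parses CSV text into records. A quoted cell may span several
    // physical lines and still ends up in a single field.
    public static List<string[]> Parse(string csvText)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(new StringReader(csvText)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                // ReadFields returns all cells of the next record.
                string[] fields = parser.ReadFields();
                if (fields != null)
                {
                    rows.Add(fields);
                }
            }
        }
        return rows;
    }
}
```

With the variables from the question, this would be called as `var rows = CsvRows.Parse(File.ReadAllText(filePath));`, where each array in `rows` is one record. Note that `ReadFields` is documented to skip blank lines in the input, so it is worth checking how the empty line inside the long description cell comes out for your data.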
