1

This is my CSV file

This file is provided by an external source and saved in CSV format with a pipe (|) separator, and I have to work with it as it is.

||||||||||||||||||||||||||||||||||||||||||||||||||
|Table1|||||||||||||||||||||||||||||||||||||||||||||||||            
||||||||||||||||||||||||||||||||||||||||||||||||||          
N|IDI  |TEST|START DATE HOUR    |CAUSE|KIND|NUMB|NAMES|         
1|10704|    |21/07/2020 15:05:54|L    |MT  |2786|NAV  |         
2|10660|    |21/07/2020 09:27:31|L    |MT  |4088|PIS  |     
values of names 
values of names .|0|0|1|1|0|0||||
||||||||||||||||||||||||||||||||||||||||||||||||||          
|Table2|||||||||||||||||||||||||||||||||||||||||||||||||            
||||||||||||||||||||||||||||||||||||||||||||||||||          
N|IDI  |TEST|START DATE HOUR    |END DATE HOUR      |LENGHT  |RETURNS         |CAUSE|KIND|NUMB|NAMES|           
1|10710|    |21/07/2020 19:34:00|21/07/2020 20:19:09|00:45:09| -              |L    |MT  |7806|ACC  |
2|10708|    |21/07/2020 18:28:12|21/07/2020 18:28:13|00:00:01| -              |T    |MT  |2600|LIT  |       
3|10700|    |21/07/2020 14:16:37|21/07/2020 15:19:13|01:02:36|21/07/2020 17:00|L    |MT  |4435|UHI  |       
4|10698|    |21/07/2020 14:06:45|21/07/2020 14:07:22|00:00:37|-               |B    |MT  |5789|TYK  |
5|10674|    |21/07/2020 10:21:04|21/07/2020 10:44:41|00:23:37|21/07/2020 12:30|T    |MT  |6699|FGR  |
||||||||||||||||||||||||||||||||||||||||||||||||||

I need to delete or skip the following rows of the CSV file, because they do not start with a number, the value N, or a pipe |:

values of names 
values of names .|0|0|1|1|0|0||||

This is my code-behind. It throws the error

Index was out of range. Must be non-negative and less than the size of the collection. Parameter name: index

when a line does not start with a number, the value N, or a pipe |.

int posNewColumn = 4;

string input = @"C:\Temp\SO\import.csv";
string output = @"C:\Temp\SO\out.csv";

string[] CSVDump = File.ReadAllLines(input);
List<List<string>> CSV = CSVDump.Select(x => x.Split('|').ToList()).ToList();
foreach (List<string> line in CSV)
{
    if (line[1] == "Table2")
    {
        break;
    }
    line.Insert(posNewColumn, line[0] == "N" ? "LENGHT" : string.Empty);
    line.Insert(posNewColumn, line[0] == "N" ? "END DATE HOUR" : string.Empty);
}

File.WriteAllLines(output, CSV.Select(x => string.Join("|", x)));

If there is only one element in a line (as in the line "values of names"), I need to skip or delete that line.

Can you help me please?

  • Well, I will address the obvious: this is not a [CSV](https://en.wikipedia.org/wiki/Comma-separated_values). I know that the most cited RFC about CSV starts with "_This memo provides information for the Internet community. It does not specify an Internet standard of any kind_". But this goes beyond the TSV/CSV RFC fight. – Drag and Drop Aug 25 '20 at 11:51
  • @DragandDrop this file is provided by an external source and saved in CSV format with a pipe separator, and I have to work with it as it is –  Aug 25 '20 at 11:53
  • You won't be able to use the standard way of handling multiple objects in the same CSV file on this input: https://stackoverflow.com/questions/34057465/reading-multiple-classes-from-single-csv-file-using-csvhelper. You should treat it like a text file: split on "table_Digit", then filter out all the rows where the first char is not an N or a number – Drag and Drop Aug 25 '20 at 11:54
  • As usual, that kind of error happens because you are trying to read an element from a collection at an index that doesn't exist. In this case it is the _if (line[1] == "Table2")_. What if there is only one element in the line (as in the line _values of names_)? – Steve Aug 25 '20 at 11:55
  • @Steve if there is only one element in the line (as in the line values of names), I need to skip or delete that line –  Aug 25 '20 at 11:56
  • Then you need to check the length of the line list before doing anything with it – Steve Aug 25 '20 at 11:57
  • How big is this file? If the file is not really big, this attempt to overwrite the lines directly in memory overcomplicates things – Steve Aug 25 '20 at 11:59
  • @Steve no, the file is never larger than 45.5 KB –  Aug 25 '20 at 12:01
  • Does this answer your question? [What is an IndexOutOfRangeException / ArgumentOutOfRangeException and how do I fix it?](https://stackoverflow.com/questions/20940979/what-is-an-indexoutofrangeexception-argumentoutofrangeexception-and-how-do-i-f) – Liam Sep 28 '20 at 09:47

2 Answers

1

So you want to skip all lines that start with a pipe?

List<List<string>> CSV = CSVDump
  .Where(x => !x.StartsWith("|"))
  .Select(x => x.Split('|').ToList()).ToList(); // ToList, not ToArray, to match the declared type

So you want to keep anything that starts with a number, an N or a pipe?

List<List<string>> CSV = CSVDump
  .Where(x => x.Length > 0 && "0123456789N|".Contains(x[0]))
  .Select(x => x.Split('|').ToList()).ToList(); // ToList, not ToArray, to match the declared type

In response to Steve's concerns about performance etc, perhaps the best route to go is:

int posNewColumn = 3; // index of START DATE HOUR; the new columns go right after it

string input = @"C:\Temp\SO\import.csv";
string output = @"C:\Temp\SO\out.csv";

using (var dest = File.CreateText(output))
{
    bool adjust = true; // true while we are still inside Table1

    foreach (string s in File.ReadLines(input))
    {
        // skip zero-length lines and lines that do not begin with a number, N or a pipe
        if (s.Length == 0 || !"0123456789N|".Contains(s[0]))
            continue;

        string line = s; // copy the loop variable so we can adjust it

        // stop adjusting once Table2 begins; its rows already have these columns
        if (line.StartsWith("|Table2"))
            adjust = false;

        if (adjust)
        {
            string[] bits = line.Split('|');

            // guard against rows that are too short to hold the new columns
            if (bits.Length > posNewColumn)
            {
                if (line.StartsWith("N"))
                    bits[posNewColumn] += "|END DATE HOUR|LENGHT";
                else
                    bits[posNewColumn] += "||";

                line = string.Join("|", bits);
            }
        }

        dest.WriteLine(line);
    }
}

This requires minimal memory and processing: we don't split every line needlessly, we don't create thousands of Lists, and we don't hold the whole file in memory. We just read lines in, write out the ones we keep, and adjust them only while we haven't yet encountered Table2.

Note: I have written this but not debugged or tested it. It might have a typo or a minor logic error, so treat it as pseudocode.
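To try the rules without touching any files, the same idea can be sketched as a function over in-memory lines. This is only a sketch under assumptions: the names `PipeFileSketch` and `Transform` are hypothetical, and it assumes the Table2 title row always starts with `|Table2`, as in the sample in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PipeFileSketch
{
    // Hypothetical helper: applies the keep/skip rule and pads Table1 rows.
    public static IEnumerable<string> Transform(IEnumerable<string> lines, int posNewColumn = 3)
    {
        bool adjust = true; // true until the Table2 title row is seen

        foreach (string s in lines)
        {
            // Keep only lines that begin with a digit, 'N' or a pipe.
            if (s.Length == 0 || "0123456789N|".IndexOf(s[0]) < 0)
                continue;

            if (s.StartsWith("|Table2", StringComparison.Ordinal))
                adjust = false;

            string[] bits = s.Split('|');
            if (adjust && bits.Length > posNewColumn)
            {
                // Header rows get the column names, every other row gets two empty cells.
                bits[posNewColumn] += s[0] == 'N' ? "|END DATE HOUR|LENGHT" : "||";
                yield return string.Join("|", bits);
            }
            else
            {
                yield return s; // Table2 rows and short rows pass through unchanged
            }
        }
    }
}
```

It could be wired to the files with something like `File.WriteAllLines(output, PipeFileSketch.Transform(File.ReadLines(input)));`, which still streams the input line by line.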

Caius Jard
  • No, it's the opposite... I need to delete or skip the rows of the CSV file that do not start with a `number`, the value `N`, or a pipe `|` –  Aug 25 '20 at 12:07
  • What will you do with a line like `|||||||||||||||||||||||||||||||||||||` ? What useful data does it have? – Caius Jard Aug 25 '20 at 12:10
  • They are deleted when the file is imported into the MySQL database –  Aug 25 '20 at 12:12
  • I made an edit, because I now understand that you want to keep anything starting with a number, a pipe or a letter N, right? – Caius Jard Aug 25 '20 at 12:14
  • Sorry to intrude, but while this handles the check on the lines, you then need to add another loop to insert the text. Even though we are talking about a 45 KB file, this added loop adds a performance penalty, in my opinion. – Steve Aug 25 '20 at 12:51
  • @Steve it was just intended to be an insert into the original code rather than a complete replacement of it, but I've added an edited version to address performance concerns – Caius Jard Aug 25 '20 at 14:33
0

In my opinion you are overcomplicating the problem by trying to update the same line while you loop over the lines collection. A simpler approach (given the small file size) is to use another list that contains only the 'approved' lines.

For example:

int posNewColumn = 4; // index at which the two new columns are inserted
string input = @"C:\Temp\SO\import.csv";
string output = @"C:\Temp\SO\out.csv";

List<string> outputLines = new List<string>();
bool inTable1 = true; // rows from Table2 onwards already have the two columns
foreach (string line in File.ReadLines(input))
{
    var parts = line.Split('|').ToList();
    if (parts.Count > 1)
    {
        if (parts[1] == "Table2")
        {
            // Stop inserting, but keep writing lines so Table2 is not lost
            inTable1 = false;
        }

        // Add here all the conditions that allow a line to be
        // written in the output file.
        // Test line[0] rather than parts[0][0]: parts[0] is empty
        // when the line starts with a pipe.
        char c = line[0];
        if (c == '|' || c == 'N' || char.IsDigit(c))
        {
            if (inTable1 && parts.Count >= posNewColumn)
            {
                parts.Insert(posNewColumn, parts[0] == "N" ? "LENGHT" : string.Empty);
                parts.Insert(posNewColumn, parts[0] == "N" ? "END DATE HOUR" : string.Empty);
            }
            outputLines.Add(string.Join("|", parts));
        }
    }
}
File.WriteAllLines(output, outputLines);

This solution also includes the part where you add the new text to the lines approved for output. While LINQ handles the inclusion check in a single line, you then need another loop (in addition to the implicit one LINQ requires) to insert the text.
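The length check is what prevents the original exception. A line such as "values of names " contains no pipe, so `Split('|')` returns a single element and reading index 1 throws. A minimal sketch of that guard on its own, where `GuardSketch` and `IsTableTitle` are hypothetical names used only for illustration:

```csharp
using System;

public static class GuardSketch
{
    // Hypothetical helper: true only when the line has a second field equal to tableName.
    public static bool IsTableTitle(string line, string tableName)
    {
        string[] parts = line.Split('|');

        // Check the length before indexing: a line without any pipe
        // yields exactly one element, so parts[1] would be out of range.
        return parts.Length > 1 && parts[1] == tableName;
    }
}
```

Because `&&` short-circuits, `parts[1]` is never evaluated for a one-element result, so the call simply returns false for such lines.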

Steve