3

I got a little Problem. I have a .csv with "NaN" values and doubles (0.6034 for example) and I am trying to read just the doubles of the CSV into an array[y][x].

Currently, i read the whole .csv, but I can not manage to remove all "NaN" values afterward. (It should parse through the CSV and just add the Numbers to an array[y][x] and leave all "NaN" out)

My current Code:

 var rows = File.ReadAllLines(filepath).Select(l => l.Split(';').ToArray()).ToArray(); //reads WHOLE .CSV to array[][]


        int max_Rows = 0, j, rank;
        int max_Col = 0;
        foreach (Array anArray in rows)
        {
            rank = anArray.Rank;
            if (rank > 1)
            {
                 // show the lengths of each dimension
                for (j = 0; j < rank; j++)
                {

                }
            }
            else
            {

            }
            // show the total length of the entire array or all dimensions

            max_Col = anArray.Length; //displays columns
            max_Rows++;  //displays rows
        }

I tried the search but couldn't really find anything that helped me. I know this is probably really easy but I am new to C#.

The .CSV and the desired outcome:

NaN;NaN;NaN;NaN
NaN;1;5;NaN
NaN;2;6;NaN
NaN;3;7;NaN
NaN;4;8;NaN
NaN;NaN;NaN;NaN

This is a sample .csv i have. I should have been more clear, sorry! There is a NaN in every line. and i want it to display like this:

1;5
2;6
3;7
4;8

This is just a sample of the .csv the real csv has arround 60.000 Values... I need to get the input with [y][x] for example [0][0] should display "1" and [2][1] should displays "7" and so on.

Thanks again for all your help!

christian890
  • 147
  • 1
  • 12
  • BTW, I would _never ever_ dare to make my own CSV parser. Way to many edge cases ([e.g.](http://stackoverflow.com/a/6516246/107625)), IMO. – Uwe Keim Apr 25 '17 at 06:42
  • @Uwe Keim: I agree with you, that in *general case* CSV can well appear complex enough; however, when CSV source is *known* (say, my own routine which exports data from RDMBS for future analysis) or in case of single-use software (all I want is to remove incomplete rows and perform linear regression on the CSV provided) I'll be more indulgent. – Dmitry Bychenko Apr 25 '17 at 06:52

2 Answers2

6

You could do a filter of your delimited values in the array.

I've modified your code a bit.

 File.ReadAllLines(filepath).Select(l => l.Split(';').ToArray().Where(y => y != "NaN").ToArray()).ToArray();
scartag
  • 17,548
  • 3
  • 48
  • 52
  • This looks good but i got some weird "error" when i use this and try to check the results by showing the 1. Value of the array -> MessageBox.Show(Convert.ToString(rows[0][0])) it says "false" and for MessageBox.Show(Convert.ToString(rows[10][10])) "true" instead of the numbers, what am i doing wrong here? Your solution should work i think its really simple =) Thanks – christian890 Apr 25 '17 at 06:36
  • You don't want `ReadAllLines` (premature materialization), `ReadLines` will be a better choice – Dmitry Bychenko Apr 25 '17 at 06:39
  • 1
    @christian890 i've modified the code a bit. there was an error cause i wasn't testing with any data and used `select` instead of a `where` – scartag Apr 25 '17 at 06:42
  • `.ToArray()` in the `Split(';').ToArray()` is *redundant* since `Split(';')`returns an *array* – Dmitry Bychenko Apr 25 '17 at 06:43
  • Thanks for all the answers. But i can not get it work just yet. I use the Code with "where(y =>..." and then try to check the outputs with "MessageBox.Show(Convert.ToString(rows[0][0])); // displays nothing (empty) MessageBox.Show(Convert.ToString(rows[0][1])); // ERROR out of Array MessageBox.Show(Convert.ToString(rows[1][0]));" This is weird, without the "where." there were no dimensions error – christian890 Apr 25 '17 at 06:50
  • @christian890 might help to share a few rows of your csv – scartag Apr 25 '17 at 06:51
  • NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1 1 1 1 NaN NaN NaN NaN NaN NaN 1 1 1 1 NaN NaN NaN NaN NaN NaN 1 1 1 1 NaN NaN NaN NaN NaN NaN 1 1 1 1 NaN NaN NaN NaN NaN NaN 1 1 1 0.9782 NaN NaN NaN NaN NaN NaN 0.6821 0.6976 0.6862 0.7017 NaN NaN NaN NaN NaN NaN 0.9407 0.951 0.9588 0.9758 NaN NaN NaN NaN NaN NaN 0.7132 0.7216 0.7364 0.7289 NaN NaN NaN NaN NaN NaN 0.6254 0.6216 0.629 0.6493 NaN NaN NaN NaN NaN NaN 0.6427 0.6349 0.6451 0.6877 NaN NaN NaN NaN NaN NaN 0.5033 0.4994 0.4934 0.5079 – christian890 Apr 25 '17 at 06:52
  • oh sorry, thats because i opened it with excel.. aN;NaN;NaN;NaN;NaN;NaN;NaN;NaN;NaN;NaN;NaN;NaN;NaN;NaN;NaN;NN;NaN;NaN;NaN;NaN;NaN;NaN;NaN;NaN;NaN;NaN;NaN;NaN;NaN;NaN;NaN;NaN;NaN;NaN;NaN;NaN;NaN;NaN;NaN;NaN; NaN;NaN;NaN;NaN;NaN;NaN;1;1;1;1;1;1;1;1;1;1;0.8627;0.7224;0.6194;0.5598;0.5146;0.4292;0.3843;0.3778;0.3532;0.2883;0.335;0.4728;0.6538;0.8563;0.9614;0.9056;0.827;0.7641;0.6197;0.4731;0.3603;0.279;0.2894;0.2924;0.3055;0.2956;0.3114;0.3061;0.2828;0.2689;0.2367;0.1968;0.2111;0.3458;0.3421;0.3291;0.3136;0.2969;0.2312;0.2121;0.1908;0.2031;0.1994;0.2152;0.2185;NaN;NaN;NaN – christian890 Apr 25 '17 at 06:55
  • 1
    @christian890 i tried your values and it worked for me... although i got a lot of special characters within the values .. maybe cause i copied it from the page or something. – scartag Apr 25 '17 at 06:59
  • Thas weird... I get: "An unhandled exception of type 'System.IndexOutOfRangeException' occurred in WindowsFormsApplication1.exe" for [0][1] or [1][0] and [0][0] displays nothing =/ This code or? " var rows = File.ReadAllLines(filepath).Select(l => l.Split(';').ToArray().Where(f => f != "NaN").ToArray()).ToArray();" – christian890 Apr 25 '17 at 07:05
  • 1
    @christian890 i'm wondering if you could create a new csv by hand just to test the code and then check that the actual csv has the right newline characters required .. i'm sure there is something wrong with the file and its content. – scartag Apr 25 '17 at 07:07
  • Thank you for your time i really appreciate it. I tried that just now and the error gets even weirder. I get the "index is out of array" already when i want to display [0][0]... The .Csv i used this time: (hand made) NaN;NaN;NaN;NaN NaN;1;5;NaN NaN;2;6;NaN NaN;3;7;NaN NaN;4;8;NaN NaN;NaN;NaN;NaN – christian890 Apr 25 '17 at 07:14
  • I just checked it with "var rows = File.ReadAllLines(filepath).Select(l => l.Split(';').ToArray()).ToArray();" it works but of course all "NaN" are still there but with this i have no problems with the csv and with "var rows = File.ReadAllLines(filepath).Select(l => l.Split(';').ToArray().Where(f => f != "NaN").ToArray()).ToArray();" it says index out of array.... weird – christian890 Apr 25 '17 at 07:19
1

If you want to remove all the lines that contain NAN (typical task for CSV - clearing up all incomplete lines), e.g.

  123.0; 456; 789
    2.1; NAN;  35     <- this line should be removed (has NaN value)
     -5;   3;  18

You can implement it like this

  double[][] data = File
    .ReadLines(filepath)
    .Select(line => line.Split(new char[] {';', '\t'},
                               StringSplitOptions.RemoveEmptyEntries))
    .Where(items => items  // Filter first...
       .All(item => !string.Equals("NAN", item, StringComparison.OrdinalIgnoreCase)))
    .Select(items => items
       .Select(item => double.Parse(item, CultureInfo.InvariantCulture))
       .ToArray()) // ... materialize at the very end
    .ToArray();

Use string.Join to display rows:

 string report = string.Join(Environment.NewLine, data
   .Select(line => string.Join(";", line)));

 Console.Write(report);

Edit: The actual problem is to take 2nd and 3rd complete columns only from the CSV:

NaN;NaN;NaN;NaN
NaN;1;5;NaN
NaN;2;6;NaN
NaN;3;7;NaN
NaN;4;8;NaN
NaN;NaN;NaN;NaN

desired outcome is

[[1, 5], [2, 6], [3, 7], [4, 8]]

implmentation:

double[][] data = File
  .ReadLines(filepath)
  .Select(line => line
     .Split(new char[] {';'},
            StringSplitOptions.RemoveEmptyEntries)
     .Skip(1) 
     .Take(2)
     .Where(item => !string.Equals("NAN", item, StringComparison.OrdinalIgnoreCase))
     .ToArray())
  .Where(items => items.Length == 2)
  .Select(items => items
    .Select(item => double.Parse(item, CultureInfo.InvariantCulture))
    .ToArray())
  .ToArray();

Tests

// 1
Console.Write(data[0][0]);
// 5
Console.Write(data[0][1]);
// 2
Console.Write(data[1][0]);

All values in one go:

string report = string.Join(Environment.NewLine, data
   .Select(line => string.Join(";", line)));

Console.Write(report);

Outcome:

1;5
2;6
3;7
4;8 

Edit 2: if you want to extract non NaN values only (please, notice that the initial CSV structure will be ruined):

1;2;3              1;2;3
NAN;4;5            4;5   <- please, notice that the structure is lost
6;NAN;7        ->  6;7
8;9;NAN;           8;9
NAN;10;NAN         10
NAN;NAN;11         11 

then

double[][] data = File
  .ReadLines(filepath)
  .Select(line => line
     .Split(new char[] {';'},
            StringSplitOptions.RemoveEmptyEntries)
     .Where(item => !string.Equals("NAN", item, StringComparison.OrdinalIgnoreCase)))
  .Where(items => items.Any()) 
  .Select(items => items
    .Select(item => double.Parse(item, CultureInfo.InvariantCulture))
    .ToArray())
  .ToArray();
Dmitry Bychenko
  • 180,369
  • 20
  • 160
  • 215
  • Thanks for your time, but i am getting an Error on this one i can not solve just yet:" Error 1 An object reference is required for the non-static field, method, or property 'string.Equals(string, System.StringComparison)' C:\Users\Documents\Visual Studio 2013\Projects\WindowsFormsApplication1\WindowsFormsApplication1\Test.cs 734 22 WindowsFormsApplication1 Btw. i do not want to remove the whole line, just the NaN itself – christian890 Apr 25 '17 at 07:00
  • 1
    @christian890: I'm sorry for typo: `string.Equals("NAN", item...` we should compare `NAN` and `item` – Dmitry Bychenko Apr 25 '17 at 07:01
  • Ahh thanks, it works now without mistakes but i have the same Problem as above. when i use " MessageBox.Show(Convert.ToString(rows[0][0]));" it displays: An unhandled exception of type 'System.IndexOutOfRangeException' occurred in WindowsFormsApplication1.exe it looks like something is wrong with my .csv but i cant figure out what it is – christian890 Apr 25 '17 at 07:17
  • @christian890: could you, please, provide the CSV? – Dmitry Bychenko Apr 25 '17 at 07:18
  • @christian890: as far as I can see from the CSV you've provided to **scartag**: `NaN;NaN;NaN;NaN NaN;1;5;NaN NaN;2;6;NaN NaN;3;7;NaN NaN;4;8;NaN NaN;NaN;NaN;NaN` *all* the lines have at least one `NAN` that's why the result array `data` is *empty* – Dmitry Bychenko Apr 25 '17 at 07:21
  • @christian890: could you, please, *edit* your question: provide the test CSV and the desired outcome. – Dmitry Bychenko Apr 25 '17 at 07:22
  • Done. I think as Dmitry pointed out could be the mistake, I am sorry to be unspecific, thanks for your help! – christian890 Apr 25 '17 at 07:29
  • This is just a sample of the .csv the real csv has arround 60.000 Values... I need to get the input with [y][x] for example [0][0] should display "1" and [2][1] should displays "7" – christian890 Apr 25 '17 at 07:40
  • 1
    @christian890: I see; if you want to extract 2nd and 3d complete lines then see my edit; since `data` is a jagged array you cane obtain the items required as you want: `data[x][y]`, e.g. `data[0][0]` should return `1` – Dmitry Bychenko Apr 25 '17 at 07:44
  • Thanks! It works perfect for the Dummy. But if i use the real .csv with more "NaN" lines and columns before any values i have the "Index out of array" error again =/ Problem is i cant say in which line i have "NaN" values, i need to read the actual Value of the csv and skip it if its "NaN" if its not, write it to array, and this for every value in the .csv – christian890 Apr 25 '17 at 07:54
  • This should work perfect, I dont mind loosing the structure (if there is 1 Nan value the whole line OR column is NaN -> all in all the structure will be the same) just 1 problem with the code: " An unhandled exception of type 'System.FormatException' occurred in mscorlib.dll Additional information: The input character string has wrong format" i get on ".Select(item => double.Parse(item, CultureInfo.InvariantCulture))" this line, you may know why? Thank you so much!!! i think you saved my day ! =) – christian890 Apr 25 '17 at 08:13
  • 1
    @christian890: The cause of the error is that a value within CSV is neither `NaN` not valid `double` value, e.g. `"bla-bla-bla"`. Could you, please, provide the actual CSV file? – Dmitry Bychenko Apr 25 '17 at 08:16
  • this is the whole csv: http://www.sharecsv.com/s/8b4139e47847443ef153ef6583056771/measurements.csv or http://textuploader.com/drhq8 – christian890 Apr 25 '17 at 08:21
  • I managed to delimit the error, it appears that if i delete the last column of the csv it works... so i guess it is something about the last character? Edit: the Problem is an empty field! if i have NaN;;NaN i get this error asap! and in the .CSV i get from the system the last field is empty. How can i fix this? THANKS!!!! – christian890 Apr 25 '17 at 08:49
  • 1
    @christian890: you have to remove not only `NaN`, but empty values as well - `.Split(new char[] {';'}, StringSplitOptions.RemoveEmptyEntries)` – Dmitry Bychenko Apr 25 '17 at 08:55
  • 1
    @christian890: You are welcome! Next time, please, start with actual data :) – Dmitry Bychenko Apr 25 '17 at 09:03