
I have a file with products and prices, and I need to write them to a database. However, when I try to read the file it reads only 1000 rows and then reports end of file, even though the file has over 120,000 rows. It looks like the reader reads some rows from the start of the document, then some random ones from the middle, and then some from the end of the file. Even if I skip writing them to the database and only write them to the console, I get the same result. Here is my code:

public async Task LoadProductsFromExcel()
{
    var file = @"F:\Links\productsToImport.csv";
    var fileStream = new FileStream(file, FileMode.Open, FileAccess.Read);
    using (var streamReader = new StreamReader(fileStream))
    {
        while (!streamReader.EndOfStream)
        {
            var line = streamReader.ReadLine();
            var data = line.Split(new[] { ';' });
            var product = new Product() { Name = data[1], Code = data[0] };
            context.Products.Add(product);
            Console.WriteLine(data[0] + "   " + data[1]);
        }
    }
    await unitOfWork.CompleteAsync();
}

Is there a problem with the stream's buffer, or is it something else? Or maybe I am reading the file wrong.

Dov95
    Just use `File.ReadLines()`. – itsme86 Apr 16 '18 at 14:43
  • 2
    @itsme86 Which will load the entire thing into memory, not necessarily a good idea with a large file. – DavidG Apr 16 '18 at 14:45
  • https://stackoverflow.com/questions/29412757/what-is-the-default-buffer-size-for-streamwriter – mjw Apr 16 '18 at 14:46
  • @DavidG 120000 rows of text is large? This isn't 1987... – itsme86 Apr 16 '18 at 14:48
  • 4
    @DavidG `ReadLines` does not read the entire file into memory. – Magnus Apr 16 '18 at 14:49
  • Sanity check: where did you get the row count of 120,000 rows from? Is the code reading the file before another application has completed writing to it? Have you tried adding a counter in the method and writing the count to the console (or breaking in the debugger) to see exactly how many rows were loaded? – Paul Williams Apr 16 '18 at 14:54
  • 3
    Can you read the file into notepad++ or similar editor? I'm suspecting there are some unreadable characters. Just a guess. – JazzmanJim Apr 16 '18 at 14:55
  • @PaulWilliams I mean the file contains more than 120,000 rows and only 1000 were added to the database. And yes, I tried debugging and saw that the reader skips lines – Dov95 Apr 16 '18 at 15:09
  • @bednarjm I am trying that now and I see nothing suspicious – Dov95 Apr 16 '18 at 15:10
  • I agree with @bednarjm - there's nothing wrong with the code you've posted so the problem is very likely elsewhere. Are there other processes accessing this file? Have you tried the `FileStream` constructor overload passing `FileShare.None`? – Tom Troughton Apr 16 '18 at 15:12
  • @getsetcode No one else is using that file apart from this method. I have now noticed that in the middle of reading, the Windows alert sound plays; after a few seconds it stops reading and end of file is reached, because the last rows are displayed but the middle ones are gone – Dov95 Apr 16 '18 at 15:19
  • Why are you parsing CSV files yourself? Why aren't you using a dedicated library that is specifically designed to handle these files? – mason Apr 16 '18 at 15:43
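
For reference, here is a minimal sketch of the dedicated-library approach mason suggests, using the CsvHelper NuGet package. Treat it as a sketch: the exact configuration API varies between CsvHelper versions, and `Product`, `context`, and `unitOfWork` are taken from the question's code.

using System.Globalization;
using System.IO;
using CsvHelper;
using CsvHelper.Configuration;

public async Task LoadProductsWithCsvHelper()
{
    // Semicolon-delimited file with no header row, as in the question.
    var config = new CsvConfiguration(CultureInfo.InvariantCulture)
    {
        Delimiter = ";",
        HasHeaderRecord = false
    };

    using (var reader = new StreamReader(@"F:\Links\productsToImport.csv"))
    using (var csv = new CsvReader(reader, config))
    {
        while (csv.Read())
        {
            // GetField(int) reads a column by index; the library handles
            // quoting and embedded delimiters that a plain Split(';') would not.
            var product = new Product
            {
                Code = csv.GetField(0),
                Name = csv.GetField(1)
            };
            context.Products.Add(product);
        }
    }
    await unitOfWork.CompleteAsync();
}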

1 Answer


I can see a few possibilities.

First, the .csv file might not have newline characters separating all the records, or it might contain strange characters that break the ReadLine() loop. Open the file with Notepad++ (and make sure to show all symbols) or throw it into a hex editor.
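
If you'd rather check programmatically than eyeball 120,000 rows, a quick pass over the raw bytes will tell you whether control characters are hiding in the file. A rough sketch (the path is the one from your question):

using System;
using System.IO;

var path = @"F:\Links\productsToImport.csv";
long lf = 0, cr = 0, otherControl = 0;

// Read raw bytes so nothing gets interpreted as a line break or end-of-file.
foreach (var b in File.ReadAllBytes(path))
{
    if (b == (byte)'\n') lf++;
    else if (b == (byte)'\r') cr++;
    else if (b < 0x20 && b != (byte)'\t') otherControl++; // control chars besides tab/CR/LF
}

Console.WriteLine($"LF: {lf}, CR: {cr}, other control bytes: {otherControl}");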

Second, you've got a bug in your code that could generate an exception. It may be that this code actually gets 1000 lines in and then exits because of an exception that's silently caught in a try/catch block higher up:

            var line = streamReader.ReadLine();
            var data = line.Split(new[] { ';' });
            var product = new Product() { Name = data[1], Code = data[0]};       
            context.Products.Add(product);
            Console.WriteLine(data[0]+"   "+ data[1]);

... if ReadLine() returns a line that doesn't contain a semicolon, Split is going to produce a 1-element array. Then the next line, with Name = data[1], is going to throw an IndexOutOfRangeException. So another possibility is that your source file has a line without a semicolon and you've got a try/catch block that's swallowing the exception.
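
Even if that isn't what's happening here, it's worth adding a guard so a malformed line fails with a message that explains itself. A sketch of a defensive loop body (same names as your code):

var line = streamReader.ReadLine();
var data = line.Split(';');

// Fail loudly with context instead of an opaque IndexOutOfRangeException.
if (data.Length < 2)
    throw new FormatException($"Expected 'code;name' but line was: \"{line}\"");

var product = new Product { Name = data[1], Code = data[0] };
context.Products.Add(product);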

Kevin
  • It turns out it was a problem with the file; I found some symbols that could break the loop. Now I am getting errors while writing to the database: Exception thrown: 'System.ObjectDisposedException' in Microsoft.EntityFrameworkCore.dll / Exception thrown: 'System.ObjectDisposedException' in System.Private.CoreLib.dll – Dov95 Apr 16 '18 at 15:56
  • You'll need to google/search to see if anyone's had a similar problem - and assuming they haven't, you'll need to post a new question for the new problem. Glad you were able to get the first problem resolved, though :-) – Kevin Apr 16 '18 at 16:28
  • Oh, and even if it's not what caused this particular problem, I'd highly recommend you fix the bug. Even if you *want* the code to throw an exception if there are any badly-formatted lines in the CSV, you should at least throw an exception that explains the problem, not a bare IndexOutOfRangeException that doesn't explain what's going on. – Kevin Apr 16 '18 at 16:30
  • Yeah, I have fixed it, and fixed the other problem too. Now everything is working OK and I managed to write the data to the database :) – Dov95 Apr 16 '18 at 16:35