
I'm trying to get data from a CSV file from a web service. If I paste the URL into my browser, the CSV is downloaded and looks like the following example:

    "ID","ProductName","Company"
    "1","Apples","Alfreds futterkiste"
    "2","Oranges","Alfreds futterkiste"
    "3","Bananas","Alfreds futterkiste"
    "4","Salad","Alfreds futterkiste"
     ...next 96 rows

However, I don't want to download the CSV file first and then extract data from it afterwards. The web service uses pagination and returns 100 rows per request (determined by the &number parameter, which has a max of 100). After the first request I can use the &next parameter to fetch the next 100 rows based on ID. For instance, the URL

http://testWebservice123.com/Example.csv?auth=abc&number=100&next=100

will get me rows with IDs 101 to 200. So if there are a lot of rows I would end up downloading a lot of CSV files and saving them to the hard drive. Instead of downloading the CSV files and saving them to the HDD first, I want to read the data directly from the web service and write it straight to a database, without saving the CSV files.
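So the sequence of requests would look roughly like this, where each &next value is the last ID already received:

    http://testWebservice123.com/Example.csv?auth=abc&number=100             -> IDs 1-100
    http://testWebservice123.com/Example.csv?auth=abc&number=100&next=100    -> IDs 101-200
    http://testWebservice123.com/Example.csv?auth=abc&number=100&next=200    -> IDs 201-300, and so on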

After a bit of searching I came up with the following solution:

    static void Main(string[] args)
    {
        string startUrl = "http://testWebservice123.com/Example.csv?auth=abc&number=100";
        string url = "";
        string deltaRequestParameter = "";
        string lastLine;
        int numberOfLines = 0;

        do
        {
            url = startUrl + deltaRequestParameter;
            WebClient myWebClient = new WebClient();

            using (Stream myStream = myWebClient.OpenRead(url))
            {
                using (StreamReader sr = new StreamReader(myStream))
                {
                    numberOfLines = 0;
                    while (!sr.EndOfStream)
                    {
                        var row = sr.ReadLine();
                        var values = row.Split(',');

                        // do whatever with the rows for now - i.e. write to console
                        Console.WriteLine(values[0] + " " + values[1]);

                        lastLine = values[0].Replace("\"", ""); // last line in the loop - get the last ID
                        numberOfLines++;
                        deltaRequestParameter = "&next=" + lastLine;
                    }
                }
            }
        } while (numberOfLines == 101); // since the header is returned each time, the number of rows will be 101 until we get to the last request
    }

but I'm not sure if this is an up-to-date way of doing this, or whether there is a better (easier/simpler) way. In other words, I'm unsure whether using WebClient and StreamReader is the right way to go.

In this thread: how to read a csv file from a url?

WebClient.DownloadString is mentioned, as well as WebRequest. But if I want to write to a database without saving the CSV to the HDD, which is the best option?
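To sketch what I have in mind, writing each row straight to a database while it is being read might look something like the following; the connection string, the Products table and its columns are just placeholders, and the naive comma split assumes the quoted values contain no commas:

    using System;
    using System.Data.SqlClient;
    using System.IO;
    using System.Net;

    class CsvToDatabase
    {
        static void Main(string[] args)
        {
            // placeholder connection string
            string connectionString = "Server=.;Database=MyDb;Integrated Security=true";
            string url = "http://testWebservice123.com/Example.csv?auth=abc&number=100";

            using (var connection = new SqlConnection(connectionString))
            using (var myWebClient = new WebClient())
            using (Stream myStream = myWebClient.OpenRead(url))
            using (var sr = new StreamReader(myStream))
            {
                connection.Open();
                sr.ReadLine(); // skip the header row

                while (!sr.EndOfStream)
                {
                    // naive split - assumes no commas inside the quoted values
                    var values = sr.ReadLine().Replace("\"", "").Split(',');

                    // placeholder table and columns
                    using (var command = new SqlCommand(
                        "INSERT INTO Products (Id, ProductName, Company) VALUES (@id, @name, @company)",
                        connection))
                    {
                        command.Parameters.AddWithValue("@id", int.Parse(values[0]));
                        command.Parameters.AddWithValue("@name", values[1]);
                        command.Parameters.AddWithValue("@company", values[2]);
                        command.ExecuteNonQuery();
                    }
                }
            }
        }
    }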

Furthermore, will the approach I have taken save data to temporary disk storage behind the scenes, or will all data be read into memory and then disposed when the loop completes? I have read the following documentation but can't seem to find out what it does behind the scenes: StreamReader: https://learn.microsoft.com/en-us/dotnet/api/system.io.streamreader?view=netframework-4.7.2

Stream: https://learn.microsoft.com/en-us/dotnet/api/system.io.stream?view=netframework-4.7.2

Edit: I guess I could also use the following TextFieldParser...but my question is really still the same:

(using the Microsoft.VisualBasic assembly)

    using (Stream myStream = myWebClient.OpenRead(url))
    {
        using (TextFieldParser parser = new TextFieldParser(myStream))
        {
            numberOfLines = 0;

            parser.TrimWhiteSpace = true; // if you want
            parser.Delimiters = new[] { "," };
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                string[] line = parser.ReadFields();
                Console.WriteLine(line[0] + " " + line[1]);

                numberOfLines++;
                deltaRequestParameter = "&next=" + line[0];
            }
        }
    }

1 Answer


The HttpClient class in the System.Net.Http namespace is available as of .NET 4.5. You have to work with async code, but it's not a bad idea to get into it if you're dealing with the web.

As sample data, I'll use jsonplaceholder's "todo" list. It provides JSON data, not CSV data, but it has a simple enough structure to serve our purpose in the example below.
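For reference, each todo comes back as a small JSON object along these lines (the title is shortened here):

    {
        "userId": 1,
        "id": 1,
        "title": "...",
        "completed": false
    }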

This is the core function, which fetches from jsonplaceholder in a similar way to your "testWebservice123" site, although I'm just getting the first 3 todos, as opposed to testing for when I've hit the last page (you would probably keep your do-while logic for that).

    async void DownloadPagesAsync() {

        // fetch the first 3 todos (for your service you would keep the do-while pagination instead)
        for (var i = 1; i <= 3; i++) {

            var pageToGet = $"https://jsonplaceholder.typicode.com/todos/{i}";

            using (var client = new HttpClient())
            using (HttpResponseMessage response = await client.GetAsync(pageToGet))
            using (HttpContent content = response.Content)
            using (var stream = (MemoryStream) await content.ReadAsStreamAsync())
            using (var sr = new StreamReader(stream))
            while (!sr.EndOfStream) {

                // strip the quotes and commas, leaving lines like: userId: 1
                var row =
                    sr.ReadLine()
                    .Replace(@"""", "")
                    .Replace(",", "");

                // skip lines that aren't "name: value" pairs (e.g. the braces)
                if (row.IndexOf(":") == -1)
                    continue;

                var values = row.Split(':');
                Console.WriteLine($"{values[0]}, {values[1]}");

            }

        }

    }

This is how you would call the function, for example from a Main() method:

    Task t = new Task(DownloadPagesAsync);
    t.Start();

The new task here takes an "action", or in other words a function that returns void, as a parameter. Then you start the task. Be careful: it is asynchronous, so any code you have after t.Start() may very well run before your task completes.
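As a side note, if you need code in Main() to run only after the download has genuinely finished, one minimal sketch is to have the method return Task instead of void and block on it:

    // sketch: declare the method as "async Task" (and static, if it is called from a static Main)
    static async Task DownloadPagesAsync()
    {
        // ...same body as above...
    }

    static void Main(string[] args)
    {
        DownloadPagesAsync().GetAwaiter().GetResult();
        // anything here now runs only after all pages have been processed
    }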

As to your question about whether the stream is read "in memory" or not: running GetType() on "stream" in the code resulted in a "MemoryStream" type, though it is only recognized as a "Stream" object at compile time. A MemoryStream is definitely in-memory. I'm not really sure whether any of the other kinds of stream objects save temporary files behind the scenes, but I'm leaning towards not.

But looking into the inner workings of a class, though commendable, is not usually necessary to settle your worries about disposing. For any class, just see if it implements IDisposable. If it does, then put it in a "using" statement, as you have done in your code. Whether the program terminates as expected or via an error, the proper disposal happens after control has passed out of the "using" block.
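To make that concrete, a "using" block is roughly shorthand for a try/finally that calls Dispose:

    // these two forms are roughly equivalent
    using (var sr = new StreamReader(stream))
    {
        // ... read from sr ...
    }

    var sr2 = new StreamReader(stream);
    try
    {
        // ... read from sr2 ...
    }
    finally
    {
        sr2.Dispose(); // runs whether the block exits normally or via an exception
    }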

HttpClient is in fact the newer approach. From what I understand, it does not replace all of the functionality of WebClient, but it is stronger in many respects. See this SO post for more details comparing the two classes.

Also, something to know about WebClient is that it can be simple, but limiting. If you run into issues, you will need to look into the HttpWebRequest class, which is a "lower level" class that gives you greater access to the nuts and bolts of things (such as working with cookies).
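For example, a request that needs to carry cookies might look roughly like this with HttpWebRequest (the URL is just your example one):

    using System;
    using System.IO;
    using System.Net;

    // HttpWebRequest exposes lower-level knobs, such as a CookieContainer
    var request = (HttpWebRequest)WebRequest.Create("http://testWebservice123.com/Example.csv?auth=abc&number=100");
    request.CookieContainer = new CookieContainer();
    request.Method = "GET";

    using (var response = (HttpWebResponse)request.GetResponse())
    using (var reader = new StreamReader(response.GetResponseStream()))
    {
        Console.WriteLine(reader.ReadToEnd());
    }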

  • Hi, thank you for your elaborate answer with a lot of good information. What would you think about using TextFieldParser parser = new TextFieldParser(myStream) (I edited the end of my question)? Since it is good at handling CSV data, I thought it might be a good fit. Would you also say that this works in memory? – Tiawy Feb 01 '19 at 12:10
  • I can't answer with absolute confidence, but I can't imagine any decent implementation would require the creation of temporary files under the hood. As to whether it's more performant than just splitting the string, I would say if that's ALL you need to do, maybe not; but testing is the way to answer that. If you need more flexibility, TextFieldParser has more features. Also, look up CsvHelper or similar libraries if this is a common task for you (a rough sketch of that follows below). – pwilcox Feb 01 '19 at 17:20
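For completeness, a minimal CsvHelper sketch of the same idea; the Product class is made up here, and the exact CsvReader constructor differs between library versions:

    using System;
    using System.Globalization;
    using System.IO;
    using System.Net;
    using CsvHelper;

    public class Product
    {
        public int ID { get; set; }
        public string ProductName { get; set; }
        public string Company { get; set; }
    }

    class Example
    {
        static void Main()
        {
            using (var webClient = new WebClient())
            using (var stream = webClient.OpenRead("http://testWebservice123.com/Example.csv?auth=abc&number=100"))
            using (var reader = new StreamReader(stream))
            using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture)) // older versions: new CsvReader(reader)
            {
                foreach (var product in csv.GetRecords<Product>())
                {
                    Console.WriteLine(product.ID + " " + product.ProductName);
                }
            }
        }
    }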