My 1st question, so please be kind... :)

I'm using the C# HttpClient to invoke Jobs API Endpoint.

Here's the endpoint: Jobs API Endpoint (doesn't require key, you can click it)

This gives me JSON like so.

{
  "count": 1117,
  "firstDocument": 1,
  "lastDocument": 50,
  "nextUrl": "\/api\/rest\/jobsearch\/v1\/simple.json?areacode=&country=&state=&skill=ruby&city=&text=&ip=&diceid=&page=2",
  "resultItemList": [
    {
      "detailUrl": "http:\/\/www.dice.com\/job\/result\/90887031\/918715?src=19",
      "jobTitle": "Sr Security Engineer",
      "company": "Accelon Inc",
      "location": "San Francisco, CA",
      "date": "2017-03-30"
    },
    {
      "detailUrl": "http:\/\/www.dice.com\/job\/result\/cybercod\/BB7-13647094?src=19",
      "jobTitle": "Platform Engineer - Ruby on Rails, AWS",
      "company": "CyberCoders",
      "location": "New York, NY",
      "date": "2017-04-16"
    }
 ]
}

I've pasted a complete JSON snippet so you can use it in your answer. The full results are really long for here.

Here are the C# classes.

using Newtonsoft.Json;
using System.Collections.Generic;

namespace MyNameSpace
{
    public class DiceApiJobWrapper
    {
        public int count { get; set; }
        public int firstDocument { get; set; }
        public int lastDocument { get; set; }
        public string nextUrl { get; set; }

        [JsonProperty("resultItemList")]
        public List<DiceApiJob> DiceApiJobs { get; set; }
    }

    public class DiceApiJob
    {
        public string detailUrl { get; set; }
        public string jobTitle { get; set; }
        public string company { get; set; }
        public string location { get; set; }
        public string date { get; set; }
    }
}

When I invoke the URL using HttpClient and deserialize using JSON.NET, I do get the data back properly.

Here's the code I am calling from my Console App's Main method (hence the static list, I think this could be better refactored??)

   private static List<DiceApiJob> GetDiceJobs()
    {
        HttpClient httpClient = new HttpClient();
        var jobs = new List<DiceApiJob>();

        var task = httpClient.GetAsync("http://service.dice.com/api/rest/jobsearch/v1/simple.json?skill=ruby")
          .ContinueWith((taskwithresponse) =>
          {
              var response = taskwithresponse.Result;
              var jsonString = response.Content.ReadAsStringAsync();
              jsonString.Wait();

              var result =  JsonConvert.DeserializeObject<DiceApiJobWrapper>(jsonString.Result);
              if (result != null)
              {
                  if (result.DiceApiJobs.Any())
                      jobs = result.DiceApiJobs.ToList();

                  if (result.nextUrl != null)
                  {
                      //
                      // do this GetDiceJobs again in a loop? How?? Any other efficient elegant way??
                  }
              }
          });
        task.Wait();

        return jobs;
    }

But now, how do I check if there are more jobs using the nextUrl field? I know I can check whether it's null, and if it's not, that means there are more jobs to pull down.

Results from my debugging and stepping through

How do I do this recursively, without hanging, and with some delays so I don't cross the API limits? I think I have to use the TPL (Task Parallel Library) but am quite baffled.

Thank you! ~Sean

SeanPatel
  • Before someone recommends a recursive route, do not do this. If you have enough pages you will get a stack overflow. I would put it in a loop (`while (nextUrl != null) { }`) and then assign nextUrl to the first one just before the while loop, etc. – ProgrammingLlama Apr 17 '17 at 05:27
  • Couldn't you just have a queue list and just loop it and add/pop as you iterate? You'd also need to track the URLs you've looked at to avoid scraping pages already scraped. – Phill Apr 17 '17 at 05:29
  • @john Thanks, I agree it's bad, I changed the title to say `Loop`. @Phill I've heard about QueueList, can you elaborate, or provide a working code in Answer. I'm very new to TPL and async task library etc. – SeanPatel Apr 17 '17 at 05:31
  • @SeanPatel You are using DeserializeObject, which maps the JSON to a C# class. If anything is missing in the JSON, it will break your code. – Anirudha Gupta Apr 17 '17 at 08:14
  • @AnirudhaGupta Oh. What's the recommended way then? Sorry, quite new to all this. Is there a different safer "defensive coding" approach? – SeanPatel Apr 18 '17 at 05:20
  • Instead of deserializing into an object, use dynamic and use null propagation to check that x?.y?.z is not null. There is also a serializer setting to ignore missing properties; you can use either approach. – Anirudha Gupta Apr 18 '17 at 05:24
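
The last comment does not name the setting it has in mind. A minimal sketch of the typed, defensive route, assuming it means Json.NET's `MissingMemberHandling` and `NullValueHandling` options, might look like the following. `JobPage`, `Job`, and `SafeJson.ParsePage` are hypothetical names introduced here for illustration (they mirror the question's classes). With Json.NET's defaults, extra or absent JSON properties do not throw; the more likely failure is a null list reaching a call such as `result.DiceApiJobs.Any()`, so the sketch also normalizes nulls.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;

// Hypothetical types for illustration; they mirror the question's wrapper and job classes.
public class Job
{
    public string jobTitle { get; set; }
    public string company { get; set; }
}

public class JobPage
{
    public int count { get; set; }
    public string nextUrl { get; set; }

    [JsonProperty("resultItemList")]
    public List<Job> Jobs { get; set; }
}

public static class SafeJson
{
    // Assumed settings: ignore JSON members the class lacks and skip explicit nulls.
    private static readonly JsonSerializerSettings Settings = new JsonSerializerSettings
    {
        MissingMemberHandling = MissingMemberHandling.Ignore,
        NullValueHandling = NullValueHandling.Ignore
    };

    // Always returns a usable page with a non-null Jobs list.
    public static JobPage ParsePage(string json)
    {
        if (string.IsNullOrWhiteSpace(json))
            return new JobPage { Jobs = new List<Job>() };

        try
        {
            var page = JsonConvert.DeserializeObject<JobPage>(json, Settings) ?? new JobPage();
            page.Jobs = page.Jobs ?? new List<Job>();
            return page;
        }
        catch (JsonException)
        {
            // Malformed JSON: treat it as an empty page with no next URL.
            return new JobPage { Jobs = new List<Job>() };
        }
    }
}
```

A caller can then test `page.Jobs.Count` and `page.nextUrl` without extra null checks. Whether swallowing a parse error is acceptable depends on the application; logging or rethrowing may be the better choice.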

1 Answer

If you are concerned about response time of your app and would like to return some results before you actually get all pages/data from the API, you could run your process in a loop and also give it a callback method to execute as it gets each page of data from the API.

Here is a sample:

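using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json;

public class Program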
{
    public static void Main(string[] args)
    {
        var jobs = GetDiceJobsAsync(Program.ResultCallBack).Result;
        Console.WriteLine($"\nAll {jobs.Count} jobs displayed");
        Console.ReadLine();
    }

    private static async Task<List<DiceApiJob>> GetDiceJobsAsync(Action<DiceApiJobWrapper> callBack = null)
    {
        var jobs = new List<DiceApiJob>();
        HttpClient httpClient = new HttpClient();
        httpClient.BaseAddress = new Uri("http://service.dice.com");
        var nextUrl = "/api/rest/jobsearch/v1/simple.json?skill=ruby";

        do
        {
            // Await the request directly. Chaining ContinueWith with an async lambda
            // produces a Task<Task>, so the loop could re-check nextUrl before it is updated.
            var response = await httpClient.GetAsync(nextUrl);
            if (response.IsSuccessStatusCode)
            {
                string jsonString = await response.Content.ReadAsStringAsync();
                var result = JsonConvert.DeserializeObject<DiceApiJobWrapper>(jsonString);

                // Build the full list to return later after the loop.
                if (result?.DiceApiJobs != null)
                    jobs.AddRange(result.DiceApiJobs);

                // Run the callback method, passing the current page of data from the API.
                if (result != null)
                    callBack?.Invoke(result);

                // Get the URL for the next page; a null or empty value ends the loop.
                nextUrl = result?.nextUrl;
            }
            else
            {
                // End loop if we get an error response.
                nextUrl = string.Empty;
            }

        } while (!string.IsNullOrEmpty(nextUrl));
        return jobs;
    }


    private static void ResultCallBack(DiceApiJobWrapper jobSearchResult)
    {
        if (jobSearchResult != null && jobSearchResult.count > 0)
        {
            Console.WriteLine($"\nDisplaying jobs {jobSearchResult.firstDocument} to {jobSearchResult.lastDocument}");
            foreach (var job in jobSearchResult.DiceApiJobs)
            {
                Console.WriteLine(job.jobTitle);
                Console.WriteLine(job.company);
            }
        }
    }
}

Note that the above sample allows the callback method to access each page of data as it is received by the GetDiceJobsAsync method. In this case, the console displays each page as it becomes available. If you do not want the callback option, you can simply pass nothing to GetDiceJobsAsync.

But the GetDiceJobsAsync also returns all the jobs when it completes. So you can choose to act on the whole list at the end of GetDiceJobsAsync.

As for reaching API limits, you can insert a small delay within the loop, right before you repeat the loop. But when I tried it, I did not encounter the API limiting my requests so I did not include it in the sample.
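
If you do want the delay, here is a minimal sketch of where it could go, written so it does not depend on the network. `Page<T>`, `Pager.GetAllAsync`, and the `fetchPage` delegate are hypothetical names introduced for illustration; `fetchPage` stands in for the `HttpClient` call plus deserialization, and the delay length is an assumption you would tune to the API's actual limits. The sketch also remembers the URLs it has requested, as suggested in the comments on the question, so a repeated `nextUrl` cannot cause an endless loop.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical page shape: the items on this page plus the link to the next one.
public class Page<T>
{
    public List<T> Items { get; set; } = new List<T>();
    public string NextUrl { get; set; }
}

public static class Pager
{
    // fetchPage is a stand-in for "GET the URL and deserialize the response".
    public static async Task<List<T>> GetAllAsync<T>(
        string firstUrl,
        Func<string, Task<Page<T>>> fetchPage,
        TimeSpan delayBetweenPages)
    {
        var all = new List<T>();
        var seen = new HashSet<string>();
        var nextUrl = firstUrl;

        // HashSet.Add returns false for a URL already requested, which breaks a cycle.
        while (!string.IsNullOrEmpty(nextUrl) && seen.Add(nextUrl))
        {
            var page = await fetchPage(nextUrl);
            if (page == null)
                break;

            if (page.Items != null)
                all.AddRange(page.Items);

            nextUrl = page.NextUrl;

            // Pause only when another page is expected to follow.
            if (!string.IsNullOrEmpty(nextUrl))
                await Task.Delay(delayBetweenPages);
        }

        return all;
    }
}
```

In the sample above, the equivalent change would be a single `await Task.Delay(...)` just before the closing `while` of the `do` loop. Because `Task.Delay` is awaited rather than blocking with `Thread.Sleep`, the calling thread stays free during the pause.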

Frank Fajardo
  • This is beautiful. Works and was really fast! Thanks Frank! I accepted the answer, and upvoted you, but since I am so new, it says that my vote counts but does not appear or something. Hope you got the vote. Thanks so much!!! – SeanPatel Apr 18 '17 at 04:24