2

I'm trying to get data from a REST API with Power Query. The API returns results by pages of 200 records. Each page will contain a JSON structure such as this:

{
  "start": 0,
  "limit": 200,
  "records": [
    {
      "record": {
        "id": 1,
        "field1": "some text",
        "field2": false
      }
    },
    {
      "record": {
        "id": 2,
        "field1": "some text",
        "field2": true
      }
    },
    {
      "record": {
        "id": 3,
        "field1": "some text",
        "field2": false
      }
    }
  ]
}

Then for each next page, I supply a value for a start parameters which increments by steps of 200, and I just need to extend the records array with the content of the new page's array. So, in pseudo-code, it looks like this:

content ← True
i ← 1
While content
    response ← get(url, http_parameters=['start' = i])
    results ← results + response['records']
    i ← i + response['size']
    content ← size(response['records']) > 0
Endwhile

I do this trivially in Python, but I can't see how it's done in the M language, as it doesn't have for or while loops. I've seen numerous questions being asked about about this topic, but every time it assumes that either the first response will return the total number of records, or that it contains the URL to the next page.

The learning curve is quite steep with this language, so I'd be grateful for any hint to put me in the right direction.

mrgou
  • 1,576
  • 2
  • 21
  • 45

3 Answers3

3

The standard starting point is to tweak the Microsoft helper function to your needs.

There are also some good blogs on the topic. The recursion / iteration did my head in a bit until I figured it our and then it ran sweet.

// The getNextPage function takes a single argument and is expected to return a nullable table
Table.GenerateByPage = (getNextPage as function) as table =>
    let        
        listOfPages = List.Generate(
            () => getNextPage(null),            // get the first page of data
            (lastPage) => lastPage <> null,     // stop when the function returns null
            (lastPage) => getNextPage(lastPage) // pass the previous page to the next function call
        ),
        // concatenate the pages together
        tableOfPages = Table.FromList(listOfPages, Splitter.SplitByNothing(), {"Column1"}),
        firstRow = tableOfPages{0}?
    in
        // if we didn't get back any pages of data, return an empty table
        // otherwise set the table type based on the columns of the first page
        if (firstRow = null) then
            Table.FromRows({})
        else        
            Value.ReplaceType(
                Table.ExpandTableColumn(tableOfPages, "Column1", Table.ColumnNames(firstRow[Column1])),
                Value.Type(firstRow[Column1])
            );
Murray Foxcroft
  • 12,785
  • 7
  • 58
  • 86
  • Awesome, thanks! I'll need to read through the PowerBI connector documentation to learn how to use this function, so it might take me a couple of days before I can report on the outcome. – mrgou Apr 01 '21 at 08:20
  • Good luck - hope you get it sorted out soon. – Murray Foxcroft Apr 01 '21 at 09:04
2

I got this solution proposed by one of the community response and I tried the code to check the function response same as the JIRA API results in the total count.

Two issues I faced:

  1. As the number of response tickets is higher it takes a long time for the query to return, and not work asynchronously.
  2. The second issue is something if I can seek advice from experts here in Power Query. I cant refresh the paginated data outside the PowerBI desktop app, after I make it published and use the Dataset refresh and see the refresh history from the schedule screen. I am getting the following error.

This dataset includes a dynamic data source. Since dynamic data sources aren't refreshed in the Power BI service, this dataset won't be refreshed. Learn more: https://aka.ms/dynamic-data-sources.

In some research I found this about a solution mentioning for pagination but still, the problem with huge data remains that refresh gives the above error.

I read about some RelativeDocument and M functions but am not savvy for that understanding.

Here is my Power Query snippet which I request can be suggested to be modified with anything I am missing.

   let 
    BaseUrl = "https://<base URL>/rest/api/2/search?jql=filter=22344",

    JiraIDPerPage = 400,
 
    GetJson = (Url) =>
        let 
            RawData = Web.Contents(Url,([Headers=[Authorization="Bearer TOKEN"]])),
            Json    = Json.Document(RawData)
        in  Json,
 
    GetJiraIDCount = () =>
        let Url   = BaseUrl & "&maxResults=0",
            Json  = GetJson(Url),
            Count = Json[#"total"]
        in  Count,
 
    GetPage = (Index) =>
        let Skip  = "&startAt=" & Text.From(Index * JiraIDPerPage),
            Top   = "&maxResults=" & Text.From(JiraIDPerPage),
            Url   = BaseUrl & Skip & Top,
            Json  = GetJson(Url),
            Value = Json[#"issues"]
        in  Value,
 
    JiraIDCount = List.Max({ JiraIDPerPage, GetJiraIDCount() }),
    PageCount   = Number.RoundUp(JiraIDCount / JiraIDPerPage),
    PageIndices = { 0 .. PageCount - 1 },
    Pages       = List.Transform(PageIndices, each GetPage(_)),
    JiraID    = List.Union(Pages),
    Table       = Table.FromList(JiraID, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expanded Column1" = Table.ExpandRecordColumn(Table, "Column1", {"id", "key", "fields"}, {"Column1.id", "Column1.key", "Column1.fields"}),
    #"Expanded Column1.fields1" = Table.ExpandRecordColumn(#"Expanded Column1", "Column1.fields", {"summary", "reporter", "assignee", "created", "status"}, {"Column1.fields.summary", "Column1.fields.reporter", "Column1.fields.assignee", "Column1.fields.created", "Column1.fields.status"}),
    #"Expanded Column1.fields.assignee" = Table.ExpandRecordColumn(#"Expanded Column1.fields1", "Column1.fields.assignee", {"name"}, {"Column1.fields.assignee.name"}),
    #"Expanded Column1.fields.reporter" = Table.ExpandRecordColumn(#"Expanded Column1.fields.assignee", "Column1.fields.reporter", {"displayName"}, {"Column1.fields.reporter.displayName"}),
    #"Expanded Column1.fields.status" = Table.ExpandRecordColumn(#"Expanded Column1.fields.reporter", "Column1.fields.status", {"name"}, {"Column1.fields.status.name"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded Column1.fields.status",{{"Column1.fields.created", type datetimezone}})
in
    #"Changed Type"
codelover
  • 1,113
  • 10
  • 28
1

The standard starting point is to tweak the Microsoft helper function to your needs.

// The getNextPage function takes a single argument and is expected to return a nullable table
Table.GenerateByPage = (getNextPage as function) as table =>
    let        
        listOfPages = List.Generate(
            () => getNextPage(null),            // get the first page of data
            (lastPage) => lastPage <> null,     // stop when the function returns null
            (lastPage) => getNextPage(lastPage) // pass the previous page to the next function call
        ),
        // concatenate the pages together
        tableOfPages = Table.FromList(listOfPages, Splitter.SplitByNothing(), {"Column1"}),
        firstRow = tableOfPages{0}?
    in
        // if we didn't get back any pages of data, return an empty table
        // otherwise set the table type based on the columns of the first page
        if (firstRow = null) then
            Table.FromRows({})
        else        
            Value.ReplaceType(
                Table.ExpandTableColumn(tableOfPages, "Column1", Table.ColumnNames(firstRow[Column1])),
                Value.Type(firstRow[Column1])
            );
Murray Foxcroft
  • 12,785
  • 7
  • 58
  • 86