tangled wires
The problem is that you're mixing many concerns into one large function. I recommend breaking the task into separate parts and composing your "big" solution from many "small" solutions.
Write parseLink to extract the relevant links from the response header -
const parseLink = (link = "", rel = "") =>
{ const re = /<([^>]+)>; rel="([^"]+)"/g // matches each <url>; rel="..." pair
let m
while (m = re.exec(link))
if (rel === m[2])
return m[1]
return undefined
}
const header =
`<https://api.github.com/repositories/711976/issues/comments?page=2>; rel="next", <https://api.github.com/repositories/711976/issues/comments?page=169>; rel="last"`
console.log(parseLink(header, "next"))
// https://api.github.com/repositories/711976/issues/comments?page=2
console.log(parseLink(header, "foo"))
// undefined
Use URLSearchParams to modify URL search params -
const href =
`https://api.github.com/repositories/711976/issues/comments?foo=bar`
const u = new URL(href)
u.searchParams.set("per_page", 100)
console.log(u.toString())
// https://api.github.com/repositories/711976/issues/comments?foo=bar&per_page=100
u.searchParams.set("foo", "hello")
console.log(u.toString())
// https://api.github.com/repositories/711976/issues/comments?foo=hello&per_page=100
But don't go tangling URL parameter code with fetching code. Use a separate function, such as urlWithParams -
const urlWithParams = (href = "", params = {}) =>
{ const u = new URL(href)
for (const [k, v] of Object.entries(params))
u.searchParams.set(k, v)
return u.toString()
}
const href =
`https://api.github.com?foo=bar`
console.log(urlWithParams(href, { per_page: 100 }))
// https://api.github.com/?foo=bar&per_page=100
console.log(urlWithParams(href, { per_page: 100, page: 3 }))
// https://api.github.com/?foo=bar&per_page=100&page=3
console.log(urlWithParams(href, { per_page: 100, foo: "hello" }))
// https://api.github.com/?foo=hello&per_page=100
We can now write a generic fetchPage which fetches a single page -
const fetchPage = async (href = "") =>
{ const { headers, data } = await axios.get(href)
const next = parseLink(headers.link, "next")
return { data, next }
}
const href =
`https://api.github.com/repositories/711976/issues/comments?foo=bar`
const result =
  fetchPage(urlWithParams(href, { per_page: 100 }))
result.then(console.log)
// { data: [ { ... }, { ... }, ... ]
// , next: `https://api.github.com/repositories/711976/issues/comments?foo=bar&per_page=100&page=2`
// }
And the generic fetchPages which fetches all linked pages -
const fetchPages = async (href = "") =>
asyncUnfold
( async (then, done, { data, next }) =>
next
? then(data, await fetchPage(next))
: done(data)
, await fetchPage(href)
)
.then(pages => pages.flat(1)) // merge pages into single array
And finally, fetchComments is a specialization of fetchPages and urlWithParams -
const commentsAPI =
`https://api.github.com/repositories/711976/issues/comments`
const fetchComments = (parameters = {}) =>
fetchPages(urlWithParams(commentsAPI, parameters))
const result =
fetchComments({ per_page: 100 })
result.then(console.log, console.error)
// [ { ... }, { ... }, ... ]
The generic asyncUnfold is explained and implemented here and here. Here is an implementation for reference -
const asyncUnfold = async (run, initState) =>
  run
    ( async (value, nextState) => [ value, ...await asyncUnfold(run, nextState) ]
    , async value => [ value ] // done keeps the final value; () => [] would drop the last page
    , initState
    )
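To see its shape, here is a quick standalone illustration (not part of the pipeline above) that unfolds a simple counter -

asyncUnfold
  ( async (then, done, n) =>
      n < 3
        ? then(n, n + 1) // emit n, continue with n + 1
        : done(n)        // emit the final n and stop
  , 0
  )
  .then(console.log)
// [ 0, 1, 2, 3 ]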
Verify it all works by expanding and running the snippet below in your browser. NB, the repo in your original question (711976) has over 50 pages of 100 comments each. For the purposes of this demo, I am using a smaller repo (261279710) which only has 18 pages. Warning: if you run this demo multiple times, you will probably get rate-limited by the server because it's not using an API token -
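// minimal stand-in for axios.get built on fetch -
// it exposes only the headers.link and data fields used in this answer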
const axios =
{ async get (href = "")
{ const r = await fetch(href)
return Object.assign
( { headers: { link: r.headers.get("link") } }
, { data: await r.json() }
)
}
}
const asyncUnfold = async (run, initState) =>
  run
    ( async (value, nextState) => [ value, ...await asyncUnfold(run, nextState) ]
    , async value => [ value ] // keep the final value
    , initState
    )
const parseLink = (link = "", rel = "") =>
{ const re = /<([^>]+)>; rel="([^"]+)"/g
let m
while (m = re.exec(link))
if (rel === m[2])
return m[1]
return undefined
}
const urlWithParams = (href = "", params = {}) =>
{ const u = new URL(href)
for (const [k, v] of Object.entries(params))
u.searchParams.set(k, v)
return u.toString()
}
const fetchPage = async (href = "") =>
{ const { headers, data } = await axios.get(href)
const next = parseLink(headers.link, "next")
return { data, next }
}
const fetchPages = async (href = "") =>
asyncUnfold
( async (then, done, { data, next }) =>
next
? then(data, await fetchPage(next))
: done(data)
, await fetchPage(href)
)
.then(pages => pages.flat(1))
const commentsAPI =
`https://api.github.com/repositories/261279710/issues/comments`
const fetchComments = (parameters = {}) =>
fetchPages(urlWithParams(commentsAPI, parameters))
const comments =
fetchComments({ per_page: 100 })
comments.then(console.log, console.error)
comments.then(r => console.log("total: %d", r.length))
<p>fetching ~20 pages of comments. please wait...</p>
<p><b>Warning:</b> running this demo multiple times may cause the remote server to rate-limit your client.</p>
// [ { ... }, { ... }, { ... }, ... ]
// total: 1840
make it fast!
Notice the program above fetches the pages in serial order. Did you also notice that the server returns a rel="last" link to tell you which page is the last one? With that in mind, we can write something that fetches the pages in parallel.
Let's imagine a page function that returns the page param from a URL -
const page = (href = "") =>
// ...
console.log(page("/path/to/repo")) // 0
console.log(page("/path/to/repo?page=123")) // 123
And a next function which takes a URL and gives us the URL for the next page -
const next = (href = "") =>
  // ...
console.log(next("/path/to/repo")) // "/path/to/repo?page=1"
console.log(next("/path/to/repo?page=123")) // "/path/to/repo?page=124"
Using URL and URLSearchParams, those should be easy for you to write!
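For reference, here is one possible sketch. The dummy base ("http://example") is an assumption needed only because the demo paths above are relative -

const page = (href = "") =>
  Number(new URL(href, "http://example").searchParams.get("page")) || 0

const next = (href = "") =>
{ const u = new URL(href, "http://example")
  u.searchParams.set("page", page(href) + 1)
  // the demo paths are relative, so return only path + query;
  // for absolute URLs (like the GitHub API) return u.toString() instead
  return u.pathname + u.search
}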
Then adjust fetchPage so it returns the last link in the response -
const fetchPage = async (href = "") =>
{ const { headers, data } = await axios.get(href)
const next = parseLink(headers.link, "next")
const last = parseLink(headers.link, "last")
return { data, next, last }
}
Now let's adjust fetchPages to fetch in parallel -
const fetchPages = async (href = "") =>
{ const first =
    await fetchPage(href)
  const pages =
    first.next === undefined
      ? [] // a single page; nothing left to fetch
      : unfold // synchronous counterpart of asyncUnfold, sketched below
          ( (then, done, href) =>
              page(href) < page(first.last)
                ? then(href, next(href))
                : done(href)
          , first.next
          )
  const allData =
    // parallel!
    await Promise.all(pages.map(p => fetchPage(p).then(r => r.data)))
  return first.data.concat(...allData)
}
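The unfold used here is the ordinary synchronous counterpart of asyncUnfold, which is not defined above. A minimal sketch with the same then/done shape -

const unfold = (run, initState) =>
  run
    ( (value, nextState) => [ value, ...unfold(run, nextState) ]
    , value => [ value ]
    , initState
    )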
Now your only limitation is the number of concurrent connections.