Why is it a common practice to encode pagination cursors or ID values as strings?

Question

For example the Facebook Graph API: why are after and before base64 encoded numbers?

{
  "data": [
     ... Endpoint data is here
  ],
  "paging": {
    "cursors": {
      "after": "MTAxNTExOTQ1MjAwNzI5NDE=",
      "before": "NDMyNzQyODI3OTQw"
    },
    "previous": "https://graph.facebook.com/me/albums?limit=25&before=NDMyNzQyODI3OTQw",
    "next": "https://graph.facebook.com/me/albums?limit=25&after=MTAxNTExOTQ1MjAwNzI5NDE="
  }
}

What benefits could it possibly bring in contrast to just plain numbers?

As the following Python log shows, the benefit cannot be a shorter representation of the data, nor protection against unsafe characters in the data:

>>> import base64
>>> base64.b64decode("MTAxNTExOTQ1MjAwNzI5NDE=")
'10151194520072941'
>>> len('10151194520072941')
17
>>> len("MTAxNTExOTQ1MjAwNzI5NDE=")
24

Base64 would only make sense as a "compacting" mechanism if you were encoding large 64-bit numbers to a printable representation, e.g. ``len('\x00\x00\x00\x00\x00\x00\x00\x01'.encode('base64')) = 13`` and ``len('72057594037927936') = 17``. The only reason I see for that decision is to offer a more opaque interface. It's saying "don't rely on this number to be something meaningful". — Rafael Almeida, Jun 27 '15 at 19:30
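The size comparison in this comment can be reproduced in Python 3 with a minimal sketch. Packing the number as 8 big-endian bytes is an assumption made here for illustration; nothing in the question shows that the Graph API does this.

```python
import base64
import struct

n = 10151194520072941  # the decoded cursor value from the question

# Base64 of the decimal digits (what the cursor in the question contains).
digits_b64 = base64.b64encode(str(n).encode("ascii"))

# Base64 of the same number packed as 8 raw bytes (hypothetical alternative).
packed_b64 = base64.b64encode(struct.pack(">Q", n))

print(len(str(n)), len(digits_b64), len(packed_b64))  # 17 24 12
```

Only the packed form is shorter than the decimal digits. The count of 13 in the comment comes from the Python 2 'base64' codec, which appends a trailing newline.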

score 5 · Answer 1 · edited May 23 '17 at 10:31

According to the Stack Overflow question "What is JavaScript's highest integer value that a Number can go to without losing precision?", the largest integer a JavaScript Number can hold without losing precision is 9007199254740992 (2^53).

If you compare these values

9007199254740992    // the JS limit for exact integers (2^53)
10151194520072941   // the number decoded from the Base64 cursor

It certainly looks like Facebook is internally, for reasons we do not know, storing values that are too big for JavaScript's number precision to handle.

Thus, it seems to me that they had no other possibility than to handle numbers as strings.

Of course they could just send "10151194520072941" as a number in string form, but some programmers might mistake it for a number. Even though this rarely happens, they probably thought that Base64 encoding the number avoids the problem of someone converting the string to an integer.

Furthermore, because this is a public API, it is used by people other than their own engineers, so the risk is even higher: the people using the API come from different educational backgrounds. They could accidentally apply, for example, parseInt or something similar to the number, resulting in unnecessary customer service requests.
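The precision problem can be illustrated without a JavaScript runtime. A JavaScript Number is an IEEE 754 double, the same format as a Python float, so the following minimal sketch uses a float round trip as a stand-in for what a client would see after parsing the ID as a Number. The variable names are chosen here for illustration.

```python
cursor_id = 10151194520072941  # decoded cursor value from the question
max_exact = 2 ** 53            # 9007199254740992

# Round-trip through a double, as happens when a JavaScript client
# parses the value as a Number.
round_tripped = int(float(cursor_id))

print(cursor_id > max_exact)       # True: beyond the exact-integer range
print(round_tripped == cursor_id)  # False: the last digit is not preserved
```

A client that silently gets a neighbouring ID back would request the wrong page or object, which is the kind of rare, hard-to-trace bug a string cursor avoids.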

EDIT: Using very large numbers might also serve another purpose: detecting deliberate abuse of the API. If they used, for example, random UUID values or consecutive numeric values, any nearby value could potentially be valid. With a UUID, they would first have to make a request to see whether it is a valid entry. With a large number space, it could be that only every 1000th value is valid, or that valid values follow some other mathematical rule that a single server can check without requests to other servers. Sorting out clients that deliberately craft requests with invalid values then becomes much more effective, and such requests can perhaps be filtered out before they reach the databases.
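As a purely hypothetical sketch of such a rule (the step of 1000 and the function name are assumptions taken from the speculation above, not anything Facebook is known to do), a server could reject most guessed IDs with a local check:

```python
ID_STEP = 1000  # hypothetical rule: only multiples of 1000 are ever issued


def looks_issued(candidate: int) -> bool:
    """Cheap local plausibility check; no database or remote call involved."""
    return candidate > 0 and candidate % ID_STEP == 0


print(looks_issued(10151194520072000))  # True: fits the hypothetical rule
print(looks_issued(10151194520072941))  # False: would be rejected early
```

A check like this only filters out obviously crafted values. An ID that passes it still has to be looked up to confirm that it exists.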

I'm not so sure if I agree with the abuse detection, but the maximum integer argumentation alone is already a really good reason! Thanks for your answer! — stefreak, Jun 28 '15 at 14:18
I think the big number argumentation is especially good when you consider that some languages convert strings to numbers automatically and that you only notice it when the numbers you are dealing with are higher than the maximum, so the bug could happen rarely or depending on the devices in use — stefreak, Jun 28 '15 at 14:24
Yes, I was just thinking what might be the reason to use numbers instead of more common UUID - it could be that they have noticed people are trying to hack the system by trying to "guess" the next correct ID - making more space to the number space would create "traps" where the guesses would fall and these holes might even be systematic - much harder to argue that it was accident if the next number was base64 encoded, and also to detect if it was not encoded - but I can remove that speculation from the answer if you wish — Tero Tolonen, Jun 28 '15 at 14:24

score 3 · Answer 2 · edited May 23 '17 at 11:46

If by plain numbers you mean base 10 (decimal), then the advantages are that base 64 is more compact, using fewer digits (a 10-digit base-10 number, e.g. 1,000,000,000, can be expressed in only 5 digits in base 64, e.g. F9eEA), as well as (as you say) hiding the implementation details.

If by plain numbers you mean the raw binary data, base 64 uses characters that are almost always safe to transmit across the internet, in URLs, etc., without some characters being interpreted as control characters (which is a risk when transmitting raw binary data). See this other question for more information.
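To make the transport-safety point concrete, here is a small Python 3 sketch. The two input bytes are arbitrary and were picked only because they produce the two characters where the standard and URL-safe Base64 alphabets differ.

```python
import base64

raw = b"\xfb\xff"  # arbitrary non-printable bytes, not valid text on their own

standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")

print(standard)  # +/8=
print(url_safe)  # -_8=
```

The standard alphabet still contains "+" and "/", which need percent-encoding inside a query string. The URL-safe variant replaces them with "-" and "_".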

In either case there are advantages to using base64.

EDIT:

I see what you mean: the previously listed advantages do not apply in this case. Facebook probably used base64 for consistency with other API functions, as well as to hide the implementation details. It may also be advantageous if they later change the format to allow other characters, and it may help tolerate potentially malformed requests (assuming the error happened before base64 conversion).

This answer makes a lot of sense. It also means that FB could reserve certain non-numeric IDs or ID suffixes for a variety of system reasons. — ChrisGuest, Jun 28 '15 at 09:32

score 3 · Answer 3 · answered Oct 30 '21 at 04:59

It's just for providing a consistent spec to consumers. That way, you can change the pagination scheme, but consumers will always get a base64 string.

answered Oct 30 '21 at 04:59

rodrigomd


score 1 · Accepted Answer · answered Oct 21 '22 at 12:30

This is a more general answer towards the "why encode cursors with base64" part of the question and not related to the Facebook Graph API.

I agree with Rafael Almeida and rodrigomd that the intention behind the Base64 encoding is to hide implementation details and to keep the cursor a single opaque token.

This is kind of a standard practice.

The folks at Slack use Base64-encoded cursors because their cursor values (before encoding) look something like this:

user:W07QCRPA4

Base64 encoding this value hides the implementation detail and turns it into a single opaque token. The encoded form is somewhat longer than the raw value, not shorter.

This also gives them the flexibility to use the same strategy to convert their offset-based paginated API services into cursor-based ones, by using cursor values such as:

offset:10

Clients won't know the difference because they don't need to.
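A minimal Python 3 sketch of this idea follows. The helper names `encode_cursor` and `decode_cursor` and the "kind:value" layout are assumptions modelled on the two example values above, not Slack's actual implementation.

```python
import base64
from typing import Tuple


def encode_cursor(kind: str, value: str) -> str:
    """Pack a 'kind:value' pair into an opaque, URL-safe token."""
    raw = f"{kind}:{value}".encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(token: str) -> Tuple[str, str]:
    """Reverse encode_cursor; raises ValueError on malformed tokens."""
    raw = base64.urlsafe_b64decode(token.encode("ascii")).decode("utf-8")
    kind, sep, value = raw.partition(":")
    if not sep:
        raise ValueError("cursor has no kind prefix")
    return kind, value


# The server can switch between schemes; the client only ever sees a string.
print(encode_cursor("user", "W07QCRPA4"))
print(decode_cursor(encode_cursor("offset", "10")))
```

Because the token is decoded only on the server, a new cursor kind can be introduced later without changing the response format that clients parse.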

This is a great reason to do this — stefreak, Nov 29 '22 at 07:42
