I am working with a dataset that has several columns that represent integer ID numbers (e.g. `transactionId` and `accountId`). These ID numbers are often 12 digits long, which makes them too large to store as a 32-bit integer.
What's the best approach in a situation like this?
- Read the ID in as a character string.
- Read the ID as an `integer64` using the bit64 package.
- Read the ID as a numeric (i.e. double).
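
To make the three options concrete, here is a minimal sketch of what I have in mind. The sample data is made up, the column names just follow the examples above, and the second option assumes the bit64 package is installed:

```r
# Hypothetical two-column sample standing in for the real file.
csv_text <- "transactionId,accountId\n123456789012,987654321098\n"

# Option 1: read every column as a character string.
ids_chr <- read.csv(text = csv_text, colClasses = "character")

# Option 2: convert the character IDs to integer64 (requires bit64).
ids_i64 <- bit64::as.integer64(ids_chr$transactionId)

# Option 3: read every column as numeric (double).
ids_num <- read.csv(text = csv_text, colClasses = "numeric")
```

In a real file with other columns, `colClasses` would presumably be a named vector so that only the ID columns are affected.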
I have been warned about the dangers of testing equality with doubles, but I'm not sure if that will be a problem in the context of using them as IDs, where I might merge and filter based on them, but never do arithmetic on the ID numbers.
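
As a rough check of that concern: doubles represent whole numbers exactly up to 2^53, so 12-digit IDs should compare exactly as long as nothing fractional is ever introduced. The values below are made up for illustration:

```r
# Two hypothetical 12-digit IDs, one typed directly and one parsed from text.
id_a <- 123456789012
id_b <- as.numeric("123456789012")

# Exact equality between two whole-number doubles.
same <- id_a == id_b

# 12-digit IDs sit far below the 2^53 limit for exact integers.
below_limit <- id_a < 2^53

# Past 2^53, neighbouring integers are no longer distinguishable.
collides <- (2^53 + 1) == 2^53
```

All three of `same`, `below_limit`, and `collides` come out `TRUE`, which suggests equality tests would only become unreliable for IDs of roughly 16 digits or more.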
Character strings seem intuitively like they should be slower to test for equality and to merge on, but maybe in practice it doesn't make much of a difference.