2

I'm a hardware engineer with lots of firmware coding experience in C and assembly ("low level languages"), with some coding exposure to C++ and prelim knowledge with JS, Pascal, etc. ("high level languages").

I have hired a couple of developers to write a Web App that will later handle high volume in traffic. The app is written in .NET Core, PostgreSQL, Vue.js, C#, ... where my syntactical knowledge is zero so I cannot write test routines to prove/disprove anything so am requesting your help as follows --

The developers are contending that it makes virtually no difference in execution of speed whether they store 10-digit and 12-digit numbers in DB fields marked as "text" or as "int" (or long int). These numbers will be searched very frequently with potentially millions of entries, so speed differences will become apparent under heavy load.

The 10-digit numbers are mobile numbers that need to be displayed in the format 00000-00000 and 12-digit numbers are Indian Aadhaar card numbers to be shown as 0000 0000 0000, so instead of formatting these numbers in the front-end JavaScript code, they are storing them formatted as text along with the hyphen and spaces.

My understanding makes me believe that the search of a text like "12345-67890" in a text column containing will be very slow as compared to a search of a number 1234567890 (int) in a column containing ints.

Two reasons for my belief: a) A number is stored in fewer bytes than a string, consequently faster processing and lower storage

b) A text search uses ASCII stored in bytes whereas number searches directly in the bytes, besides the fact that SQL query gets converted to C which gets converted to Assembly, where the optimized machine code will be probably an order of magnitude faster in numerical searches.

Can any of you experts confirm please?

Thank you

4world
  • 21
  • 1
  • Question is a bit complex, not sure if anyone will answer in nearest time but i'll leave a few related links that may help a bit. Shortly - `Indexing` should level out the differences, if indexes fit in RAM. [related question](https://stackoverflow.com/questions/2346920/sql-select-speed-int-vs-varchar) [related benchmark](https://stackoverflow.com/questions/332300/is-there-a-real-performance-difference-between-int-and-varchar-primary-keys/48583244#48583244) – Sergey Sosunov Dec 23 '22 at 18:57
  • One possible issue is that the integer type will not store leading `0`'s, not sure if that is something that will come up here. – Adrian Klaver Dec 23 '22 at 19:07
  • No leading zeros will be needed in either of these. – 4world Dec 23 '22 at 19:10
  • One important thing to notice is that these values are _not_ numbers proper (addition and subtraction do not make sense with them) but rather code sequences that contain only digits. Keeping them formatted with spaces and dashes in the DB makes no sense either. Storing as numbers has a strong performance edge yet these are semantically not numbers. I would try to estimate the number of whole value searches vs. the number of text operations like 'extract the fifth and sixth digits' besides other domain concerns before making my mind. – Stefanov.sm Dec 23 '22 at 20:25
  • Easy to measure, just create a couple of million records and execute some queries – Frank Heikens Dec 23 '22 at 21:12

0 Answers0