I have a Python dictionary with character keys and integer values, and I want to store arbitrarily large integers as the values. What is the maximum integer I can store as a value in the dictionary? Is it possible to declare the key and value types of the dictionary when initializing it? For example, in C++ we write map<string, long long int>. How do I declare the equivalent dictionary in Python?

For example, in the following code, the values will grow very large for many keys.

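 # body of a method that builds the n-gram lookup table
 for w in words:
     ngrams_w = self.word_to_ngrams(w)
     for n in ngrams_w:
         # increment the count for an n-gram seen before, otherwise start at 1
         if n in lookup_table:
             lookup_table[n] = lookup_table[n] + 1
         else:
             lookup_table[n] = 1
 return lookup_table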

I have so many words in my corpus that there are a great many tri-grams. Will the values in my lookup table be able to hold such large integers?

MFerguson
Ruchit Patel
  • did you even try anything or just come here to ask? – SuperStew Feb 11 '20 at 16:09
  • Use `sys.maxint` returns **The largest positive integer supported by Python’s regular integer type**. – Ch3steR Feb 11 '20 at 16:12
  • @SuperStew, I am storing the counts of letter [tri-grams](https://en.wikipedia.org/wiki/Trigram) in such a dictionary. My corpus is really large, so the counts will of course be very high. When I run this on a large corpus, the final dictionary contains garbage keys/values. So I was wondering whether there is any way to store these large counts in my dictionary. – Ruchit Patel Feb 11 '20 at 16:14
  • Possible duplicate of [Maximum and Minimum values for ints](https://stackoverflow.com/questions/7604966/maximum-and-minimum-values-for-ints) – b_c Feb 11 '20 at 16:14
  • @MikePatel it would be best if you posted your code, or a simplified version that exhibits the same behavior – SuperStew Feb 11 '20 at 16:16
  • @SuperStew, posted the code, have a look – Ruchit Patel Feb 11 '20 at 16:21
  • I don't understand the question. Did you try that and you got some kind of error? – Tomerikoo Feb 11 '20 at 16:23
  • @Tomerikoo, yes, my lookup table looks good for a smaller number of documents, i.e. a smaller number of words, but when I run this on about 20000 documents, my dictionary keys turn into garbage characters, like Chinese and Hindi characters. In effect, I lose all of my keys when I run this on a large number of words. – Ruchit Patel Feb 11 '20 at 16:25
  • @Ch3steR _"Use sys.maxint returns The largest positive integer supported by Python’s regular integer type."_ - Nope, it returns `AttributeError: module 'sys' has no attribute 'maxint'` ;) – marcelm Feb 11 '20 at 22:34
  • @marcelm look at this answer https://stackoverflow.com/questions/13795758/what-is-sys-maxint-in-python-3 in python 3 `sys.maxint` is replaced by `sys.maxsize`. – Ch3steR Feb 12 '20 at 03:59

1 Answer

Python integers can grow arbitrarily large; the number of bytes they take up is limited only by the available memory. As a result, you can store very large integers in a dict, just as you can in any other variable.
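
As a rough illustration of the counting loop from the question, here is a minimal sketch using collections.Counter. The helper name count_trigrams and the sample words are made up for this example (the question's own word_to_ngrams method isn't shown, so this is not a reconstruction of it). The last lines push one count past the 64-bit range to show that nothing special is needed.

```python
from collections import Counter


def count_trigrams(words):
    """Count letter trigrams across an iterable of words (hypothetical helper)."""
    counts = Counter()
    for word in words:
        # every window of three consecutive characters is one trigram
        for i in range(len(word) - 2):
            counts[word[i:i + 3]] += 1
    return counts


lookup_table = count_trigrams(["banana", "bandana"])
print(lookup_table["ana"])

# A count can go past what a 64-bit integer could hold without any special handling
lookup_table["ana"] += 2**64
print(lookup_table["ana"] > 2**64)
```

A Counter is a dict subclass, so the if/else around a missing key isn't needed: missing keys count as 0.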

Also note that Python is dynamically typed: types belong to values at run time rather than being declared on variables or containers. The same dict can use any hashable object as a key and any object as the corresponding value, so you don't need to declare its types in advance.

For example:

large_dict = {
   'a': 2**99**2,
   'b': 3**99**2,
   'c': 4**66**3,
   'd': 5,
}

print(large_dict['a'])
# 248306165132924561496164540369747397718209389664421979394193596580895672024007807439055711370284861564860369035136072640421247191535721102013141978835469169522156063913724221390425927738407943233352121597000952466650133943847894657654642936798283251132329504531414684845699852222170355752964585014528721863787174380266408568345331219104126089734802420858816721647199125440828740721074224343905544868371705942175523521792178388151539959833019463734965870906161568963549943453523779527262279070796905764572936942835955866939440672610168340866805069734712288785472843737119025810033465345263566822480404114712795470666670786936130592430755660788419312131284624805433519219836210896682757214359481018876262651841829514451278407258288302096485329623142945831294584625876168796836210026664598725578391851086468168178335251555170603334828607329936146652776254894155020617743799204969294296027805855021670791356068583164034569879344888069528708000410778015338371707513763748612513298004837952568826168058548785811643616802268698334196693859282571227311955739048399380240277823701493777106586456502804192495186792766440900049569749137929516786479597787131133747543498843083325204039674749048603789396130205742765652126968558264042127098205463991676423341783364285559425631524530838924893824216418545172892308311002636080041799569034650085116002415823205007933823319803820578754567934510531604793931671897444469978151080948746237723140464148809678886307715911392427936793683732412623030204035038067414959272675357827562562264648276791714531039369829842197796895284927256347583902877628849385414149822952931950594358209851719819536521268008091435289309148672338954034136311449577730505720552475627232498433688427071688182909357198221512621578376305131922985776473554620984793637877421484157738423591558424823395886914506678637209675961513982890897120232940642559596827196064265982010252031098755835651326436709129333982635775451426977157652858345719170433947590929578577884275309303542480221055712350526949648
34870559303479169142265964947215835534331232659231738844605321566863791868404356910674470711233474169254178106049934124965008926126859691353260188518891725702238011978956872340932416194028852429366663313267993063866935339039988538965602564065890425970977641916796977494816019907846068887159081996367805130377827078940214763737954464021880895504378918548182271578043448659561293769460226209756884091241789271212624443566823618854822556818054950921281495213677124939344997301596883289597036555187701021490864067003001042353911716339896584757966357696031609808169050800581900702426733604379398309018563404211559969917703040652231206637777868534421416141767113165973592413599591681134708842927135950028572565259146431359982767414698579766187600152269764793150959658673104646366308004593390529802053032008317417654941763892718192450423219923744313363894743858416682079171193877076995395458797611449453319718948222265568320631084026777091936418337638425034752

These should qualify as 'arbitrarily large integers': printing them overflows my terminal window, and large_dict['c'] displays fine even though its output is about six times StackOverflow's character limit. Note that converting such values to floating-point numbers can fail, because floats have a limited range (an OverflowError is raised for values beyond roughly 1.8e308) and limited precision.
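
To make the floating-point caveat concrete, here is a small sketch using only the standard library. The variable names big and exact are just for illustration.

```python
import sys

big = 2**99**2  # ** is right-associative, so this is 2**(99**2) == 2**9801

print(big.bit_length())  # number of bits needed to represent the value
print(len(str(big)))     # number of decimal digits

# Floats are fixed-size, so conversion fails once the value exceeds their range
print(sys.float_info.max)
try:
    float(big)
except OverflowError as exc:
    print("cannot convert:", exc)

# Within range, conversion succeeds but can silently lose precision
exact = 2**53 + 1
print(int(float(exact)) == exact)
```

One related caveat: on Python 3.11 and later, converting an int with more than 4300 digits to a string raises a ValueError by default, so printing something as large as large_dict['c'] may first require raising the limit with sys.set_int_max_str_digits.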


Python does support 'type hinting', which lets you annotate the expected type of a variable, but hints are not enforced at run time, so they don't restrict what the code can do. Regardless, here's how you would write the hint:

# need to import Dict to type-hint about it
from typing import Dict

# hint that large_dict should use strings as keys and ints as values
# note that a 'character' is just a 1-length string, as far as python is concerned
# note also that this will not stop someone from putting a non-string in as a key, 
#   or a non-int in as a value
large_dict: Dict[str, int] = {
    ...
}
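
As a quick check of the point that hints aren't enforced, the sketch below annotates a dict and then deliberately violates the annotation. The name counts and its contents are invented for the example.

```python
from typing import Dict

# the annotation documents intent: string keys, integer values
counts: Dict[str, int] = {"a": 1}

# Hints are not checked at run time, so both assignments succeed
counts[42] = "not an int"
counts["b"] = 2**200

print(counts)
```

A static checker such as mypy would flag the first assignment, but the interpreter itself runs it without complaint. On Python 3.9 and later, the built-in dict[str, int] can be used in place of typing.Dict.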
Green Cloak Guy
  • From your answer, it seems that I can store any integer I want in my dictionary values, but I don't know what goes wrong when I run the above code on a very large corpus, such as a corpus of billions of words. In that case, the letter trigram counts should go to a trillion at max, so, it should work, but I don't know what's wrong in my code? I think there should be some other bug, so I should better look at the code again. – Ruchit Patel Feb 11 '20 at 16:32
  • _In that case, the letter trigram counts should go to a trillion at max, so, it should work, but I don't know what's wrong in my code? I think there should be some other bug, so I should better look at the code again._ I'm not Green Cloak Guy, but yes, the size of the numbers is probably not the issue. – AMC Feb 11 '20 at 17:10