60

I am running the exact same code on both windows and mac, with python 3.5 64 bit.

On windows, it looks like this:

>>> import numpy as np
>>> preds = np.zeros((1, 3), dtype=int)
>>> p = [6802256107, 5017549029, 3745804973]
>>> preds[0] = p
Traceback (most recent call last):
  File "<pyshell#13>", line 1, in <module>
    preds[0] = p
OverflowError: Python int too large to convert to C long

However, this code works fine on my mac. Could anyone help explain why or give a solution for the code on windows? Thanks so much!

packybear
  • 1
    You're sure both are 64 bit? can you test on linux? – Tim Jul 11 '16 at 18:53
  • Even if both systems are on 64-bit Python, are they both on 64-bit NumPy? – user2357112 Jul 11 '16 at 18:53
  • Another Stack Overflow question explains 'why'. On Windows long is 32-bit and on Unix-like systems long is 64-bit. Please see the question http://stackoverflow.com/questions/384502/what-is-the-bit-size-of-long-on-64-bit-windows – VladimirM Jul 11 '16 at 18:57
  • 7
    Use `dtype='int64'` or `dtype=np.int64`. The `int` type uses a C `long`, which is always 32-bit on Windows. – Eryk Sun Jul 11 '16 at 19:34
  • to Tim: Yes, both are 64bit. I do not have a linux machine, sorry. to user2357112: Yes, both are 64bit python and numpy. to VladimirM: Thanks! I think that question answers mine! to eryksun: Thanks! It works! – packybear Jul 11 '16 at 21:39
  • How would you do this without numpy? – jtlz2 Jul 10 '19 at 14:01
  • This is a good solution https://stackoverflow.com/questions/15063936/csv-error-field-larger-than-field-limit-131072 – Amitku Sep 24 '22 at 06:44

5 Answers

40

You can use dtype=np.int64 instead of dtype=int
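To make that concrete, here is a minimal sketch using the values from the question. It assumes only that NumPy is installed; the variable names are the ones from the question.

```python
import numpy as np

# The values from the question; each one exceeds the 32-bit limit of 2**31 - 1.
p = [6802256107, 5017549029, 3745804973]

# Asking for int64 explicitly gives 8-byte elements regardless of platform,
# instead of whatever width the platform's default integer happens to have.
preds = np.zeros((1, 3), dtype=np.int64)
preds[0] = p

print(preds.dtype, preds.dtype.itemsize)
print(preds[0].tolist())
```

If the values are never negative, `np.uint64` works the same way and extends the upper bound to 2**64 - 1.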

Francesco Mantovani
sammy ongaya
  • Thanks, I just had to use the unsigned type `np.uint64` (to store hashes). – Axel Puig May 01 '20 at 13:21
  • 3
    I tried using both `np.int64` and `np.uint64` to store 109323892912381287389218291378123872293293923929392929289283928 Neither work – information_interchange Jul 14 '20 at 21:41
  • 1
    If you need to store insanely large numbers exactly then numpy probably isn't the tool for you. If you can tolerate loss of precision then you can use the float type. Alternatively, it's possible to have a numpy array of python objects ( https://stackoverflow.com/questions/6141853/numpy-array-of-python-objects ), but at that point some would question why you are using a numpy array at all. – plugwash Jan 21 '21 at 14:11
39

You'll get that error once a number exceeds the largest value the array's C integer type can hold. In the session below that limit equals sys.maxsize:

>>> import sys
>>> p = [sys.maxsize]
>>> preds[0] = p
>>> p = [sys.maxsize+1]
>>> preds[0] = p
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long

You can confirm this by checking:

>>> import sys
>>> sys.maxsize
2147483647

To store values of larger magnitude, don't pass an int dtype, which uses a bounded C integer behind the scenes. Use the default float dtype instead, keeping in mind that float64 represents integers exactly only up to 2**53:

>>> preds = np.zeros((1, 3))
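A short sketch of what the float approach does and does not preserve, assuming NumPy's default float64 dtype. The values from the question are well below 2**53, so they round-trip exactly; `big` is a hypothetical value chosen just above that bound to show where exactness ends.

```python
import numpy as np

p = [6802256107, 5017549029, 3745804973]

preds = np.zeros((1, 3))   # no dtype given, so the array is float64
preds[0] = p

# All three values are below 2**53, so converting back to int is lossless.
exact = [int(v) for v in preds[0]] == p

# Hypothetical larger value: 2**53 + 1 has no exact float64 representation.
big = 2**53 + 1
lossy = int(np.float64(big)) != big

print(exact, lossy)
```

So floats are fine for the numbers in the question, but they are a poor fit for large identifiers or hashes, where every digit matters.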
Moses Koledoye
  • 3
    if you do get a number larger than this, how to tackle? – Veronica Cheng Feb 03 '18 at 17:51
  • 4
    @VeronicaWenqianCheng Don't pass an int dtype, use the default float. – Moses Koledoye Feb 05 '18 at 11:20
  • 3
    what if it needs to be passed as an index, which then needs to be an int? – fireball.1 Jul 27 '18 at 00:15
  • 1
    I don't understand your question clearly. The index or the value itself? In the case of the value, use a float. You can easily convert to int in plain Python if you need the value as an int. – Moses Koledoye Jul 29 '18 at 00:07
  • 2
    MosesKoledoye I think @fireball means what if a non-float is required as an index argument and hence cannot be a float (which you say is required to circumvent this problem)? Should one do int(float(x)) - surely not? – jtlz2 Jul 10 '19 at 14:02
  • e.g. `TypeError: integer argument expected, got float` – jtlz2 Jul 10 '19 at 14:04
  • @jtlz2 Maybe show some code. I can't see why a float is being passed when an int is required. – Moses Koledoye Jul 11 '19 at 21:18
  • 1
    At least according to the documentation sys.maxsize is the maximum value of py_ssize_t (essentially ssize_t), not long. In particular win64 has a 64-bit size_t, but a 32-bit long. – plugwash Dec 22 '19 at 17:24
  • 1
    With enormous ints, they are very likely to be id's of sorts. Doesn't converting them to floats mean that the digits will be truncated, thus breaking the uniqueness of the ids? – Cliff AB Jul 13 '20 at 18:57
7

Could anyone help explain why

Numpy arrays normally* have fixed size elements, including integers of various sizes, single or double precision floating point numbers, fixed length byte and Unicode strings and structures built up from the aforementioned types.

In Python 2 a Python "int" was equivalent to a C long. In Python 3 an "int" is an arbitrary precision type, but numpy still uses "int" to represent the C type "long" when creating arrays.

The size of a C long is platform dependent. On Windows it is always 32-bit. On Unix-like systems it is normally 32-bit on 32-bit systems and 64-bit on 64-bit systems.
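You can check the width of a C long on your own machine with the standard library alone. This is only a sketch: it reports what the running interpreter sees, and it also prints sys.maxsize for comparison, since that tracks Py_ssize_t rather than long. As far as I know, NumPy 2.0 and later use a 64-bit default integer on 64-bit Windows, so the mismatch described here applies to older NumPy releases.

```python
import ctypes
import sys

# Width of the C `long` type as seen by this Python build.
c_long_bits = ctypes.sizeof(ctypes.c_long) * 8
# Largest value a signed C long of that width can hold.
c_long_max = 2 ** (c_long_bits - 1) - 1

# sys.maxsize follows Py_ssize_t, which can be wider than a C long
# (for example on 64-bit Windows).
ssize_bits = ctypes.sizeof(ctypes.c_ssize_t) * 8

print(f"C long:     {c_long_bits} bits, max {c_long_max}")
print(f"Py_ssize_t: {ssize_bits} bits, sys.maxsize {sys.maxsize}")
print("6802256107 fits in a C long:", 6802256107 <= c_long_max)
```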

or give a solution for the code on windows? Thanks so much!

Choose a data type whose size is not platform dependent; the most sensible choice would probably be np.int64. You can find the full list at https://docs.scipy.org/doc/numpy/reference/arrays.scalars.html#arrays-scalars-built-in

* Numpy does allow arrays of python objects, but I don't think they are widely used.
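For completeness, a sketch of the object-array route mentioned in the footnote. The large value is the one quoted in a comment under another answer; it fits in neither np.int64 nor np.uint64. With dtype=object the array holds ordinary Python ints, so arithmetic stays exact but runs at Python speed rather than as vectorized C code.

```python
import numpy as np

# Far larger than 2**64, so no fixed-width NumPy integer type can hold it.
huge = 109323892912381287389218291378123872293293923929392929289283928

# dtype=object stores references to Python ints instead of fixed-size C integers.
arr = np.array([huge, 1], dtype=object)

# Operations are dispatched to Python's arbitrary-precision int arithmetic.
total = arr.sum()

print(arr.dtype)
print(total == huge + 1)
```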

plugwash
2

Convert to float:

import pandas as pd

df = pd.DataFrame()
l_var_l = [8258255190131389999999000003296, 50661]
df['temp'] = l_var_l
df['temp'] = df['temp'].astype(int)

Above fails with error:

OverflowError: Python int too large to convert to C long.

Now try with float conversion:

df['temp'] = df['temp'].astype(float)
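Be aware that the float conversion avoids the error by rounding. The sketch below uses plain Python floats, which are the same 64-bit doubles that a float64 column stores, to show how far the stored value drifts from the original integer in the example above.

```python
# The large value from the example above.
big = 8258255190131389999999000003296

# A 64-bit float keeps only about 15 to 17 significant decimal digits.
as_float = float(big)
round_trip = int(as_float)

print(round_trip == big)        # the original digits are not all preserved
print(abs(round_trip - big))    # size of the rounding error
```

If the exact digits matter (IDs, hashes), keeping the column as Python ints in an object column is the safer choice.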
Tonechas
2

I got the same error while trying to convert an object-type column (actually strings) to an integer type.

This DID NOT work:

df['var1'] = df['var1'].astype(int)

This worked:

df['var1'] = df['var1'].apply(lambda x: int(x))
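Another option that should work here is to name a fixed-width type explicitly instead of the plain int. This is a sketch under assumptions: it requires pandas, the column is assumed to hold digit strings that fit in 64 bits, and the sample values are borrowed from the question purely for illustration.

```python
import pandas as pd

# Hypothetical column of digit strings, each too large for a 32-bit integer.
df = pd.DataFrame({"var1": ["6802256107", "5017549029"]})

# 'int64' is 8 bytes wide on every platform, unlike the platform-dependent int.
converted = df["var1"].astype("int64")

print(converted.dtype)
print(converted.tolist())
```

Values that do not fit in 64 bits would still overflow with this approach, so the apply-based version remains the fallback for those.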