
I have a Pandas DataFrame on which I would like to do some manipulations. First I sort my dataframe on the entropy using this code:

entropy_dataframe.sort_values(by='entropy',inplace=True,ascending=False)

This gives me the following dataframe (<class 'pandas.core.frame.DataFrame'>):

      entropy    identifier
486  1.000000  3.955030e+09
584  1.000000  8.526030e+09
397  1.000000  5.623020e+09
819  0.999700  1.678030e+09
..        ...           ...
179  0.000000  3.724020e+09
766  0.000000  6.163020e+09
770  0.000000  6.163020e+09
462  0.000000  7.005020e+09
135  0.000000  3.069001e+09

Now I would like to select the 10 rows with the highest entropy and return a list of the corresponding 10 identifiers (as integers). I have tried selecting the top 10 identifiers using either:

entropy_top10 = entropy_dataframe.head(10)['identifier']

And:

entropy_top10 = entropy_dataframe[:10]
entropy_top10 = entropy_top10['identifier']

Which both give the following result (<class 'pandas.core.series.Series'>):

397    2.623020e+09
823    8.678030e+09
584    2.526030e+09
486    7.955030e+09
396    2.623020e+09
555    9.768020e+09
492    7.955030e+09
850    9.606020e+09
159    2.785020e+09
745    4.609030e+09
Name: identifier, dtype: float64

Both approaches work, but the pain starts afterwards: I now want to convert this Pandas Series with dtype float64 to a list of integers.

I have tried the following:

entropy_top10= np.array(entropy_top10,dtype=pd.Series)
entropy_top10= entropy_top10.astype(np.int64)
entropy_top10= entropy_top10.tolist()

Which results in (<type 'list'>):

[7955032207L, 8613030044L, 2623057011L, 2526030291L, 7951030016L, 2623020357L, 9768028572L, 9606023013L, 2785021210L, 9768023351L]

Which is a list of longs (while I'm looking for integers).

Anyone that can help me out here? Thanks in advance!

--- EDIT ---

The problem lies 'here'. When I remove entropy_top10= entropy_top10.tolist(), it results in a <type 'numpy.ndarray'> with elements of dtype numpy.int64. When I add the code again, I get a <type 'list'> with elements long.

Tomas
  • What version of Python are you using? Are you sure that regular integers would be large enough to hold the values? See http://stackoverflow.com/questions/7604966/maximum-and-minimum-values-for-ints -- in 32-bit Python, the maximum integer value should be 2147483647 – jbndlr Jul 12 '16 at 09:08
  • If I do `sys.maxint` I get 2147483647. I'm fairly sure that all identifiers have a maximum of 10 characters. If I try `python -V` in my command line it gives me Python 2.7.11 :: Anaconda 4.0.0 (64-bit). – Tomas Jul 12 '16 at 09:22
  • According to your `sys.maxint`, you run 32bit python. And even if numbers in your list have a maximum length of 10 digits, they may be larger than your `maxint`. Already the first value in your list `7955032207` does not fit into a 32bit integer. Thus, you will have to use `long`, as python already did. – jbndlr Jul 12 '16 at 09:33
  • Okay, that makes sense. One remark though. In another part of my code I also have a list consisting of values which have numpy's `int64` datatype. In this specific list Python is able to store these identifiers (also values larger than the `sys.maxint`) as `integer` instead of `long`. Any idea why it is possible there? – Tomas Jul 12 '16 at 09:40
  • In fact, when I remove `entropy_top10= entropy_top10.tolist()` from the method in my original question, I get a `numpy.ndarray` which does contain elements of the datatype `numpy.int64`. Hence, when transforming this `numpy.ndarray` into a `list`, the elements are also transformed from `numpy.int64` to `long`. Is it still clear or should I adjust my original question? – Tomas Jul 12 '16 at 09:48

1 Answer

Since readers may not go through all of the comments on your original question, I'll condense our results into a single answer.

  • sys.maxint is 2**31 - 1, so Python's native int is only 32 bits wide on this installation. That is the case for a 32-bit build of Python, and also for 64-bit Python 2 on Windows, where the underlying C long is 32 bits. Since some list elements are larger than sys.maxint, they are stored as long values.

  • The transformation entropy_top10.astype(np.int64) creates a numpy.ndarray of 64-bit integers in numpy's own data type. numpy ships a 64-bit integer data type regardless of how wide Python's native int is (numpy.int64 is not a native Python type at all).

  • The transformation entropy_top10.tolist() converts the numpy data type back to Python's native data types. Since the native int is only 32 bits wide here, values beyond sys.maxint can only be converted to long.

  • On a build where the native int is 64 bits wide (for example 64-bit Python 2 on Linux), the tolist() transformation would most likely result in native int values, because the values would fit into the regular integer range (up to 2**63 - 1).
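
The range argument in the points above can be checked with plain arithmetic. The following is a minimal sketch, not part of the original discussion: it takes the identifiers from the list in the question and compares them against the 32-bit and 64-bit signed limits. The names INT32_MAX, INT64_MAX, too_big_for_32, fits_in_64 and c_long_bits are illustrative. The ctypes call reports the width of the platform's C long, which is the type behind CPython 2's native int.

```python
import ctypes

INT32_MAX = 2**31 - 1   # 2147483647, the sys.maxint reported in the question
INT64_MAX = 2**63 - 1   # upper limit of a signed 64-bit integer

# Identifiers copied from the list shown in the question.
identifiers = [7955032207, 8613030044, 2623057011, 2526030291, 7951030016,
               2623020357, 9768028572, 9606023013, 2785021210, 9768023351]

# Values that cannot be held by a 32-bit signed integer.
too_big_for_32 = [v for v in identifiers if v > INT32_MAX]

# All of them still fit into a 64-bit signed integer such as numpy.int64.
fits_in_64 = all(v <= INT64_MAX for v in identifiers)

# Width in bits of the platform's C long (32 on Windows, 64 on most 64-bit Unix systems).
c_long_bits = ctypes.sizeof(ctypes.c_long) * 8

print(len(too_big_for_32), fits_in_64, c_long_bits)
```

Every identifier in that list exceeds the 32-bit limit, which is why none of them could come back as a native int on the setup described in the question.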

Your list contains long items because of the translation between numpy's data types and the native data types of your installed Python version. Regardless of the Python version used to run the code, numpy is consistent in its own data types.
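
For reference, the whole pipeline from the question can be written without the intermediate np.array(...) call. This is only a sketch under stated assumptions: the small DataFrame below is a made-up stand-in for entropy_dataframe, its values are illustrative, and head(2) is used instead of head(10) to keep it short. The element type of the resulting list still depends on the interpreter: on Python 3 it is always int, while on the Python 2 setup from the question values beyond sys.maxint come back as long.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for entropy_dataframe; the values are illustrative only.
entropy_dataframe = pd.DataFrame({
    "entropy": [0.2, 1.0, 0.5, 0.9997],
    "identifier": [3069001234.0, 3955030001.0, 7005020002.0, 1678030003.0],
})

# Sort by entropy (descending), keep the top rows, select the identifier column.
top = entropy_dataframe.sort_values(by="entropy", ascending=False).head(2)["identifier"]

# Cast float64 -> int64 inside numpy/pandas, then convert to native Python numbers.
top_ids = top.astype(np.int64).tolist()

print(top_ids)
print(type(top_ids[0]))
```

The cast to int64 happens before tolist(), so the float64 values are turned into integers by numpy and only then translated into native Python numbers.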

Edit

To make the difference between the list's type and the items' types clearer, see this code example:

a = np.array([3123123123, 1512451234], dtype=np.int64)
print('ALL NUMPY')
print('  List items', a)
print('  List type', type(a))
print('  Item type', type(a[0]))

l = a.tolist()
print('ALL PYTHON NATIVE')
print('  List items', l)
print('  List type', type(l))
print('  Item type', type(l[0]))

c = [i for i in a]
print('NATIVE LIST, NUMPY TYPE')
print('  List items', c)
print('  List type', type(c))
print('  Item type', type(c[0]))

It gives the following output:

ALL NUMPY
  List items [3123123123 1512451234]
  List type <type 'numpy.ndarray'>
  Item type <type 'numpy.int64'>
ALL PYTHON NATIVE
  List items [3123123123L, 1512451234L]
  List type <type 'list'>
  Item type <type 'long'>
NATIVE LIST, NUMPY TYPE
  List items [3123123123, 1512451234]
  List type <type 'list'>
  Item type <type 'numpy.int64'>

From this output we can see that numpy's tolist() function not only converts the container from numpy.ndarray to list, but also converts each item's type from numpy.int64 to long. Manually transforming the array into a native list (using a comprehension here) yields a native Python list whose elements are still of type numpy.int64.
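
The same comparison can be run on Python 3, where long no longer exists and every native integer is an int. The sketch below is an addition, not part of the original answer. It repeats the three conversions and adds an explicit int(...) per element. The variable names are illustrative. On the Python 2 setup from the question, int(i) would still return a long for values beyond sys.maxint.

```python
import numpy as np

a = np.array([3123123123, 1512451234], dtype=np.int64)

# tolist() converts the container and each element to native Python numbers.
native = a.tolist()

# Iterating over the array yields numpy.int64 scalars, so the list keeps numpy types.
still_numpy = [i for i in a]

# An explicit per-element conversion gives native numbers as well.
explicit = [int(i) for i in a]

print(type(native[0]), type(still_numpy[0]), type(explicit[0]))
```

The values are identical in all three lists; only the element type differs.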

jbndlr
  • In some other parts of my code I was able to produce lists with integers from the same identifiers. See for example the following: `[3368030009, 6191090062, 8486030004, 7859030003, 4562030005, 8343090057, 2959090000, 7155090021, 9615030065, 6513030004]` Type of object: `` Type of elements of list: `` – Tomas Jul 12 '16 at 12:09
  • The difference is that the elements of your list are `numpy.int64`, which is the `type` for 64-bit integers that `numpy` ships. These are not *native Python* integers, since the native `int` on your installation **is not** 64 bits wide. – jbndlr Jul 12 '16 at 12:16