I have a Pandas DataFrame on which I would like to do some manipulations. First I sort my dataframe on the entropy using this code:
entropy_dataframe.sort_values(by='entropy', inplace=True, ascending=False)
This gives me the following DataFrame (<class 'pandas.core.frame.DataFrame'>):
      entropy    identifier
486  1.000000  3.955030e+09
584  1.000000  8.526030e+09
397  1.000000  5.623020e+09
819  0.999700  1.678030e+09
..        ...           ...
179  0.000000  3.724020e+09
766  0.000000  6.163020e+09
770  0.000000  6.163020e+09
462  0.000000  7.005020e+09
135  0.000000  3.069001e+09
Now I would like to select the 10 rows with the largest entropy and return a list of the corresponding 10 identifiers (as integers). I have tried selecting the top 10 identifiers using either:
entropy_top10 = entropy_dataframe.head(10)['identifier']
And:
entropy_top10 = entropy_dataframe[:10]
entropy_top10 = entropy_top10['identifier']
Both give the following result (<class 'pandas.core.series.Series'>):
397 2.623020e+09
823 8.678030e+09
584 2.526030e+09
486 7.955030e+09
396 2.623020e+09
555 9.768020e+09
492 7.955030e+09
850 9.606020e+09
159 2.785020e+09
745 4.609030e+09
Name: identifier, dtype: float64
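As an aside, the sort and the selection can probably be combined into one step. The snippet below is only a minimal sketch on made-up data (the values and the 3-row cutoff are illustrative, not taken from the real DataFrame), using `DataFrame.nlargest` to pick the rows with the highest entropy:

```python
import pandas as pd

# Hypothetical sample data; the real entropy_dataframe has many more rows.
entropy_dataframe = pd.DataFrame(
    {
        "entropy": [0.2, 1.0, 0.9997, 0.0, 0.5],
        "identifier": [3.955030e+09, 8.526030e+09, 5.623020e+09,
                       1.678030e+09, 3.724020e+09],
    }
)

# nlargest orders the rows by the given column (descending) and keeps
# the first n, so no separate sort_values call is needed here.
top = entropy_dataframe.nlargest(3, "entropy")["identifier"]
print(top)
```

The result is still a float64 Series with the original index labels, so the conversion question remains the same.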
Both approaches work, but the trouble starts afterwards: I now want to convert this Pandas Series with dtype float64 to a list of integers.
I have tried the following:
entropy_top10 = np.array(entropy_top10, dtype=pd.Series)
entropy_top10 = entropy_top10.astype(np.int64)
entropy_top10 = entropy_top10.tolist()
Which results in (<type 'list'>):
[7955032207L, 8613030044L, 2623057011L, 2526030291L, 7951030016L, 2623020357L, 9768028572L, 9606023013L, 2785021210L, 9768023351L]
Which is a list of longs (while I'm looking for integers).
Can anyone help me out here? Thanks in advance!
--- EDIT ---
The problem seems to lie in the tolist() call. When I remove entropy_top10 = entropy_top10.tolist(), the result is a <type 'numpy.ndarray'> with elements of dtype numpy.int64. When I add the line back, I get a <type 'list'> with elements of type long.
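For reference, here is a minimal, self-contained sketch of the conversion, assuming a short made-up Series in place of the real data. It casts the float64 Series directly with `Series.astype` and then calls `tolist()`, skipping the intermediate `np.array` step:

```python
import numpy as np
import pandas as pd

# Hypothetical values standing in for the real identifiers.
entropy_top10 = pd.Series(
    [7955032207.0, 8613030044.0, 2623057011.0], name="identifier"
)

# Cast the float64 Series to int64, then convert to a plain Python list.
identifiers = entropy_top10.astype(np.int64).tolist()

# Each element is a built-in Python integer, not a NumPy scalar.
print(identifiers)
print(type(identifiers[0]))
```

On Python 3 the elements are plain int. On Python 2, integers larger than sys.maxint are represented as long and printed with a trailing L; long is still an integer type, so the values themselves are unchanged.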