
I have a list of tweet IDs (with some missing values) in an Excel file:

ID
1258125182063050753
1233371388620263429
1237667024618258432
1225204912755179521
nan
nan

When I load the Excel file into a pandas DataFrame, I convert the string column to integer using

df['ID']=df['ID'].apply(np.int64)

I am getting the values as

 1258125182063050752
 1233371388620263424
 1237667024618258432
 1225204912755179520
 0
 0

The conversion of string to integer changes the numeric values. How can I rectify the issue?

  • Do you mean changing the zeros (0) back to NaNs? Check this: [Python Pandas replace multiple columns zero to Nan](https://stackoverflow.com/questions/45416684/python-pandas-replace-multiple-columns-zero-to-nan) – naccode Jul 12 '20 at 14:27
  • @naccode no, my concern is 1258125182063050753 becomes 1258125182063050752 after the conversion – SUSHMA KUMARI Jul 12 '20 at 14:38
  • Why do you convert to integer? You don't need to do calculations on these numbers, so keep them as strings. – furas Jul 12 '20 at 15:38
  • BTW: did you check the values in pandas before converting? Maybe the data is already wrong. Did you load directly from Excel or from CSV? Did you check the CSV in a text editor to see whether the file has the correct values? – furas Jul 12 '20 at 15:43
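
The suggestion to keep the IDs as strings can be sketched as follows. This is a minimal illustration, not the asker's actual loading code: an in-memory CSV stands in for the Excel file, and the `user` column is hypothetical, added only so the row with a missing ID is not an empty line. `pd.read_excel` accepts a `dtype` argument in the same way, but that only helps if the cells are stored as text, because Excel keeps just 15 significant digits for numeric cells.

```python
import io

import pandas as pd

# Hypothetical stand-in for the spreadsheet: one ID is missing.
csv_text = "ID,user\n1258125182063050753,a\n1233371388620263429,b\n,c\n"

# Default parsing: the missing value forces the column to float64,
# which cannot represent every integer of this size exactly.
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default["ID"].dtype)

# Reading the column as text keeps every digit; the gap stays missing.
df_str = pd.read_csv(io.StringIO(csv_text), dtype={"ID": str})
print(df_str["ID"].tolist())
```

If integers are needed later, the non-missing strings can be converted one by one with Python's built-in `int`, which has no size limit.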

1 Answer


The problem may be number precision in the Excel file itself: when I check in pure Python with correctly loaded values, the conversion does not change the numbers:

df['ID2'] = df['ID'].apply(np.int64)
df['ID2'] == df['ID']

0    True
1    True
2    True
3    True
dtype: bool
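
One likely source of such a precision problem, offered here as an assumption since the asker's file is not available, is that the IDs pass through a 64-bit float at some point (for example, because the missing values make pandas store the column as float64). A float64 has a 53-bit significand, so integers above 2**53 are rounded to the nearest representable value. The sketch below uses only built-in Python and the IDs from the question:

```python
# Tweet IDs copied from the question.
ids = [
    1258125182063050753,
    1233371388620263429,
    1237667024618258432,
    1225204912755179521,
]

# Every ID is larger than 2**53, the limit for exact integers in float64.
print(all(i > 2**53 for i in ids))

# Round-trip each ID through a Python float (an IEEE 754 double).
round_tripped = [int(float(i)) for i in ids]

for original, converted in zip(ids, round_tripped):
    print(original, converted, original == converted)
```

The round-tripped values match the altered numbers reported in the question, which is consistent with the column having been float64 before `np.int64` was applied.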
ipj
  • Maybe, but how do I deal with the Excel/CSV problem? – SUSHMA KUMARI Jul 12 '20 at 14:38
  • To reproduce the problem, the Excel file is needed, along with code showing how you import the data into the DataFrame. Also consider that what you see in Excel depends on the formatting options applied to the cells in the worksheet. – ipj Jul 12 '20 at 15:04