1

I tried many SO answers but nothing tried so far worked.I have a column in a dfwhere there is a large value in a column like 89898989898989898 Whatever I do this is not being displayed as a number. All the cols are of float64 dtype.I do not have any float values in my df After creating pivot I get a dataframe df and I tried converting to int then writing to excel, does not seem to have any difference and displays as scientific formatting(I can see the value when I click on the cell in the value bar) I could not convert to int directly as there are Nan values in the column :-

  ID CATEG   LEVEL     COLS    VALUE  COMMENTS
    1   A        2     Apple    1e+13 comment1
    1   A        3     Apple    1e+13 comment1
    1   C        1     Apple    1e+13 comment1
    1   C        2     Apple    345   comment1
    1   C        3     Apple    289   comment1
    1   B        1     Apple    712   comment1
    1   B        2     Apple    1e+13 comment1
    2   B        3     Apple    376   comment1
    2   C        None  Orange   1e+13 comment1
    2   B        None  Orange   135   comment1
    2   D        None  Orange   423   comment1
    2   A        None  Orange   866   comment1
    2            None  Orange   496   comment2

After pivot the Apple column looks like this (providing just sample values to show the scientific notation values) :-

   index    Apple
    1655    1e+13
    1656    1e+13
    1657    1e+13
    1658    NaN
    1659    NaN
    df=pd.pivot_table(dfe,index=['ID','CATEG','LEVEL'],columns=['COLS'],values=['VALUE'])
    df= df.fillna(0).astype(np.int64)
    with pd.ExcelWriter('file.xlsx',options={'nan_inf_to_errors': True}) as writer :
            df.groupby('ID').apply(lambda x: x.dropna(how='all', axis=1).to_excel(writer,sheet_name=str(x.name),na_rep=0,index=True))
    writer.save()

What should I do to get rid of scientific formatting and get it displayed as a number in excel. Also is there a way to autofit the columns in excel while writing from python to excel.Im using pd.ExcelWriter to write to excel

Scope
  • 727
  • 4
  • 15
  • The problem is the category that excel assigns to the cells. If I put your example number into excel I also get scientific notation. But when I go to format cell --> Number and I pick number it changes it to a regular number. – Lukas S Oct 14 '20 at 13:01
  • No I don't think so as other columns from `df` are displayed as numbers correctly in excel – Scope Oct 14 '20 at 13:05

2 Answers2

0

You can use set_option to avoid scientific formating

pd.set_option('display.float_format', lambda x: '%.3f' % x)

Also you can change column type with this command:

df['col'] = df[['col']].fillna(0)
df['col'] = df['col'].astype(int)
Mehdi Golzadeh
  • 2,594
  • 1
  • 16
  • 28
  • I cannot convert into `int` as I have `NaN` value as mentioned in my question, also tried to convert `NaN` to 0 then convert to `int`, I still get the error – Scope Oct 14 '20 at 13:07
  • You fill na with 0 and use astype int but still have error? can you check my second solution and let me know? if it doesnt work please upload part of your dataset so I can check – Mehdi Golzadeh Oct 14 '20 at 13:15
  • YEs I have tried these before, I cannot try on one particular column as I have 78 columsn in total,so as I have posted in my question I have tried and yet issue was not resolved – Scope Oct 14 '20 at 13:17
  • `df= df.fillna(0).astype(np.int64)` – Scope Oct 14 '20 at 13:18
0

The following code generates a Dataframe with two columns. One with text and one with numbers and writes them to excel and changes the format so in order for excel to display the numbers as dot separated integers rather than in scientific notation. If you prefer a different format I am sure you can adopt the format string in the code. I hope that's what you're looking for :).

col1 = [float("nan"),*np.random.randint(10**15,size=5)]
col2 = [random.choice(["apple", "orange"]) for _ in range (6)]

with pd.ExcelWriter("pandas_column_formats.xlsx", engine='xlsxwriter') as writer:
    pd.DataFrame(data={"numbers":col1, "strings":col2})\
        .to_excel(writer)
    
    workbook  = writer.book
    worksheet = writer.sheets['Sheet1']
    format_ = workbook.add_format({'num_format': '#,##'})
    worksheet.set_column('B:B', None, format_)

For your specific example try:

with pd.ExcelWriter('file.xlsx',options={'nan_inf_to_errors': True}) as writer:
        df.groupby('ID').apply(lambda x: x.dropna(how='all', axis=1).to_excel(writer,sheet_name=str(x.name),na_rep=0,index=True))
    workbook  = writer.book
    worksheet = writer.sheets['Sheet1']
    format_ = workbook.add_format({'num_format': '#,##'})
    worksheet.set_column('C:C', None, format_)
    worksheet.set_column('E:E', None, format_)

Notice that I am guessing form the other thread that your columns with numbers will be C and E. You will probably have to widen the columns for the huge numbers. Excel will write ######### for it otherwise.

Lukas S
  • 3,212
  • 2
  • 13
  • 25
  • `data` will be my `dataframe`? – Scope Oct 14 '20 at 13:21
  • Data was data I have tried it on :D. But you can use `.style.format` on any `dataframe`. If you only want to `format` some columns you can do that by giving a key word argument `subset=a_list_of_columns_to_style`. Or you provide an example dataframe for me to work on. – Lukas S Oct 14 '20 at 13:25
  • I used `df=df.style.format(lambda x: "nan" if np.isnan(x) else int(x))` but did not work.(I had to use `groupby` next step). You can see for an example dataset and the code I use [here](https://stackoverflow.com/questions/64349629/how-to-drop-null-values-in-dataframe-after-groupby-while-writing-to-excel/64349685#64349685). You need to replace `VALUE` column with float `dtype` and `1e+13` notation value – Scope Oct 14 '20 at 13:30
  • Tried this `pd.DataFrame(data=df).style.format(lambda x: "nan" if np.isnan(x) else int(x))` , issue is not resolved – Scope Oct 14 '20 at 13:35
  • You have to do the `style.format` at then end right before you export to excel as it does not return a dataframe but a `styler object` – Lukas S Oct 14 '20 at 13:38
  • ok then could you show me where to add it here `with pd.ExcelWriter('file.xlsx',options={'nan_inf_to_errors': True}) as writer : df.groupby('ID').apply(lambda x: x.dropna(how='all', axis=1).to_excel(writer,sheet_name=str(x.name),na_rep=0,index=True)) writer.save()` – Scope Oct 14 '20 at 13:39
  • I would guess right before the `to_excel` but I can't test it so I am not sure. – Lukas S Oct 14 '20 at 13:40
  • Hmm maybe try to provide a _minimal viable example_ to you question. – Lukas S Oct 14 '20 at 13:45
  • I have already provided a sample along with my question and also u can have a look at it [here](https://stackoverflow.com/questions/64349629/how-to-drop-null-values-in-dataframe-after-groupby-while-writing-to-excel/64349685#64349685) – Scope Oct 14 '20 at 13:46
  • Sorry but you did not. A _minimal viable example_ means you give me something small that I can work on and you're happy when it worked with that. Maybe somebody else can help without more information. – Lukas S Oct 14 '20 at 13:52
  • I had provided more than enough information with rergards to scientific notation error, but as you requested for a "sample", provided that as well in the question, u can basically take any dataframe with scientific notation and try the code I have provided according to your column names to see or test if your code works – Scope Oct 14 '20 at 14:12
  • @Scope I changed my answer again. Now happy :D? – Lukas S Oct 14 '20 at 15:42
  • How do I make this work in my code (which i again already provided in question), I have `groupedby` and written to different sheets – Scope Oct 14 '20 at 15:52
  • `workbook = writer.book worksheet = writer.sheets['Sheet1'] format_ = workbook.add_format({'num_format': '#,##'}) worksheet.set_column('B:B', None, format_)` If I just need to add this block of code to make it work then in `writer.sheets` I need to all all the sheets .. – Scope Oct 14 '20 at 15:54
  • @Scope Hahaha check it one more time. – Lukas S Oct 14 '20 at 16:53
  • By `worksheet = writer.sheets['Sheet1']` all those formatting will be affected only to sheet 1 right – Scope Oct 14 '20 at 17:01
  • Exactly I am only setting sheet1 column C and E to the new formatting. – Lukas S Oct 14 '20 at 17:02
  • How do I set all the sheets I would create dynamically for the excel there. I cant use the code, as it shows `Sheet 1 ` error. How do I set for all shetts (or full workbook) and all columns (in all the sheets) – Scope Oct 14 '20 at 17:08