273

How can one modify the format for the output from a groupby operation in pandas that produces scientific notation for very large numbers?

I know how to do string formatting in python but I'm at a loss when it comes to applying it here.

df1.groupby('dept')['data1'].sum()

dept
value1       1.192433e+08
value2       1.293066e+08
value3       1.077142e+08

This suppresses the scientific notation if I convert to string but now I'm just wondering how to string format and add decimals.

sum_sales_dept.astype(str)
martineau
  • 119,623
  • 25
  • 170
  • 301
horatio1701d
  • 8,809
  • 14
  • 48
  • 77
  • 3
    possible duplicate of [Suppressing scientific notation in pandas?](http://stackoverflow.com/questions/17737300/suppressing-scientific-notation-in-pandas) – Dan Allan Jan 15 '14 at 13:13
  • 3
    I saw that question but I'm not sure how that helps me. I'm just looking to preserve the current dtype which is float and simply show all decimals in the result instead of scientific notation. – horatio1701d Jan 15 '14 at 13:52
  • That is probably just a display thing. But if you think there's something particular about your problem makes yours different from the one in Dan's link then you need to post more information about your problem, preferably with a small dataset that reproduces the problem. Also what are the `dtypes` on your result? – TomAugspurger Jan 15 '14 at 14:18
  • Although technically *not* a duplicate of this question, [another question's answer](https://stackoverflow.com/a/52405666/1048186) completes both the goal of this question **and adds commas between the thousands**, all with fewer characters! – Josiah Yoder Jul 20 '23 at 17:07
  • While currently [Suppressing scientific notation in pandas?](https://stackoverflow.com/questions/17737300/suppressing-scientific-notation-in-pandas), is marked as a duplicate of this question, its top answer is better than this question because it **also adds the commas between the thousands**. Therefore, I support [the suggestion above that the duplicate mark be reversed](https://stackoverflow.com/questions/21137150/format-suppress-scientific-notation-from-pandas-aggregation-results#comment31811285_21137150). – Josiah Yoder Jul 20 '23 at 17:09

8 Answers8

384

Granted, the answer I linked in the comments is not very helpful. You can specify your own string converter like so.

In [25]: pd.set_option('display.float_format', lambda x: '%.3f' % x)

In [28]: Series(np.random.randn(3))*1000000000
Out[28]: 
0    -757322420.605
1   -1436160588.997
2   -1235116117.064
dtype: float64

I'm not sure if that's the preferred way to do this, but it works.

Converting numbers to strings purely for aesthetic purposes seems like a bad idea, but if you have a good reason, this is one way:

In [6]: Series(np.random.randn(3)).apply(lambda x: '%.3f' % x)
Out[6]: 
0     0.026
1    -0.482
2    -0.694
dtype: object
Dan Allan
  • 34,073
  • 6
  • 70
  • 63
  • 1
    Thanks Dan. Do you know how to reset pandas options? – Josh Jan 23 '17 at 19:04
  • 4
    @Josh To temporarily set options in pandas, you can use `pandas.option_context` (see http://pandas.pydata.org/pandas-docs/stable/generated/pandas.option_context.html). – muellermarkus Jul 27 '18 at 10:41
  • 2
    It's oftentimes not for aesthetic purposes, but for quicker skimming of information via the visual cortex over large numeric dataframes. – matanster Apr 20 '19 at 18:57
  • pd.set_option('display.float_format', lambda x: '%.3f' % x) worked for me too – driven_spider May 28 '19 at 17:19
  • 13
    This works and you can also use the newer f-string notation. Like `pd.set_option('display.float_format', lambda x: f'{x:,.3f}')` if you want a thousand separator as well. – 576i Oct 29 '19 at 10:26
  • or: `pd.set_option('float_format', '{:.2f}'.format)` – Muhammad Yasirroni Sep 15 '22 at 04:24
176

Here is another way of doing it, similar to Dan Allan's answer but without the lambda function:

>>> pd.options.display.float_format = '{:.2f}'.format
>>> Series(np.random.randn(3))
0    0.41
1    0.99
2    0.10

or

>>> pd.set_option('display.float_format', '{:.2f}'.format)
Steven C. Howell
  • 16,902
  • 15
  • 72
  • 97
tfhans
  • 1,761
  • 1
  • 7
  • 4
  • 3
    I think using a format string would be more approachable to team members that are less familiar with Python, and might not understand lambda functions. – Steven C. Howell Oct 12 '18 at 15:04
46

You can use round function just to suppress scientific notation for specific dataframe:

df1.round(4)

or you can suppress is globally by:

pd.options.display.float_format = '{:.4f}'.format
Vlad Bezden
  • 83,883
  • 25
  • 248
  • 179
30

If you want to style the output of a data frame in a jupyter notebook cell, you can set the display style on a per-dataframe basis:

df = pd.DataFrame({'A': np.random.randn(4)*1e7})
df.style.format("{:.1f}")

enter image description here

See the documentation here.

florestan
  • 4,405
  • 2
  • 14
  • 28
22

Setting a fixed number of decimal places globally is often a bad idea since it is unlikely that it will be an appropriate number of decimal places for all of your various data that you will display regardless of magnitude. Instead, try this which will give you scientific notation only for large and very small values (and adds a thousands separator unless you omit the ","):

pd.set_option('display.float_format', lambda x: '%,g' % x)

Or to almost completely suppress scientific notation without losing precision, try this:

pd.set_option('display.float_format', str)
dabru
  • 786
  • 8
  • 8
7

I had multiple dataframes with different floating point, so thx to Allans idea made dynamic length.

pd.set_option('display.float_format', lambda x: f'%.{len(str(x%1))-2}f' % x)

The minus of this is that if You have last 0 in float, it will cut it. So it will be not 0.000070, but 0.00007.

Full.Of.Life
  • 169
  • 2
  • 10
  • by far one of the most undervoted answers. – Michael Jan 24 '21 at 21:16
  • Could you explain a bit what are you trying to do? Because `(lambda x: f'%.{len(str(x%1))-2}f' % x)(0.000070)` returns `'0.000'` (not `'0.00007'`) and `(lambda x: f'%.{len(str(x%1))-2}f' % x)(1.000070)` returns `'1.000070000000000014'` . – Cristian Ciupitu Oct 10 '22 at 00:31
  • I don't know why You have results like that. It worked for me. My goal was to check the correctness of data in multiple dataframes. Most of it was some long float so it printed as scientific notation. I didn't apply it to every dataframe with some apply function. I used it as an argument in pd.set_options at the begining o file. What I did there I checked decimal places dynamically. I took the rest from substraction by whole number and changed it to string to count the lenght. Then substracted 2, because point/dot is on second position. – Full.Of.Life Oct 10 '22 at 22:17
7

Expanding on this useful comment, here is a solution setting the formatting options only to display the results without changing options permanently:

with pd.option_context('display.float_format', lambda x: f'{x:,.3f}'):
    display(sum_sales_dept)

dept
value1  119,243,300.0
value2  129,306,600.0
value3  107,714,200.0
gibbone
  • 2,300
  • 20
  • 20
0

If you would like to use the values, say as part of csvfile csv.writer, the numbers can be formatted before creating a list:

df['label'].apply(lambda x: '%.17f' % x).values.tolist()
xlm
  • 6,854
  • 14
  • 53
  • 55
evil242
  • 31
  • 2