I have a very large script, too large to post here, so I hope someone can help without seeing the whole thing. In my script, I have multiple columns of numbers (columns 'AAAAA':'TTTTT'), and I divide those numbers by the number in one column ('kmer_number'). The output is written to new columns. This is done with this command:

df5d[['Column{}'.format(i) for i in range(2003, 2003+(2002-979)+1)]] = df5d.loc[:, 'AAAAA':'TTTTT'].div(df5d['kmer_number'], axis=0)
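
To make the division step concrete, here is a minimal sketch on a toy frame. The column names `AAAAA`, `AAAAC`, `TTTTT`, and `kmer_number` mirror the ones above, but the values and the output names `Column0`, `Column1`, `Column2` are made up for illustration.

```python
import pandas as pd

# Toy stand-in for df5d: three count columns plus the divisor column.
df = pd.DataFrame({
    "AAAAA": [2, 4],
    "AAAAC": [1, 0],
    "TTTTT": [5, 8],
    "kmer_number": [10, 16],
})

# A label slice such as 'AAAAA':'TTTTT' includes both end columns.
counts = df.loc[:, "AAAAA":"TTTTT"]

# axis=0 aligns the divisor with the row index, so each row is divided
# by its own kmer_number.
ratios = counts.div(df["kmer_number"], axis=0)

# Hypothetical output names, one per input column.
new_cols = ["Column{}".format(i) for i in range(len(counts.columns))]
df[new_cols] = ratios

print(df)
```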

The values in the new columns are numbers with more than 8 decimals, and I want to convert them to scientific notation.

My output is like this

Column2972  Column2973  Column2974
0.000755306 0.00025591  0.000305601
0.000783782 0.000265844 0.000433143
0           0           0
0.000817596 0.000281049 0.000309438
0.000819018 0.000262932 0.000386843

I tried the following command

df5d[2003:3026] = df5d[2003:3026].map('{:.2e}'.format)

but this gave the error

Traceback (most recent call last):
  File "pythonscript_v10.py", line 226, in <module>
    df5d[2003:3026] = df5d[2003:3026].map('{:.2e}'.format)
  File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 2360, in __getattr__
    (type(self).__name__, name))
AttributeError: 'DataFrame' object has no attribute 'map'
Gravel

1 Answer

Use pd.set_option to change how floats are displayed:

>>import numpy as np
>>import pandas as pd

>>d = pd.DataFrame(np.random.random((5,3)))

>>d

          0         1         2
0  0.725952  0.048684  0.735873
1  0.188897  0.043040  0.250257
2  0.623823  0.887885  0.269239
3  0.764847  0.069001  0.155357
4  0.515004  0.858192  0.726932

>>pd.set_option('display.float_format', '{:.2E}'.format)

>>d

         0        1        2
0 7.26E-01 4.87E-02 7.36E-01
1 1.89E-01 4.30E-02 2.50E-01
2 6.24E-01 8.88E-01 2.69E-01
3 7.65E-01 6.90E-02 1.55E-01
4 5.15E-01 8.58E-01 7.27E-01
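
Note that `display.float_format` only changes how floats are printed; the stored values remain floats. A minimal sketch, assuming you want the format applied only temporarily: `pd.option_context` scopes the setting to a `with` block. The sample values are taken from the question's output.

```python
import pandas as pd

d = pd.DataFrame({
    "Column2972": [0.000755306, 0.0],
    "Column2973": [0.00025591, 0.000265844],
})

# The option is active only inside the with-block.
with pd.option_context("display.float_format", "{:.2E}".format):
    shown = d.to_string()

print(shown)

# The underlying data is still numeric, so arithmetic keeps working.
total = d["Column2972"].sum()
```

Outside the `with` block the frame prints with the default float formatting again.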


EDIT:

As pointed out in the comments, if you only want a particular column in scientific notation (say column 0), use map on that column:

>>d = pd.DataFrame(np.random.random((5,3)))

          0         1         2
0  0.113197  0.352638  0.023745
1  0.261915  0.742125  0.196289
2  0.413795  0.665053  0.927284
3  0.380613  0.660596  0.141781
4  0.826938  0.672995  0.464685


>>d[0] = d[0].map('{:,.2E}'.format)

>>d
          0         1         2
0  1.13E-01  0.352638  0.023745
1  2.62E-01  0.742125  0.196289
2  4.14E-01  0.665053  0.927284
3  3.81E-01  0.660596  0.141781
4  8.27E-01  0.672995  0.464685
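
One caveat worth illustrating: `map` with a format string returns strings, so the column is no longer numeric afterwards. A small sketch with hard-coded values (the first two rows of the table above) that keeps the numeric column and writes the formatted text to a separate, hypothetical column named `0_sci`:

```python
import pandas as pd

d = pd.DataFrame({0: [0.113197, 0.261915], 1: [0.352638, 0.742125]})

# Format a copy of column 0; the original floats stay available for math.
d["0_sci"] = d[0].map("{:,.2E}".format)

first = d.loc[0, "0_sci"]
print(d)
```

Keeping both columns is a design choice for this sketch; overwriting `d[0]` as in the answer works too if the numbers are only needed for output.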

EDIT 2:

For a whole dataframe (or a part of one), use applymap:

>>d = pd.DataFrame(np.random.random((5,3)))

>>d
          0         1         2
0  0.526628  0.061561  0.536804
1  0.784187  0.372477  0.444849
2  0.438519  0.515741  0.858563
3  0.015711  0.728206  0.484090
4  0.855883  0.611769  0.460805


>>d = d.applymap('{:,.2E}'.format)

>>d
          0         1         2
0  5.27E-01  6.16E-02  5.37E-01
1  7.84E-01  3.72E-01  4.45E-01
2  4.39E-01  5.16E-01  8.59E-01
3  1.57E-02  7.28E-01  4.84E-01
4  8.56E-01  6.12E-01  4.61E-01
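
To format only some columns, select them first and assign the result back. The sketch below rests on a few assumptions: the column names are shortened stand-ins for the question's `Column2003` to `Column3026` range, the `kmer_number` values are invented, and the version check is there because `DataFrame.applymap` was deprecated in pandas 2.1 in favor of `DataFrame.map`.

```python
import pandas as pd

df = pd.DataFrame({
    "kmer_number": [1324, 1276],
    "Column2003": [0.000755306, 0.000783782],
    "Column2004": [0.00025591, 0.000265844],
    "Column2005": [0.000305601, 0.0],
})

# Select the target columns by label; a label slice includes both ends.
cols = list(df.loc[:, "Column2003":"Column2005"].columns)

# Element-wise formatting: DataFrame.map on pandas >= 2.1, applymap before.
subset = df[cols]
elementwise = subset.map if hasattr(pd.DataFrame, "map") else subset.applymap

# Replace the selected columns with their formatted (string) versions.
df[cols] = elementwise("{:.2e}".format)

print(df)
```

The untouched `kmer_number` column stays numeric; only the selected columns become text.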
akilat90
  • Thanks. How can I apply this only to the columns of the divisions and not the whole dataframe? – Gravel Jun 26 '17 at 13:17
  • I tried the solution, but I get the error `AttributeError: 'DataFrame' object has no attribute 'map'` – Gravel Jun 26 '17 at 18:48
  • I used the command `df5d[2003:3026] = df5d[2003:3026].applymap('{:.2e}'.format)`, but still got numbers with decimals instead of scientific notation. – Gravel Jun 26 '17 at 19:16
  • I think it is a problem with column slicing rather than `applymap`. Assuming 2003,... 3026 are positional indexes, try `df5d.loc[:, 2003:3026] = df5d.loc[:, 2003:3026].applymap('{:.2e}'.format)` . Read more [here](https://stackoverflow.com/questions/28757389/loc-vs-iloc-vs-ix-vs-at-vs-iat) – akilat90 Jun 26 '17 at 19:25
  • `df5d[2003:3026]` selects the rows of the specified range - not columns. – akilat90 Jun 26 '17 at 19:32
  • Oh, I thought to be smart and select the columns with my command ;-(, how can I select column number 2003 - 3026? the command with loc gets the error `TypeError: cannot do slice indexing on with these indexers [2003] of ` – Gravel Jun 26 '17 at 19:37
  • Yes, I changed the command with loc to use the column names and it worked! Thanks a lot! – Gravel Jun 26 '17 at 19:43
  • That is a different problem about slicing dataframes. I suggest reading the provided link in one of the above comments or asking a separate question. – akilat90 Jun 26 '17 at 19:47
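
To summarize the slicing issue discussed in the comments: a plain slice in square brackets selects rows, `iloc` selects by integer position, and `loc` selects by label. A minimal sketch on a toy frame; the column names and values are illustrative stand-ins, not the real data.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 0.1, 0.2, 0.3],
     [2, 0.4, 0.5, 0.6],
     [3, 0.7, 0.8, 0.9]],
    columns=["kmer_number", "Column2003", "Column2004", "Column2005"],
)

# A bare slice picks ROWS by position: here rows 1 and 2, all columns.
rows = df[1:3]

# iloc picks columns by integer position (end excluded): columns 1, 2, 3.
by_position = df.iloc[:, 1:4]

# loc picks columns by label (both ends included).
by_label = df.loc[:, "Column2003":"Column2005"]

print(rows.shape, list(by_position.columns))
```

With string column labels, an integer slice passed to `loc` does not match any label, which is consistent with the `TypeError` reported above.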