
I have the following DataFrame:

      High  counts   Total
11  1.2492       2  2.4984
1   1.2466       2  2.4932
20  1.2574       1  1.2574
19  1.2547       1  1.2547
18  1.2523       1  1.2523

I used the following line to group by:

hg = grps.groupby("High").size().reset_index(name='counts')

I am trying to group this DataFrame by High in the following format, but the values of High must not change:

{:.3f} # 1.249, 1.246, 1.257 ...

Is it possible to do this, or should I create a new DataFrame?
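Since groupby accepts any Series aligned with the frame's index as a key, one way to group on three-decimal values without touching High is to build that key separately. This is a minimal sketch, assuming truncation (which is what the expected output below shows) rather than rounding. The frame and the `High_3dp` name are hypothetical, chosen so that two rows share a key:

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: 1.2492 and 1.2497 both truncate to 1.249
df = pd.DataFrame({'High': [1.2492, 1.2497, 1.2466], 'counts': [2, 1, 2]})

N = 3
# Separate key Series: floor to N decimals; df['High'] itself is not modified
key = (np.floor(df['High'] * 10 ** N) / 10 ** N).rename('High_3dp')

# Group on the derived key and aggregate counts per truncated value
grouped = df.groupby(key)['counts'].sum()
print(grouped)
print(df['High'].tolist())  # original values are unchanged
```

Because the key lives outside the frame, no new DataFrame is needed; if you want the key as a column, assign it with df['New'] = key instead.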

EDIT:

Here is the expected output:

      High  counts   Total  New
11  1.2492       2  2.4984  1.249
1   1.2466       2  2.4932  1.246
20  1.2574       1  1.2574  1.257
19  1.2547       1  1.2547  1.254
18  1.2523       1  1.2523  1.252
jpp
Don Coder
  • You say you want to group by that (yes it is possible) but what do you want out of it? If you aggregate, you end up with fewer rows than you started with. In that case, how shall you choose among the multiple possible rows? You may want to transform? You haven't told us what you want in the end. And that is a critical part of this problem. – piRSquared Apr 07 '18 at 00:40
  • Actually, I did. The following line is the output: {:.3f} # 1.249, 1.246, 1.257 ... – Don Coder Apr 07 '18 at 08:37
    @DonCoder - Do you need `df['New'] = df['High'].map('{:.3f}'.format)` ? Or `df['New'] = df['High'].round(3)` ? – jezrael Apr 07 '18 at 08:54
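The expected output keeps 1.254 for 1.2547 and 1.246 for 1.2466, which is truncation; both suggestions in the comment above round to the nearest value instead. A quick check using the sample values from the question (the column labels `round`, `format` and `floor` are just names for this sketch):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'High': [1.2492, 1.2466, 1.2574, 1.2547, 1.2523]},
                  index=[11, 1, 20, 19, 18])

rounded = df['High'].round(3)                     # rounds to nearest
formatted = df['High'].map('{:.3f}'.format)       # strings, also rounded
truncated = np.floor(df['High'] * 1000) / 1000    # drops digits past 3 dp

print(pd.DataFrame({'High': df['High'], 'round': rounded,
                    'format': formatted, 'floor': truncated}))
```

Only the floor column matches the New column in the expected output; the other two give 1.247 and 1.255 in rows 1 and 19.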

2 Answers


I think you need numpy.floor: convert the column to a NumPy array with .values, multiply by 10 to the power of the number of decimal places, take the floor, and then divide by the same power of 10:

import numpy as np

N = 3
df['New'] = np.floor(df['High'].values * 10 ** N) / 10 ** N
print (df)
      High  counts   Total    New
11  1.2492       2  2.4984  1.249
1   1.2466       2  2.4932  1.246
20  1.2574       1  1.2574  1.257
19  1.2547       1  1.2547  1.254
18  1.2523       1  1.2523  1.252

Another solution:

df['New'] = (df['High'].values * 10**N).astype(int) / 10**N
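The two solutions agree only for non-negative values: np.floor rounds toward negative infinity, while astype(int) truncates toward zero. A small sketch with a hypothetical negative value; np.trunc reproduces the astype(int) behaviour while staying in floating point, which also avoids overflowing the integer conversion for very large magnitudes:

```python
import numpy as np

N = 3
values = np.array([1.2547, -1.2547])  # hypothetical; the question has no negatives

floored = np.floor(values * 10 ** N) / 10 ** N        # toward -inf
via_int = (values * 10 ** N).astype(int) / 10 ** N    # toward zero
via_trunc = np.trunc(values * 10 ** N) / 10 ** N      # toward zero, stays float

print(floored, via_int, via_trunc)
```

For the question's positive prices either solution works; pick np.floor or np.trunc depending on which direction negative values should move.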

Timing comparison (truncate is the function from the other answer):

#[5000 rows x 4 columns]
df = pd.concat([df] * 1000, ignore_index=True)

In [213]: %timeit df['New1'] = df['High'].map(truncate)
8.27 ms ± 667 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [214]: %timeit df['New1'] = np.floor(df['High'].values * 10 ** N) / 10 ** N
220 µs ± 2.98 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
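Outside IPython, a comparable measurement can be sketched with the standard timeit module. This assumes the five sample rows repeated 1000 times, as above, and redefines truncate from the other answer so the script runs on its own; absolute timings will differ by machine and library version:

```python
import math
import timeit

import numpy as np
import pandas as pd

N = 3

def truncate(f, n=N):
    # Same helper as in the other answer: floor to n decimal places
    return math.floor(f * 10 ** n) / 10 ** n

# Sample High values from the question, repeated to 5000 rows
df = pd.DataFrame({'High': [1.2492, 1.2466, 1.2574, 1.2547, 1.2523]})
df = pd.concat([df] * 1000, ignore_index=True)

def with_map():
    return df['High'].map(truncate)            # one Python call per element

def with_numpy():
    return np.floor(df['High'].to_numpy() * 10 ** N) / 10 ** N  # vectorised

for name, fn in [('map', with_map), ('numpy', with_numpy)]:
    seconds = timeit.timeit(fn, number=20) / 20
    print(f'{name}: {seconds * 1e6:.1f} µs per call')
```

Both functions return the same values; the gap comes from map calling a Python function once per row, while the NumPy version runs in compiled loops.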
jezrael

You can use a custom function built on math.floor and pd.Series.map:

import math

def truncate(f, n=3):
    return math.floor(f * 10 ** n) / 10 ** n

df['New'] = df['High'].map(truncate)

#       High  counts   Total    New
# 11  1.2492       2  2.4984  1.249
# 1   1.2466       2  2.4932  1.246
# 20  1.2574       1  1.2574  1.257
# 19  1.2547       1  1.2547  1.254
# 18  1.2523       1  1.2523  1.252

Note that you may still run into floating-point issues, since you are starting with float data.
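For example, 1.15 is stored as slightly less than 1.15, so multiplying by 100 lands just below 115 and the floor drops a digit. One possible workaround, sketched here under the assumption that the printed (shortest repr) value is the one you want to truncate, is to go through decimal.Decimal; `truncate_decimal` is a hypothetical name:

```python
import math
from decimal import Decimal, ROUND_DOWN

def truncate(f, n=3):
    return math.floor(f * 10 ** n) / 10 ** n

print(1.15 * 100)         # 114.99999999999999
print(truncate(1.15, 2))  # 1.14

def truncate_decimal(f, n=3):
    # Assumption: str(f) is the intended decimal value, e.g. '1.15'
    q = Decimal(1).scaleb(-n)  # Decimal('0.001') for n=3
    return float(Decimal(str(f)).quantize(q, rounding=ROUND_DOWN))

print(truncate_decimal(1.15, 2))  # 1.15
```

It plugs in the same way, df['New'] = df['High'].map(truncate_decimal), at the cost of being slower than either float-based version.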

jpp