I have a data frame whose values I need to convert to two-decimal-place resolution based on the following logic:

  • if x (the value with more than two decimal places) > math.floor(x) + 0.5

    • ...then round this value to two decimals.
  • if x (the value with more than two decimal places) < math.ceil(x) - 0.5

    • ...then truncate this value to two decimals.

My main hang-up is getting the newly rounded/truncated values to actually replace the originals in the data frame.

Sample dataframe:

import math
import pandas as pd  

test_df = pd.DataFrame({'weights': ['25.2524%', '25.7578%', '35.5012%', '13.5000%', 
    "50.8782%", "10.2830%", "5.5050%", "30.5555%", "20.7550%"]})

# .. which creates:

   | weights |
|0 | 25.2524%|
|1 | 25.7578%|
|2 | 35.5012%|
|3 | 13.5000%|
|4 | 50.8782%|
|5 | 10.2830%|
|6 |  5.5050%|
|7 | 30.5555%|
|8 | 20.7550%|

Define the truncate function, and also the function that will set the decimal resolution:

def truncate_decimals(target_allocation, two_decimal_places) -> float:
    decimal_exponent = 10.0 ** two_decimal_places
    return math.trunc(decimal_exponent * target_allocation) / decimal_exponent

def decimals(df):
    df["weights"] = df["weights"].str.rstrip("%").astype("float")
    decimal_precision = 2
    for x in df["weights"]:
        if x > math.floor(x) + 0.5:
            x = round(x, decimal_precision)
            print("This value is being rounded", x)
            df.loc[(df.weights == x), ('weights')] = x
        elif x < math.ceil(x) - 0.5:
            y = truncate_decimals(x, decimal_precision)
            print("This value is being truncated", y)
            df.loc[(df.weights == x), ('weights')] = y
        else:
            pass
            print("This value does not meet one of the above conditions", round(x, decimal_precision))

    return df


decimals(test_df)

Expected output:

This value is being truncated 25.25
This value is being rounded 25.76
This value is being rounded 35.5
This value does not meet one of the above conditions 13.5
This value is being rounded 50.88
This value is being truncated 10.28
This value is being rounded 5.5
This value is being rounded 30.56
This value is being rounded 20.75

   | weights|
|0 | 25.25  |
|1 | 25.76  |
|2 | 35.5   |
|3 | 13.5   |
|4 | 50.88  |
|5 | 10.28  |
|6 |  5.5   |
|7 | 30.56  |
|8 | 20.75  |

Current output:

This value is being truncated 25.25

   | weights |
|0 | 25.2524%|
|1 | 25.7578%|
|2 | 35.5012%|
|3 | 13.5000%|
|4 | 50.8782%|
|5 | 10.2830%|
|6 |  5.5050%|
|7 | 30.5555%|
|8 | 20.7550%|
smci
Jake Walther
  • I think you're simply asking how to assign the output from a pandas command back to the source dataframe? Specifically, why your attempt to overwrite the argument `df` inside the function `decimals` doesn't work. If so, your title is way misleading and this is a duplicate. – smci Feb 12 '22 at 23:07
  • Yes, thank you for clarifying. Essentially, take the raw data from the sample df (test_df), then apply the logic in the `decimals` function to either round or truncate the values that fit that logic, replacing the original values in the data frame. – Jake Walther Feb 12 '22 at 23:13
  • Can you please fix your broken indentation scheme in the `for..if` clauses inside `decimals`? You can't use 4-3-2-3 spaces indentation! (This ain't a soccer formation...). Either use 4 spaces everywhere (PEP-8 compliant) or else 2 spaces (more compact, for readability of long lines on SO). – smci Feb 12 '22 at 23:33
  • First, fix the reading of percents as floats at `read_csv()` time, don't shunt the code for that conversion inside your function. [Convert percent string to float in pandas `read_csv`](https://stackoverflow.com/questions/25669588/convert-percent-string-to-float-in-pandas-read-csv) – smci Feb 13 '22 at 00:11
  • Next, you essentially have a function iterating over a series (single column), that wants to assign/return a series. So doing `for x in df["weights"]:` and then backtracking with `df.loc[(df.weights == x), ('weights')]` is a tortured way to reference each cell. Either directly use vectorized operations `df["weights"].some_operation()`, or define a lambda function and `apply()` it, or if you really must, use `iterrows()` (non-vectorized). – smci Feb 13 '22 at 00:26
  • Near-duplicate: see [Python get decimal fractional number part from float in a dataframe](https://stackoverflow.com/questions/51638367/python-get-decimal-fractional-number-part-from-float-in-a-dataframe), and in particular [`numpy.modf()`](https://stackoverflow.com/questions/51638367/python-get-decimal-fractional-number-part-from-float-in-a-dataframe). Or [How can you use math.modf in a dataframe?](https://stackoverflow.com/questions/65305909/how-can-you-use-math-modf-in-a-dataframe) – smci Feb 13 '22 at 00:52

2 Answers

The pandas .round() function already does all this in one line. Don't reinvent the wheel.

>>> tdf['weights'].round(2)

0    25.25
1    25.76
2    35.50
3    13.50
4    50.88
5    10.28
6     5.50
7    30.56
8    20.76
  • If you want to eliminate the trailing '0' in e.g. '13.50', that's just string formatting, see .format()
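
Since `.round()` returns a new Series rather than modifying the column in place, the result has to be assigned back for the new values to replace the originals. A minimal sketch, assuming `tdf` holds a few of the question's percent strings (the `as_text` name is only illustrative):

```python
import pandas as pd

tdf = pd.DataFrame({"weights": ["25.2524%", "25.7578%", "35.5012%", "13.5000%"]})

# Strip the percent sign, convert to float, round, and assign the result back.
tdf["weights"] = tdf["weights"].str.rstrip("%").astype("float").round(2)

# Display-only formatting: the "g" format drops trailing zeros (13.50 -> "13.5").
as_text = tdf["weights"].map("{:g}".format)

print(tdf)
print(as_text.tolist())
```

The formatting step produces strings, so it is probably best kept separate from the numeric column.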

You don't even need to use the function modf which gets both the fractional and integer part of a float.

  • (It's in both numpy.modf and math.modf; use the numpy version because it's vectorized, so you can call it once on the entire series instead of making lots of individual, slow per-element calls like math.modf, math.ceil, math.floor would)

So for example, if you wanted to get a series of tuples of (fractional, integer) parts:

import numpy as np
pd.Series(zip(*np.modf(tdf['weights'])))

0    (0.2524000000000015, 25.0)
1    (0.7577999999999996, 25.0)
2    (0.5011999999999972, 35.0)
3                   (0.5, 13.0)
4    (0.8781999999999996, 50.0)
5    (0.2829999999999995, 10.0)
6     (0.5049999999999999, 5.0)
7    (0.5554999999999986, 30.0)
8     (0.754999999999999, 20.0)

Note: first you must convert the percent strings to floats (`tdf` here is the question's `test_df`):

tdf["weights"] = tdf["weights"].str.rstrip("%").astype("float")
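
One caveat: `.round(2)` never truncates, so it can differ from the question's stated rule for a value such as 25.2574 (rounded to 25.26, whereas the rule would truncate it to 25.25). If that distinction matters, a possible vectorized sketch combines `np.modf` with `np.where`. The helper name `round_or_truncate` is hypothetical, the code assumes non-negative weights, and leaving values with a fractional part of exactly 0.5 unchanged is an assumption taken from the question's `else` branch:

```python
import numpy as np
import pandas as pd

def round_or_truncate(weights: pd.Series, places: int = 2) -> pd.Series:
    """Round when the fractional part is > 0.5, truncate when it is < 0.5.

    Hypothetical helper; assumes non-negative float values.
    """
    values = weights.to_numpy()
    frac, _ = np.modf(values)            # fractional part of every value at once
    scale = 10.0 ** places
    rounded = np.round(values, places)
    truncated = np.trunc(values * scale) / scale
    # Values whose fractional part is exactly 0.5 are left as they are.
    result = np.where(frac > 0.5, rounded,
                      np.where(frac < 0.5, truncated, values))
    return pd.Series(result, index=weights.index, name=weights.name)

tdf = pd.DataFrame({"weights": ["25.2574%", "25.7578%", "13.5000%"]})
tdf["weights"] = tdf["weights"].str.rstrip("%").astype("float")
tdf["weights"] = round_or_truncate(tdf["weights"])
print(tdf)
```

As with the truncate function in the question, multiplying by a power of ten is subject to ordinary floating-point representation error.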
smci

Another approach could be to define a function that applies the above rule to a single number, and then apply it to each weight in the column.

Something like this:

import math
import pandas as pd  

test_df = pd.DataFrame({'weights': ['25.2524%', '25.7578%', '35.5012%', '13.5000%', 
    "50.8782%", "10.2830%", "5.5050%", "30.5555%", "20.7550%"]})

def truncate_decimals(target_allocation, two_decimal_places) -> float:
    decimal_exponent = 10.0 ** two_decimal_places
    return math.trunc(decimal_exponent * target_allocation) / decimal_exponent

def rule(number, decimal_precision=2):
    number = float(number.rstrip("%"))
    
    if number > math.floor(number) + 0.5:
        number = round(number, decimal_precision)
        print("This value is being rounded", number)
    
    elif number < math.ceil(number) - 0.5:
        number = truncate_decimals(number, decimal_precision)
        print("This value is being truncated", number)
       
    else:
        print("This value does not meet one of the above conditions", round(number, decimal_precision))
    
    return number

test_df['rounded'] = test_df.weights.apply(rule)
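
The last line writes the results to a new `rounded` column. To replace the originals instead, which is what the question asks for, the result of `apply` can be assigned back to `weights`. A compact sketch, using a hypothetical print-free variant of the rule named `quiet_rule`:

```python
import math
import pandas as pd

def quiet_rule(text: str, places: int = 2) -> float:
    """Print-free variant of the rule above (hypothetical helper)."""
    number = float(text.rstrip("%"))
    if number > math.floor(number) + 0.5:
        return round(number, places)
    if number < math.ceil(number) - 0.5:
        scale = 10.0 ** places
        return math.trunc(number * scale) / scale
    return number  # neither condition holds (e.g. 13.5): leave unchanged

test_df = pd.DataFrame({"weights": ["25.2524%", "25.7578%", "13.5000%"]})

# Assigning back to the same column overwrites the percent strings with floats.
test_df["weights"] = test_df["weights"].apply(quiet_rule)
print(test_df["weights"].tolist())
```

Because `apply` calls the function once per element, this is likely slower than a vectorized call on large frames, but it keeps the rule easy to read.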
John Giorgio
  • This is reinventing the wheel, **[`np.modf`](https://stackoverflow.com/questions/51638367/python-get-decimal-fractional-number-part-from-float-in-a-dataframe) and `math.modf` already exist to get both the fractional and integer part of a float**. – smci Feb 16 '22 at 22:44
  • Sure, but the question was about how to replace the values in the dataframe, not how to get fractional and integer parts. – John Giorgio Feb 16 '22 at 22:49
  • ...and `np.modf` can be used as a building block to get that. One vectorized call. Not lots of individual calls (one per number!) to `math.floor`, `math.ceil`... This answer really seriously is reinventing the wheel. – smci Feb 16 '22 at 22:51
  • It's actually even worse than that. All the OP's code is reinventing the **pandas [`.round()`](https://pandas.pydata.org/docs/reference/api/pandas.Series.round.html) function** which already does all this in one line. (Don't even need `modf`) – smci Feb 16 '22 at 23:06