
Goal

  • Do it in a single line.
  • I use the round function from this post, but I want to call it the way df.round(2) works: it changes only the affected columns, keeps the order of the data, and does not require selecting float or int columns first.
  • df.applymap(myfunction) raises TypeError: must be real number, not str, which means I would have to select the numeric columns first.
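To make the error concrete, here is a minimal reproduction. `myfunction` below is a hypothetical stand-in built on `math.ceil` (the function from the linked post is not shown here), and the DataFrame is made up. Applying it cell by cell reaches the string cells and raises the `TypeError`:

```python
import math

import pandas as pd


# hypothetical stand-in for "myfunction": ceiling-round to 2 decimal places
def myfunction(num, places=2):
    return math.ceil(num * 10 ** places) / 10 ** places


df = pd.DataFrame({"price": [1.234, 5.678], "name": ["a", "b"]})

# DataFrame.applymap was renamed DataFrame.map in pandas 2.1
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

caught = None
try:
    # the function is called on every cell, including the strings in "name"
    elementwise(myfunction)
except TypeError as exc:
    caught = exc

print(type(caught).__name__, caught)
```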

Try

  • I looked at the round source code, but I could not understand how to change my function accordingly.
Jack

2 Answers


First, get the columns whose dtype is float:

cols=df.select_dtypes('float').columns

Then round only those columns:

df[cols]=df[cols].agg(round,ndigits=2)
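A self-contained sketch of the two steps above, on a small made-up frame with string, float, int and datetime columns. Only the float column is rounded, and the column order is unchanged:

```python
import pandas as pd

# made-up mixed-type frame for illustration
df = pd.DataFrame({
    "name": ["a", "b"],
    "price": [1.2345, 6.789],
    "qty": [3, 4],
    "when": pd.to_datetime(["2021-06-20", "2021-06-21"]),
})

# step 1: pick only the float columns
cols = df.select_dtypes("float").columns
# step 2: round them in place; strings, ints and datetimes are untouched
df[cols] = df[cols].agg(round, ndigits=2)

print(df)
```

Here `agg(round, ndigits=2)` calls Python's built-in `round` on each selected column, which delegates to `Series.round`, so `df[cols] = df[cols].round(2)` should behave the same way.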

If you want to keep using your own function instead, add an if/else check so that only float values are rounded and everything else is returned unchanged:

from numpy import ceil, floor


def float_round(num, places=2, direction=ceil):
    if isinstance(num,float):
        return direction(num * (10 ** places)) / float(10 ** places)
    else:
        return num

out=df.applymap(float_round)  # pandas >= 2.1 deprecates applymap; use df.map(float_round) there
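A usage sketch of `float_round` on a made-up mixed-type frame. Because of the `isinstance(num, float)` check, strings, ints and timestamps pass through, NaN stays NaN, and the float value is rounded up by `ceil`. The `hasattr` check picks `DataFrame.map` on pandas 2.1+ and falls back to `applymap` on older versions:

```python
import numpy as np
import pandas as pd
from numpy import ceil


def float_round(num, places=2, direction=ceil):
    # round floats only (NumPy float64 subclasses float); return anything else as-is
    if isinstance(num, float):
        return direction(num * (10 ** places)) / float(10 ** places)
    return num


# made-up mixed-type frame for illustration
df = pd.DataFrame({
    "name": ["a", "b"],
    "price": [1.231, np.nan],
    "qty": [3, 4],
    "when": pd.to_datetime(["2021-06-20", "2021-06-21"]),
})

# DataFrame.applymap was renamed DataFrame.map in pandas 2.1
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
out = elementwise(float_round)
print(out)
```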
Anurag Dabas

Judging by the error message, at least one of your columns contains strings, which would need to be converted to a numeric type first.
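If the strings are numbers stored as text, one option is `pd.to_numeric` with `errors="coerce"`, sketched below on a made-up column; anything that cannot be parsed becomes NaN instead of raising:

```python
import pandas as pd

# made-up frame: "price" was read in as text, e.g. from a CSV
df = pd.DataFrame({"price": ["1.234", "5.678", "n/a"]})

# errors="coerce" turns unparseable strings into NaN instead of raising
df["price"] = pd.to_numeric(df["price"], errors="coerce")

print(df.dtypes)
print(df.round(2))
```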

Assuming for now that the columns are numeric, there are a few ways to apply a custom rounding function without reimplementing the DataFrame's .round() method.

Given the requirements you laid out above, we want a way to round a DataFrame that:

  • fits on one line
  • doesn't require selecting numeric type

There are two functionally equivalent ways to do this. One is to pass the whole DataFrame to a function that only uses operations that are safe for NumPy arrays.

The other is to use the apply method (explanation here), which applies a function to each column (or, with axis=1, each row).

import pandas as pd
import numpy as np

from numpy import ceil

# generate a 100x10 dataframe with a null value
data = np.random.random(1000) * 10
data = data.reshape(100,10)
data[0, 0] = np.nan
df = pd.DataFrame(data)

# changing data type of the second column
df[1] = df[1].astype(int)

# verify dtypes are different
print(df.dtypes)

# taken from other stack post
def float_round(num, places=2, direction=ceil):
    return direction(num * (10 ** places)) / float(10 ** places)

# method 1 - use the dataframe as an argument
result1 = float_round(df)
print(result1.head())

# method 2 - apply 
result2 = df.apply(float_round)
print(result2)

Because apply passes the function one column (or row) at a time, the rounding function can inspect that column's dtype and return non-numeric columns unchanged. For instance:

# taken from other stack post
def float_round(num, places=2, direction=ceil):
    # check type of a specific column
    if num.dtype == 'O':
        return num
    return direction(num * (10 ** places)) / float(10 ** places)

# if df has an object (string) column, this still works, whereas method 1 would raise
result2 = df.apply(float_round)
print(result2) 
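One way to address the limitation raised in the comment below (datetime and other non-object columns still reach the arithmetic) is to test for a float dtype instead of `'O'`. This sketch swaps in `pd.api.types.is_float_dtype` for the answer's `num.dtype == 'O'` check and uses a made-up frame; note that int columns are now skipped too, so they keep their int dtype:

```python
import numpy as np
import pandas as pd
from numpy import ceil


def float_round(col, places=2, direction=ceil):
    # round float columns only; strings, ints, datetimes, etc. pass through unchanged
    if not pd.api.types.is_float_dtype(col):
        return col
    return direction(col * (10 ** places)) / float(10 ** places)


# made-up mixed-type frame for illustration
df = pd.DataFrame({
    "name": ["a", "b"],
    "price": [1.231, np.nan],
    "qty": [3, 4],
    "when": pd.to_datetime(["2021-06-20", "2021-06-21"]),
})

# one line, column-wise, no manual dtype selection
out = df.apply(float_round)
print(out)
```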
gpicard
  • If the DataFrame contains other types besides strings, such as datetimes, this will fail. I think Anurag Dabas's answer avoids the problem. – Jack Jun 20 '21 at 10:06