I'm presenting a data frame in Jupyter Notebook. The initial data type of the data frame is float. I want to present rows 1 & 3 of the printed table as integers and rows 2 & 4 as percentages. How do I do that? (I've spent numerous hours looking for a solution, with no success.)

Here's the code I'm using:

#Creating the table
clms = sales.columns
indx = ['# of Poeple','% of Poeple','# Purchased per Activity','% Purchased per Activity']
basic_stats = pd.DataFrame(data=np.nan,index=indx,columns=clms)
basic_stats.head()

#Calculating the # of people who took part in each activity
for clm in sales.columns:
    basic_stats.iloc[0][clm] = int(round(sales[sales[clm]>0][clm].count(),0))

#Calculating the % of people who took part in each activity from the total email list
for clm in sales.columns:
    basic_stats.iloc[1][clm] = round((basic_stats.iloc[0][clm] / sales['Sales'].count())*100,2)

#Calculating the # of people who took part in each activity AND that bought the product
for clm in sales.columns:
    basic_stats.iloc[2][clm] = int(round(sales[(sales[clm] >0) & (sales['Sales']>0)][clm].count()))

#Calculating the % of people who took part in each activity AND that bought the product
for clm in sales.columns:
    basic_stats.iloc[3][clm] = round((basic_stats.iloc[2][clm] / basic_stats.iloc[0][clm])*100,2)

#Present the table
basic_stats

Here's the printed table: [image: output table of the 'basic_stats' data frame in Jupyter Notebook]

Shahar
  • Possible duplicate of [How to display pandas DataFrame of floats using a format string for columns?](http://stackoverflow.com/questions/20937538/how-to-display-pandas-dataframe-of-floats-using-a-format-string-for-columns) – IanS Apr 03 '17 at 13:00
  • I would recommend transposing your table, applying the solution to the suggested duplicate, then transposing back to display it. – IanS Apr 03 '17 at 13:01

2 Answers

Integer representation

You already assign integers to the cells of rows 1 and 3. They are printed as floats because every column has the data type float64, which is a result of how you initially create the DataFrame. You can check the data types by printing the .dtypes attribute:

basic_stats = pd.DataFrame(data=np.nan,index=indx,columns=clms)
print(basic_stats.dtypes)

# Prints:
# column1    float64
# column2    float64
# ...
# dtype: object

If you don't pass the data keyword argument to the DataFrame constructor, every column gets the data type object, so each cell can hold any Python object:

basic_stats = pd.DataFrame(index=indx,columns=clms)
print(basic_stats.dtypes)

# Prints:
# column1    object
# column2    object
# ...
# dtype: object

When a column's data type is object, each cell is formatted using its own built-in string conversion, so integers are displayed as integers.
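The difference between the two constructors can be checked on a tiny table. This is only a sketch: the labels `count`, `share`, `a` and `b` are made up for the illustration, and it assumes a pandas version in which an empty DataFrame defaults to object columns, as described above.

```python
import numpy as np
import pandas as pd

# Hypothetical labels, used only for this illustration
rows = ["count", "share"]
cols = ["a", "b"]

as_float = pd.DataFrame(data=np.nan, index=rows, columns=cols)  # float64 columns
as_object = pd.DataFrame(index=rows, columns=cols)              # object columns

# Store the same integer in both tables
as_float.loc["count", "a"] = 3
as_object.loc["count", "a"] = 3

print(as_float.dtypes["a"], as_object.dtypes["a"])

# The float64 column converts the value to a float,
# while the object column keeps the integer as it was assigned
print(type(as_float.loc["count", "a"]))
print(type(as_object.loc["count", "a"]))
```

Printing both tables side by side in a notebook should show the float table with a decimal point and the object table without one.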

Percentage representation

In order to display percentages, you can use a custom class that prints a float number the way you want:

class PercentRepr(object):
    """Represents a floating point number as percent"""
    def __init__(self, float_value):
        self.value = float_value
    def __str__(self):
        return "{:.2f}%".format(self.value*100)
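As a quick check, here is the class applied to a couple of fractions. The class is repeated so that the snippet runs on its own; the `__repr__` alias is an addition of this sketch (an assumption, not part of the class above), and the `demo` table with its `share` column is hypothetical.

```python
import pandas as pd

class PercentRepr(object):
    """Represents a floating point number (a fraction) as a percentage."""
    def __init__(self, float_value):
        self.value = float_value  # the raw fraction stays accessible
    def __str__(self):
        return "{:.2f}%".format(self.value * 100)
    # Assumption of this sketch: reuse the same text wherever repr() is used
    __repr__ = __str__

cell = PercentRepr(0.5)
print(str(cell))    # 50.00%
print(cell.value)   # 0.5

# Hypothetical one-column table holding PercentRepr objects (object dtype)
demo = pd.DataFrame({"share": [PercentRepr(0.5), PercentRepr(0.125)]})
print(demo)
```

Note that the class expects a fraction such as 0.5, not a value that was already multiplied by 100.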

Then just use this class for the values of the percentage rows (rows 2 and 4 of the table):

#Creating the table
clms = sales.columns
indx = ['# of Poeple','% of Poeple','# Purchased per Activity','% Purchased per Activity']
basic_stats = pd.DataFrame(index=indx,columns=clms)
basic_stats.head()

#Calculating the # of people who took part in each activity
#(a single .loc[row, column] assignment writes to the table itself; chained iloc[0][clm] may write to a temporary copy)
for clm in sales.columns:
    basic_stats.loc[indx[0], clm] = int(sales[sales[clm]>0][clm].count())

#Calculating the % of people who took part in each activity from the total email list
for clm in sales.columns:
    basic_stats.loc[indx[1], clm] = PercentRepr(basic_stats.loc[indx[0], clm] / sales['Sales'].count())

#Calculating the # of people who took part in each activity AND that bought the product
for clm in sales.columns:
    basic_stats.loc[indx[2], clm] = int(sales[(sales[clm] >0) & (sales['Sales']>0)][clm].count())

#Calculating the % of people who took part in each activity AND that bought the product
for clm in sales.columns:
    basic_stats.loc[indx[3], clm] = PercentRepr(basic_stats.loc[indx[2], clm] / basic_stats.loc[indx[0], clm])

#Present the table
basic_stats

Note: This actually changes the data in your DataFrame! If you want to do further processing with the data of the percentage rows (rows 2 and 4), be aware that these rows don't contain float objects anymore.
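If the floats need to stay usable for later calculations, one alternative is to keep the numeric table untouched and build a separate string-valued copy only for display. The following is a minimal sketch of that idea, not part of the solution above: the helper `format_for_display`, the table `stats`, and all of its numbers and labels are hypothetical. Percentages are stored as fractions here, matching what `PercentRepr` expects.

```python
import pandas as pd

# Hypothetical numbers standing in for the computed statistics
stats = pd.DataFrame(
    {"Email": [120.0, 0.40, 30.0, 0.25], "Webinar": [60.0, 0.20, 12.0, 0.20]},
    index=["# of People", "% of People", "# Purchased", "% Purchased"],
)

def format_for_display(frame, int_rows, pct_rows):
    """Return a string-valued copy of `frame`; `frame` itself keeps its floats."""
    shown = frame.astype(object)  # astype returns a new object-dtype table
    for label in int_rows:
        shown.loc[label] = [f"{v:.0f}" for v in frame.loc[label]]
    for label in pct_rows:
        shown.loc[label] = [f"{v:.2%}" for v in frame.loc[label]]
    return shown

display_table = format_for_display(
    stats,
    int_rows=["# of People", "# Purchased"],
    pct_rows=["% of People", "% Purchased"],
)
print(display_table)
print(stats.dtypes)  # the original table is still float64
```

The trade-off is that two tables have to be kept in sync: the display copy must be rebuilt whenever the numbers change.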

Felix
  • Thank you so much for this very helpful solution and the important comment below. I learned something new :) – Shahar Apr 03 '17 at 18:26

Here's one way. It's kind of a hack, but if it's simply for pretty printing, it'll work.

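import numpy as np
import pandas as pd

# cast to object so that cells can hold strings as well as floats
df = pd.DataFrame(np.random.random(20).reshape(4,5)).astype(object)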

# first and third rows display as integers
df.loc[0,] = df.loc[0,]*100 
df.loc[2,] = df.loc[2,]*100

df.loc[0,:] = df.loc[0,:].astype(int).astype(str)
df.loc[2,:] = df.loc[2,:].astype(int).astype(str)

# second and fourth rows display as percent values with 2 decimals (no % sign is added)
df.loc[1,:] = np.round(df.loc[1,:].values.astype(float),4).astype(float)*100
df.loc[3,:] = np.round(df.loc[3,:].values.astype(float),4).astype(float)*100
ilanman