I'm new to Python programming. I'm trying to make a bar chart that shows the percentage on top of each bar, to 2 decimal places.

df_survey is a DataFrame that I made using the pandas library. (I tried to copy the DataFrame df_survey_sort into df_survey_pct, but when I make a change in df_survey_pct, df_survey_sort also changes. Can someone explain why this happens? As a result, I did the following so that df_survey_sort and df_survey_pct do not overwrite each other.)

df_survey = df_survey[['Very interested','Somewhat interested','Not interested']]
df_survey_sort = df_survey.sort_values(by='Very interested', ascending=0)
#df_survey_pct = df_survey_sort
df_survey_pct = df_survey.sort_values(by='Very interested', ascending=0)
total_ds = df_survey_sort.sum(axis=1)

for i in range(0,df_survey_sort.shape[0]):
    df_survey_pct.iloc[i][0] = round(df_survey_sort.iloc[i][0]/total_ds[i]*100,2)
    df_survey_pct.iloc[i][1] = round(df_survey_sort.iloc[i][1]/total_ds[i]*100,2)
    df_survey_pct.iloc[i][2] = round(df_survey_sort.iloc[i][2]/total_ds[i]*100,2)

This is the datatype of df_survey_pct:

Very interested        int64
Somewhat interested    int64
Not interested         int64
dtype: object

When I execute print(df_survey_pct), the values in the cells have no decimal places.

I even tried df_survey_pct = df_survey_pct.round(2) and df_survey_pct = df_survey_pct.astype('float'), but the values are still integers.

Because of this, I can only show integer percentages in my bar chart.

Fahmieyz
  • Can you share the dtypes of the columns ['Very interested','Somewhat interested','Not interested']? Also, do you have just 3 rows that you loop over sequentially in a for loop? That is not very performant. – devssh Sep 25 '18 at 12:54
  • Can you share some sample input and the desired output? – Sai Kumar Sep 25 '18 at 12:55
  • Possible duplicate: https://stackoverflow.com/questions/20937538/how-to-display-pandas-dataframe-of-floats-using-a-format-string-for-columns – Andy Sep 25 '18 at 12:59
  • @devssh How is my loop not performant? – Fahmieyz Sep 25 '18 at 13:54
  • Because the for loop processes one cell at a time in Python. dataframe["column"] gives a pd.Series, and operations on a whole Series are vectorized. – devssh Sep 25 '18 at 14:44
  • Is there a reason you're not doing `df_survey_pct[j] = df_survey_sort[j].apply(lambda x: round(x/total_ds[j]*100,2))` in a `for j in range(3)`? – devssh Sep 25 '18 at 14:51
  • Possible duplicate of [Round each number in a Python pandas data frame by 2 decimals](https://stackoverflow.com/questions/25272024/round-each-number-in-a-python-pandas-data-frame-by-2-decimals) – Traxes Sep 25 '18 at 17:04
  • @devssh I tried as you suggested, but I'm still getting integers. Maybe it has something to do with how I put the data into pandas. Thanks for the help. – Fahmieyz Sep 26 '18 at 01:22
  • You can use df["Very Interested"] = df["Very Interested"].astype(np.float64) to convert the dtypes. Make sure it is loaded as a float before you round off. – devssh Sep 26 '18 at 05:57

2 Answers

Here is how you can cut np.float64 columns down to 2 decimal places (this truncates rather than rounds):

df_survey["some_column_with_too_many_decimal"] = df_survey["some_column_with_too_many_decimal"].apply(lambda x: int(x*100)/100)

Also, if you need to select only certain rows in that column, use df.loc with a condition instead of calling iloc on every row, since the df might have many rows.

df.loc[(df["column1"]>0), ["column2", "column3"]]

or, for a single column,

df.loc[(df["column1"]>0), "column2"]

The first argument to loc is a boolean condition that filters the rows, the second argument is the column (or list of columns) to select, and you can then update them using apply as shown above.
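
As a minimal sketch of that pattern, assuming a small made-up DataFrame whose column names simply mirror the placeholders above, a boolean mask can be used both to select and to update cells. The data and the variable names mask, cols and subset are hypothetical.

```python
import pandas as pd

# Hypothetical data; the column names mirror the placeholders in the answer.
df = pd.DataFrame({
    "column1": [1, -2, 3],
    "column2": [10.126, 20.126, 30.126],
    "column3": [1.999, 2.999, 3.999],
})

mask = df["column1"] > 0          # boolean Series, one entry per row
cols = ["column2", "column3"]     # columns to select

# Select the matching rows and the listed columns in a single .loc call.
subset = df.loc[mask, cols]

# Update only those cells; rows where mask is False are left untouched.
df.loc[mask, cols] = df.loc[mask, cols].round(2)
```

Building the mask once and reusing it avoids looping over rows with iloc.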

If you want to use round instead, note that it rounds to the nearest hundredth, whereas multiplying by 100, converting to int and dividing by 100 truncates to 2 places. The round function cannot guarantee exactly 2 decimal places, because the values are stored in the dataframe as binary floating-point numbers, which cannot represent most decimal fractions exactly.
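
To make the difference between the two approaches concrete, here is a small sketch on a made-up Series; the values and the names s, truncated and rounded are assumptions for illustration only.

```python
import pandas as pd

s = pd.Series([33.336, 12.5, 7.129])  # hypothetical float values

# int(x * 100) / 100 drops everything past the second decimal (truncation).
truncated = s.apply(lambda x: int(x * 100) / 100)

# Series.round(2) rounds to the nearest hundredth instead.
rounded = s.round(2)

print(truncated.tolist())
print(rounded.tolist())
```

The first and last values differ between the two results, which matters when the percentages are meant to be read off a chart.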

devssh

You can round the DataFrame directly using df.round(2).
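
Note that df.round(2) leaves int64 columns unchanged, so the percentages need to be computed as floats first. The following is a sketch under stated assumptions: the counts and the "Topic" index labels are invented, only the column and variable names come from the question, and independent is a hypothetical name.

```python
import pandas as pd

# Hypothetical counts; only the column names come from the question.
df_survey = pd.DataFrame(
    {
        "Very interested": [1332, 1688, 738],
        "Somewhat interested": [729, 444, 1037],
        "Not interested": [127, 60, 610],
    },
    index=["Topic A", "Topic B", "Topic C"],
)

df_survey_sort = df_survey.sort_values(by="Very interested", ascending=False)

# Dividing the int64 columns by the row totals produces float64 columns,
# so round(2) has decimals to keep.
total_ds = df_survey_sort.sum(axis=1)
df_survey_pct = df_survey_sort.div(total_ds, axis=0).mul(100).round(2)

# .copy() returns an independent DataFrame; a plain `a = b` assignment
# only gives the same object a second name.
independent = df_survey_sort.copy()

print(df_survey_pct.dtypes)
```

Because the result is a new float64 DataFrame, df_survey_sort keeps its integer counts and the two frames no longer affect each other.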

Traxes