
I have a dataframe and I'm trying to create a new column whose values are one column divided by another. This should be straightforward, but I'm only getting 0s and 1s in the output.

I also tried converting the result to float in case it was somehow being rounded off, but that didn't change anything.

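def answer_seven():
    df = answer_one()  # answer_one() is defined elsewhere and returns the full dataframe
    columns_to_keep = ['Self-citations', 'Citations']
    df = df[columns_to_keep]
    # New column: self-citations as a fraction of total citations
    df['ratio'] = df['Self-citations'] / df['Citations']
    return df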

answer_seven()

Output:

         Self_cite  Citations  ratio
Country
Aus.         15606      90765      0
Brazil       14396      60702      0
Canada       40930     215003      0
China       411683     597237      1
France       28601     130632      0
Germany      27426     140566      0
India        37209     128763      0
Iran         19125      57470      0
Italy        26661     111850      0
Japan        61554     223024      0
S Korea      22595     114675      0
Russian      12422      34266      0
Spain        23964     123336      0
Britain      37874     206091      0
America     265436     792274      0

Does anyone know why I'm only getting 1s and 0s when I want float values? I tried the solutions given in the suggested duplicate and none of them worked. I've tried converting the values to floats in several ways, including `.astype('float')`, `float(df['A'])` and `df['ratio'] = df['Self-citations'] * 1.0 / df['Citations']`, but none have worked so far.
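For reference, here is a minimal sketch of the conversions listed above, run on a small stand-in frame (three rows copied from the output table, not the real `answer_one()` result). Casting the numerator, multiplying by `1.0`, or using `Series.div` all yield `float64`; `float()` on a whole multi-row Series raises `TypeError`, because it only converts a single value.

```python
import pandas as pd

# Stand-in for the asker's data: three rows taken from the output table.
df = pd.DataFrame(
    {"Self-citations": [15606, 14396, 411683],
     "Citations": [90765, 60702, 597237]},
    index=["Aus.", "Brazil", "China"],
)

# 1. Cast the numerator before dividing.
ratio_astype = df["Self-citations"].astype(float) / df["Citations"]

# 2. Multiply by a float literal first (the "* 1.0" trick).
ratio_mul = df["Self-citations"] * 1.0 / df["Citations"]

# 3. Use the method form of true division.
ratio_div = df["Self-citations"].div(df["Citations"])

print(ratio_astype.dtype, ratio_mul.dtype, ratio_div.dtype)  # float64 float64 float64
print(ratio_div.round(3).tolist())  # [0.172, 0.237, 0.689]

# float() converts one value, not a whole multi-row Series.
try:
    float(df["Self-citations"])
except TypeError as exc:
    print("TypeError:", exc)
```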

am4279
  • This is really weird. What version of python/pandas are you using? – rafaelc Apr 30 '19 at 21:44
  • @RafaelC Python 3. – am4279 Apr 30 '19 at 21:45
  • what are `df.dtypes`? – jlandercy Apr 30 '19 at 21:46
  • @jlandercy I'm not sure. I tried a few different things to find out what the dtype was. Here are the error messages I got: `df['ratio'][2].dtypes()` => "'numpy.float64' object has no attribute 'dtypes'"; `df['ratio'].dtypes()` => "'numpy.dtype' object is not callable"; `df.dtypes()` => "'Series' object is not callable" – am4279 Apr 30 '19 at 21:55
  • just type `df.dtypes` – jlandercy Apr 30 '19 at 21:57
  • Possible duplicate of [Typecasting before division (or any other mathematical operator) of columns in dataframes](https://stackoverflow.com/questions/12183432/typecasting-before-division-or-any-other-mathematical-operator-of-columns-in-d) – philshem Apr 30 '19 at 21:59
  • @jlandercy `Self-citations int64, Citations int64, ratio float64, dtype: object` – am4279 Apr 30 '19 at 22:21
  • @philshem I tried each of the solutions suggested in that thread but none of them worked. – am4279 Apr 30 '19 at 22:41
  • what is your `pd.__version__` ? – jlandercy Apr 30 '19 at 22:46
  • Please post a working input data frame. It sort of works with `df = pd.read_clipboard(sep=r'\s\s+')` for me, but in general, see this: https://stackoverflow.com/a/20159305/2327328 – philshem Apr 30 '19 at 22:47
  • @jlandercy pd.__version__ = '0.19.2' – am4279 Apr 30 '19 at 22:52
  • That's an old version. I would suggest upgrading and checking whether the problem persists. – jlandercy May 01 '19 at 09:33
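The error messages in the comments come from calling attributes as if they were methods. A minimal sketch on a two-row stand-in frame (rows copied from the question's output, not the real `answer_one()` result) shows where each dtype lives: `DataFrame.dtypes` is an attribute holding a Series of per-column dtypes, `Series.dtype` holds one NumPy dtype, and a single element taken with `.iloc` is a NumPy scalar whose type shows whether the stored value is a float.

```python
import numpy as np
import pandas as pd

# Stand-in frame with two rows from the question's output table.
df = pd.DataFrame(
    {"Self-citations": [15606, 411683], "Citations": [90765, 597237]},
    index=["Aus.", "China"],
)
df["ratio"] = df["Self-citations"] / df["Citations"]

# Attribute, no parentheses: a Series mapping column name -> dtype.
print(df.dtypes)

# One column's dtype (also an attribute).
print(df["ratio"].dtype)  # float64

# A single element is a NumPy scalar; printing it shows the unrounded value.
value = df["ratio"].iloc[1]
print(type(value), value)

# Calling the attribute reproduces the "not callable" error from the comments.
try:
    df.dtypes()
except TypeError as exc:
    print("TypeError:", exc)
```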

1 Answer


Without the exact dataframe it is difficult to say, but it is most likely a type-casting problem.

Let's build an MCVE (minimal, complete, and verifiable example):

import io
import pandas as pd

s = io.StringIO("""Country;Self_cite;Citations
Aus.;15606;90765
Brazil;14396;60702
Canada;40930;215003
China;411683;597237
France;28601;130632
Germany;27426;140566
India;37209;128763
Iran;19125;57470
Italy;26661;111850
Japan;61554;223024
S. Korea;22595;114675
Russian;12422;34266
Spain;23964;123336
Britain;37874;206091
America;265436;792274""")
df = pd.read_csv(s, sep=';', header=0).set_index('Country')

Then we can perform the division exactly as in your code:

df['ratio'] = df['Self_cite']/df['Citations']

Checking dtypes:

df.dtypes

Self_cite      int64
Citations      int64
ratio        float64
dtype: object

The result is:

          Self_cite  Citations     ratio
Country                                 
Aus.          15606      90765  0.171939
Brazil        14396      60702  0.237159
Canada        40930     215003  0.190369
China        411683     597237  0.689313
France        28601     130632  0.218943
Germany       27426     140566  0.195111
India         37209     128763  0.288973
Iran          19125      57470  0.332782
Italy         26661     111850  0.238364
Japan         61554     223024  0.275997
S. Korea      22595     114675  0.197035
Russian       12422      34266  0.362517
Spain         23964     123336  0.194299
Britain       37874     206091  0.183773
America      265436     792274  0.335031
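One detail in the question's output is worth comparing with this table: China's ratio (411683 / 597237, about 0.69) is shown as 1, not 0. Integer (floor) division would give 0 for every row, China included, whereas rounding the true ratios to zero decimals gives exactly the question's 0/1 pattern. The sketch below, rebuilt from the MCVE data, compares the two. It suggests, but does not prove, that the asker's values may be correct floats displayed rounded; what setting or code could cause that is an assumption not confirmed in the thread.

```python
import pandas as pd

# Country, self-citations and citations, copied from the MCVE data above.
rows = [
    ("Aus.", 15606, 90765), ("Brazil", 14396, 60702), ("Canada", 40930, 215003),
    ("China", 411683, 597237), ("France", 28601, 130632), ("Germany", 27426, 140566),
    ("India", 37209, 128763), ("Iran", 19125, 57470), ("Italy", 26661, 111850),
    ("Japan", 61554, 223024), ("S. Korea", 22595, 114675), ("Russian", 12422, 34266),
    ("Spain", 23964, 123336), ("Britain", 37874, 206091), ("America", 265436, 792274),
]
df = pd.DataFrame(rows, columns=["Country", "Self_cite", "Citations"]).set_index("Country")

true_ratio = df["Self_cite"] / df["Citations"]    # float64 ratios, as in the table above
floor_ratio = df["Self_cite"] // df["Citations"]  # integer division: every numerator is smaller, so all 0
rounded = true_ratio.round(0)                     # 0 decimals: only China (~0.69) becomes 1.0

print(floor_ratio.unique())                  # [0]
print(rounded[rounded == 1].index.tolist())  # ['China']
```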

Graphically:

df['ratio'].plot(kind='bar')


If you want to enforce a type, you can cast the dataframe with the `astype` method and assign the result:

df = df.astype(float)
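A minimal sketch of how `astype` behaves, on a two-row stand-in frame: it returns a new DataFrame and leaves the original untouched, and passing a column-to-dtype mapping converts only the listed columns.

```python
import pandas as pd

df = pd.DataFrame({"Self_cite": [15606, 411683], "Citations": [90765, 597237]},
                  index=["Aus.", "China"])

converted = df.astype(float)  # astype returns a new DataFrame...
print(df.dtypes)              # ...so df still has integer columns
print(converted.dtypes)       # while the copy has float64 columns

# Convert only some columns by passing a column -> dtype mapping.
partly = df.astype({"Self_cite": float})
print(partly.dtypes)          # Self_cite float64, Citations unchanged (integer)
```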
jlandercy