
I am working on a project linking psychological battery exams to the chances patients will abuse prescription drugs. My original dataset looked like this:

ID Age Sex Neuro Aggro Agree Impulse Cocaine Crack ... Legal MJ
1   25  M   9      4     1      5      CL1    CL2  ...  CL1  CL3
2   28  F   4      5     5      8      CL0    CL1  ...  CL3  CL3

I figured it would be nice to get rid of the CL prefixes and keep only the numbers, so I ran:

df = df.replace('CL0', 0, regex=True)

After that, my dataset looked more like this:

ID Age Sex Neuro Aggro Agree Impulse Cocaine Crack ... Legal MJ
1   25  M   9      4     1      5      1       2  ...    1    3
2   28  F   4      5     5      8      0       1  ...    3    3

However, when I run df.describe(), it only shows the columns I didn't change. I checked for strings in my altered columns, but there weren't any; the values in each edited column all appear to be integers. I then tried df.describe(include='all'), as suggested in "Pandas df.describe doesn't work after adding new column". The edited columns now show values for count, unique, top, and freq, but all of the numeric descriptors (mean, standard deviation, etc.) are null.

What am I missing? How can I replace the values in the above columns with integers that df.describe() can compute statistics on?

Thanks in advance.

Ven
  • So I just realized that when I run df.info(), these columns are still listed with the object dtype. So I guess I need to convert the columns to integers. – Ven Feb 23 '23 at 02:26

2 Answers


I found the answer in "Pandas: convert dtype 'object' to int".

Nii Nii Joshua's post helped the most:

df['col_name'] = pd.to_numeric(df['col_name'])

This is a better option
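For several columns at once, the same idea might look like the sketch below. The sample frame and the `drug_cols` list are assumptions made up for illustration (the real dataset has more columns), and the "CL" prefix is stripped as plain text before the conversion, which is one possible approach rather than the only one.

```python
import pandas as pd

# Hypothetical sample that mirrors the layout shown in the question
df = pd.DataFrame({
    "ID": [1, 2],
    "Age": [25, 28],
    "Cocaine": ["CL1", "CL0"],
    "Crack": ["CL2", "CL1"],
})

# Assumed list of the CL-coded columns; adjust to the real column names
drug_cols = ["Cocaine", "Crack"]

for col in drug_cols:
    # Remove the literal "CL" prefix, then convert the remaining digits
    # from strings to a numeric dtype
    df[col] = pd.to_numeric(df[col].str.replace("CL", "", regex=False))

print(df.dtypes)

# describe() now includes the converted columns alongside ID and Age
summary = df.describe()
print(summary)
```

Because the converted columns are no longer of object dtype, df.describe() should report mean, std, and the quantiles for them without needing include='all'.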

Ven
file.csv
ID Age Sex Neuro Aggro Agree Impulse Cocaine Crack ... Legal MJ
1   25  M   9      4     1      5      CL1    CL2  ...  CL1  CL3
2   28  F   4      5     5      8      CL0    CL1  ...  CL3  CL3


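import pandas as pd

df = pd.read_csv("/content/file.csv")

df.head()

df.columns

# Map each label to its numeric suffix (e.g. 'CL3' -> 3), assuming every
# value starts with 'CL'. Numbering the unique values with enumerate()
# would assign codes by order of appearance and lose the original levels.
coc = {v: int(v[2:]) for v in df['Cocaine'].unique()}
cra = {v: int(v[2:]) for v in df['Crack'].unique()}

print(cra)
print(coc)

# Assign the result back instead of calling replace(..., inplace=True) on a
# column selection, which newer pandas versions warn about or ignore.
# map() yields an integer column when every label is found in the dict.
df['Cocaine'] = df['Cocaine'].map(coc)
df['Crack'] = df['Crack'].map(cra)

df.head()

df.dtypes

df.describe()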

Output of df.dtypes:

ID          int64
Age         int64
Sex        object
Neuro       int64
Aggro       int64
Agree       int64
Impulse     int64
Cocaine     int64
Crack       int64
dtype: object
rajkamal