
I'm new to coding, so this might be a silly question. I'm using the data preprocessing tools approach to practice imputing missing data on multiple files. However, I'm not clear on when to use X.iloc[] versus X[].

Both of the examples below work, but I don't understand why.

Ex 1:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
# X is a NumPy array here (because of .values), so it is indexed with X[...]
imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])
print(X)

Ex 2:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('datasets_596958_1073629_Placement_Data_Full_Class_edited3.csv')
X = dataset.iloc[:, :-1]
Y = dataset.iloc[:, -1]

from sklearn.impute import SimpleImputer
# X is still a DataFrame here (no .values), so positions go through X.iloc[...]
imputer1 = SimpleImputer(missing_values=np.nan, strategy="most_frequent")
imputer1.fit(X.iloc[:, 0:3])
X.iloc[:, 0:3] = imputer1.transform(X.iloc[:, 0:3])

imputer2 = SimpleImputer(missing_values=np.nan, strategy="mean")
imputer2.fit(X.iloc[:, 3:5])
X.iloc[:, 3:5] = imputer2.transform(X.iloc[:, 3:5])
  • This could be helpful, https://stackoverflow.com/questions/31593201/how-are-iloc-ix-and-loc-different#:~:text=iloc%20gets%20rows%20(or%20columns,not%20present%20in%20the%20index. – sushanth Jun 03 '20 at 06:28

1 Answer


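Which one you use depends on the type of X. In Ex 1, .values converts the DataFrame to a NumPy array, which is indexed directly with X[rows, cols]. In Ex 2, X is still a pandas DataFrame, so position-based indexing goes through X.iloc[rows, cols]. (A plain Python list is also indexed with x[], but it does not accept the two-dimensional X[:, 1:3] form.) You can confirm the type with Python's type() function.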
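
As a rough illustration of the type check, here is a minimal sketch. The small DataFrame and its column names are made up to stand in for Data.csv, and the variable names X_frame and X_array are only for this example:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for Data.csv (not the asker's real file)
dataset = pd.DataFrame({
    "Country": ["France", "Spain", "Germany"],
    "Age": [44.0, np.nan, 30.0],
    "Salary": [72000.0, 48000.0, np.nan],
    "Purchased": ["No", "Yes", "No"],
})

X_frame = dataset.iloc[:, :-1]          # no .values: still a pandas DataFrame
X_array = dataset.iloc[:, :-1].values   # .values: a NumPy array

print(type(X_frame))
print(type(X_array))

# Same columns (positions 1 and 2), selected two different ways
frame_slice = X_frame.iloc[:, 1:3]      # DataFrame: positional access via .iloc
array_slice = X_array[:, 1:3]           # NumPy array: plain bracket indexing

print(frame_slice.shape, array_slice.shape)
```

Both slices cover the same rows and columns; only the container type, and therefore the indexing syntax, differs.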

Zakariah Siyaji