I'm having trouble rounding decimals while encoding in Python

Question

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.preprocessing import OrdinalEncoder


df = pd.read_csv("mushrooms.csv",index_col=False,header=None)
def n(target):
   if target == 'p':
       return 1
   elif target == 'e':
       return 0

df[0] = df[0].apply(n)
#manually encoding the targets


targets = df[0]
inputs = df[df.columns[1:]]


def test_train_split(mydf, inputs, tratio, target):
   splitter = StratifiedShuffleSplit(n_splits = 1, test_size = tratio, random_state = 42)
   train_index, test_index = next(splitter.split(inputs, target))
   strat_train = mydf.iloc[train_index]
   strat_test = mydf.iloc[test_index]
   return strat_train, strat_test

def print_test_train_dfs(train_df, test_df, target_column = 'None'):
   print("\nTraining data:")
   train_df.info()
   if target_column != 'None':
       print(train_df[target_column].value_counts())
   print('\nTest data:')
   test_df.info()
   if target_column != 'None':
       print(test_df[target_column].value_counts())


traindf, testdf = test_train_split(df, inputs, 0.2, targets)

enc = OrdinalEncoder()
enc.fit(traindf)
df = enc.transform(testdf)
for i in range(len(df)):
   for j in range(len(df[1])):
       df[i][j].round(0)
df = pd.DataFrame.from_records(df)  
print(df)

df always ends up with decimals like 1.0 instead of just 1, which is what I want.

The dataset I'm using is here: https://www.kaggle.com/uciml/mushroom-classification

I'll also add that after .transform, df is more of an array than a DataFrame.

also, you almost never, ever, ever, ever need to loop through a data frame like this — Paul H, Dec 15 '19 at 21:50
What @PaulH said is important! If df is an array, change the variable name. Can you share more of your program, and some data? See: [mcve]. — AMC, Dec 15 '19 at 22:00
That `n` function, shouldn’t it use booleans? What are the possible values for that first column? Even better, can you share at least part of the data? — AMC, Dec 15 '19 at 22:42

score 0 · Accepted Answer · answered Dec 15 '19 at 22:17


df.astype(int) should convert the values to integers.

Refer to this question for more information: Change data type of columns in Pandas


Capie


I had to do df = df.astype(int) but yes, thank you. – Jasper Sands Dec 15 '19 at 22:32
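
For illustration, here is a minimal sketch of the fix discussed above. It assumes the encoder output is a float64 NumPy array; the small `encoded` array and the names `encoded_int` and `result` are hypothetical stand-ins, not the real mushroom data. It shows why calling .round(0) without assignment has no effect, and how a single astype(int) call replaces the nested loop.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the float64 array returned by enc.transform(testdf).
encoded = np.array([[1.0, 0.0, 2.0],
                    [0.0, 3.0, 1.0]])

# .round() returns a new array and keeps the float dtype, so calling it
# without using the result leaves `encoded` unchanged.
encoded.round(0)

# astype(int) also returns a new array, so the result must be assigned.
# It converts every element at once; no loop is needed.
encoded_int = encoded.astype(int)

result = pd.DataFrame(encoded_int)
print(result.dtypes)
print(result)
```

The same conversion can be applied after building the DataFrame instead (df = df.astype(int)), which is the form the asker ended up using.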
