I have a dataset that contains only int values, and I would like to convert these values to the range [0,1]. I have two methods that already work, but I would like to try it with pandas. My question: how can I convert each column to the range [0,1] without distorting the values, i.e. so that equal input values still map to equal output values? This matters because, for example, a 3 stands for a class.

# My first try without pandas:


from csv import reader

# Load a CSV file, converting every field to float
# (assumes the file has no header row and only numeric values)
def load_csv(filename):
    dataset = list()
    with open(filename, 'r') as file:
        csv_reader = reader(file)
        for row in csv_reader:
            if not row:
                continue
            dataset.append([float(value) for value in row])
    return dataset

# Find the min and max values for each column
def dataset_minmax(dataset):
    return [[min(column), max(column)] for column in zip(*dataset)]

# Rescale dataset columns to the range 0-1, in place
# (the last column is left unchanged, e.g. a class label)
def normalize_dataset(dataset, minmax):
    for row in dataset:
        for i in range(len(row)-1):
            row[i] = (row[i] - minmax[i][0]) / (minmax[i][1] - minmax[i][0])

dataset = load_csv('test.csv')
minmax = dataset_minmax(dataset)
normalize_dataset(dataset, minmax)

Dataframe:
import pandas as pd
d = {'int1': [1, 2, 1], 'int2': [3, 4, 5]}
# df = pd.read_csv('test.csv')
df = pd.DataFrame(data=d)

print(df['int1'].min()) # What I found: this gives me the min and max of a column
print(df['int1'].max())
marc_s
Kazim
  • What do you mean "how can I convert each column to a value range of [0,1] without violating the values, i.e. changing the value"? 3 can't be turned into a value between 0 and 1 without changing the value of 3? Do you mean you want the normalization to be reversible? Something else? – Henry Ecker May 15 '21 at 11:52
  • Please add expected output to the question too. – Ch3steR May 15 '21 at 11:54
  • @HenryEcker Sorry, I expressed that badly. Take `int2`, which has the values `3, 4, 5`. Assume the value `3` appears again, i.e. `3, 4, 5, 3`. Then the result should look like this (just as an example!) -> `0.3, 0.4, 0.5, 0.3`. As you can see, `3` gets the same value in the range `[0,1]` both times, so the mapping must not be random. This would be wrong: `3, 4, 5, 3 -> 0.3, 0.4, 0.5, 0.8` – Kazim May 15 '21 at 12:07
  • Correct me if I'm wrong, but this seems like standard min-max normalization of a DataFrame. [Normalize columns of pandas data frame](https://stackoverflow.com/q/26414913/15497888) – Henry Ecker May 15 '21 at 12:13
  • @HenryEcker yes! That's what I was looking for, thank you! :) – Kazim May 15 '21 at 12:29
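
The min-max normalization confirmed in the comments above could be sketched in pandas roughly as follows. This is a minimal sketch, not taken from the linked question: the helper name `min_max_normalize` is hypothetical, the sample values `3, 4, 5, 3` follow the example in the comments, and it assumes every column is numeric and not constant (a constant column has a zero range and would produce NaN).

```python
import pandas as pd


def min_max_normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Rescale each column to [0, 1] using that column's own min and max.

    Hypothetical helper for illustration; assumes numeric, non-constant columns.
    """
    col_min = df.min()                # per-column minimum (a Series)
    col_range = df.max() - col_min    # per-column range
    # Arithmetic between a DataFrame and a Series aligns on column labels,
    # so each column is shifted and scaled by its own statistics.
    return (df - col_min) / col_range


df = pd.DataFrame({'int1': [1, 2, 1, 2], 'int2': [3, 4, 5, 3]})
normalized = min_max_normalize(df)
print(normalized)
```

Because the mapping depends only on the column's min and max, equal inputs always give equal outputs: both occurrences of `3` in `int2` end up at the same value.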

1 Answer

This scales each Series into [0, 1] by dividing it by its own max value (assuming the Series contains no negative values):

import pandas as pd
d = {'int1': [1, 2, 1], 'int2': [3, 4, 5]}
df = pd.DataFrame(data=d)
df = df.apply(lambda col: col.div(col.max()))
#    int1  int2
# 0   0.5   0.6
# 1   1.0   0.8
# 2   0.5   1.0
Alex
  • i) You can do this without `apply`: `df / df.max()`. ii) Does this really guarantee `[0, 1]`? What if a column includes a negative value, for example? – Mustafa Aydın May 15 '21 at 11:45
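
Both points in the comment above can be checked with a small sketch. The column `signed` is a made-up example containing a negative value, and the min-max variant is shown only for comparison; it assumes no column is constant.

```python
import pandas as pd

df = pd.DataFrame({'int1': [1, 2, 1], 'signed': [-2, 0, 4]})

# Vectorized form of df.apply(lambda col: col.div(col.max()))
by_max = df / df.max()

# Dividing by the max leaves negative inputs negative, so the
# 'signed' column falls outside [0, 1].
print(by_max)

# Min-max scaling subtracts the column minimum first, which keeps
# every value inside [0, 1] (assuming no column is constant).
by_min_max = (df - df.min()) / (df.max() - df.min())
print(by_min_max)
```

Dividing by the max is enough when all values are non-negative, as in the question's sample data; the min-max form is the safer choice when the sign of the data is not known in advance.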