-1

This is example of my data in mysql, I use lib flashext.mysql and python 3

RT      NK    NB    SU    SK    P    TNI IK   IB     TARGET
84876   902  1192  2098  3623  169   39  133  1063   94095
79194   902  1050  2109  3606  153   39  133   806   87992
75836   902  1060  1905  3166  161   39  133   785   83987
75571   902   112  1878  3190  158   39  133   635   82618
83797  1156   134  1900  3518  218   39  133   709   91604
91648  1291   127  2225  3596  249   39  133   659   99967

The formula MinMax is

(data-min)/(max-min)*0.8+0.1

I got the code normalize data from csv

import pandas as pd
df = pd.read_csv("dataset.csv")
norm = (df - df.min()) / (df.max() - df.min() )*0.8 + 0.1

I know how to count it like this

(first data of RT - min column RT data) / (max column RT- min column RT) * 0.8 + 0.1

So does the next column

(first data of NK - min column NK data) / (max column NK- min column NK) * 0.8 + 0.1

Please help me, How to normalize data from database, it call "dataset" and normalize it and input in another table call "normalize"

Has QUIT--Anony-Mousse
  • 76,138
  • 12
  • 138
  • 194
  • Are you asking to turn your formulas into python or how to manage the database or how to manage the database via python (or something else)? – combinatorist Mar 10 '19 at 05:08
  • It looks like you have a valid solution in Python with Pandas. Are you trying to translate the same calculation into MySQL (perhaps run via python)? – combinatorist Mar 10 '19 at 05:09
  • before, I asked the same thing in another account. but it was from CSV which was then normalized and I got the formula and it worked (this is a formula) " import pandas as pd df = pd.read_csv("dataset.csv") norm = (df - df.min()) / (df.max() - df.min() )*0.8 + 0.1". and now I want to ask how to normalize data from a database not from CSV – LearnCoding Mar 10 '19 at 05:13
  • yes i mean like that, can you help me? – LearnCoding Mar 10 '19 at 05:14
  • Ok, so do you want to pull the data from MySQL and then normalize it in pandas or do you want to MySQL to do the normalization calculation? – combinatorist Mar 10 '19 at 05:15
  • I want to pull data from MySQL, name the tab "tb_dataset" and then normalize and input it back to another "tb_normalize" table – LearnCoding Mar 10 '19 at 05:18
  • It's also a little confusing because your python snippet seems to find the minimum (`df.min`) across all columns, but your english pseudocode formula below shows it separately per column. Which do you need? – combinatorist Mar 10 '19 at 05:18
  • this separately for each column, because I use "norm" this is for each column – LearnCoding Mar 10 '19 at 05:26
  • by the way, I have already posted more detailed questions – LearnCoding Mar 10 '19 at 06:07

1 Answers1

0

Here is a SQL query that should get you started (assuming you want to calculate it per column):

   create table normalize as
     select
       (RT - min(RT)over()) / (max(RT)over() - min(RT)over()) * 0.8 + 0.1 as RT_norm
     from test;

I tested this query in sqlite3, not MySQL. It isn't necessarily optimal, but intuitively follows the formula. Notice, the over turns the min / max aggregate functions into window functions, which means they look at whole column, but the result is repeated on each row.

Todo

You would still need to:

  • send the MySQl query via Python
  • repeat the same code for each column
  • give each column a name
  • assign the resulting table to a schema (most likely)
  • handle divide by 0 in case a column max and min are equal
combinatorist
  • 562
  • 1
  • 4
  • 17