
I am new to Python. I have written Python code that computes a Gaussian distribution and predicts the label for a set of values. This was a class assignment, for which I got good marks. Now I would like to know whether my code is correct in a more Pythonic sense. Can I improve the code further to make it more precise and more "Pythonic"?

import math
import operator 
# Class to get the mean and variance of all data points. Input parameters
# are the label (M or W) and the parameter to calculate (height, weight, age).
# samples is the number of data points
def getMean(trainingSet,parameter1,parameter2):
    mean =0 
    samples = 0 
    variance = 0
    for x in range(len(trainingSet)):
        if trainingSet[x][3]==parameter2:
            mean+= trainingSet[x][parameter1]
            samples = samples+1
    finalMean = mean/samples
    #print(finalMean)
    for x in range(len(trainingSet)):
        if trainingSet[x][3]==parameter2:
            variance+= (trainingSet[x][parameter1]-finalMean)**2
    finalVariance = variance/samples
    gausVal = []
    for x in range(len(trainingSet)):
        tempval = calculateGuassian(finalMean,finalVariance,trainingSet[x][parameter1])
        gausVal.append(tempval)
    return gausVal

# Class to calculate the Gaussian distribution points

def calculateGuassian(meanVal, varianceVal, feature1):
    DenoVariance = 2*varianceVal
    func1 = 1/(math.sqrt(2*3.14*varianceVal))
    func2 = (-(feature1-meanVal)**2)/DenoVariance
    func3 = math.exp(func2)
    distro = func1*func3
    return distro

def finalProduct(multiplyer):
    result = 1
    for x in multiplyer: 
        result = result*x
    return result   

def arrayMultiply(arr1, arr2) :
    resultArray = []
    for x in range(len(arr1)):
        arrMul = arr1[x]*arr2[x]
        resultArray.append(arrMul)
    return resultArray  


# Main class where every feature is calculated and multiplied, and the result
# is shown

def main() :
    MenArr = []
    WomenList = []
    heightM = getMean(trainSet,0,'M')
    finalHM = finalProduct(heightM)
    MenArr.append(finalHM)
    heightW = getMean(trainSet,0, 'W')
    finalHW = finalProduct(heightW)
    WomenList.append(finalHW)
    weightM = getMean(trainSet,1,'M')
    finalWM = finalProduct(weightM)
    MenArr.append(finalWM)
    weightW = getMean(trainSet,1,'W')
    finalWW = finalProduct(weightW)
    WomenList.append(finalWW)
    ageM = getMean(trainSet,2,'M')
    finalAM = finalProduct(ageM)
    MenArr.append(finalAM)
    ageW = getMean(trainSet,2,'W')
    finalAW = finalProduct(ageW)
    WomenList.append(finalAW)
    BestResultMTemp = arrayMultiply(MenArr,testData)
    BestResultWTemp = arrayMultiply(WomenList,testData)
    BestResultM = finalProduct(BestResultMTemp)*0.50
    BestResultW = finalProduct(BestResultWTemp)*0.50
    print (BestResultM)
    print(BestResultW)
    if BestResultM<BestResultW :
        print("The Class Label Is W")
    if BestResultM>BestResultW :
        print("The Class Label Is M")


trainSet = [[170, 57, 32, 'W'],
[192, 95, 28, 'M'],
[150, 45, 30, 'W'],
[170, 65, 29, 'M'],
[175, 78, 35, 'M'],
[185, 90, 32, 'M'],
[170, 65, 28, 'W'],
[155, 48, 31, 'W'],
[160, 55, 30, 'W'],
[182, 80, 30, 'M'],
[175, 69, 28, 'W'],
[180, 80, 27, 'M'],
[160, 50, 31, 'W'],
[175, 72, 30, 'M']]     
testData = (175, 70, 35)        
main()

Any kind of suggestion is most welcome. Thank You in advance.

  • If you just want your code reviewed, try: https://codereview.stackexchange.com/ – Peter Collingridge Feb 14 '19 at 21:01
  • Also, in reality, it's far more likely that you'd use something like `numpy` e.g. https://docs.scipy.org/doc/numpy-1.14.1/reference/generated/numpy.random.normal.html or `scipy` https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html – roganjosh Feb 14 '19 at 21:03
  • We were not allowed to use numpy – Sushant Kulkarni Feb 14 '19 at 21:05
  • @PeterCollingridge I also want to know if there are better ways to write a Gaussian distribution function – Sushant Kulkarni Feb 14 '19 at 21:05
  • The title is a little better now. Please also check the formatting of the comments... – NichtJens Feb 14 '19 at 21:18
  • Also, why are you importing operator? – NichtJens Feb 14 '19 at 21:20
  • If the post was updated to include sample input and expected output, then this post might be eligible for migration to CR – Sᴀᴍ Onᴇᴌᴀ Feb 14 '19 at 22:05
  • I also just realized, you are calling this a "class" in the header comment. There is no class defined anywhere here. – NichtJens Feb 15 '19 at 13:48
  • How does this post deserve a negative review? I am just asking for a better way to implement the code – Sushant Kulkarni Feb 19 '19 at 18:18
  • I don't know, but it might help if you take the suggestions and answer the questions by Peter, Sᴀᴍ and me. I'd suggest you clean up the code according to pep8. This is really low-hanging fruit. Then formulate your question in a way that it can actually be answered. Be more concrete and explicit in what you want to know. If it's really a code review that you are after, re-post to codereview.stackexchange.com – NichtJens Feb 20 '19 at 16:32
  • Besides, I actually ran your code and it does not do what it is supposed to do. I fear the math is wrong, as it does not predict correctly at all. For the training data, where it should predict basically perfectly, I get only "M". If I use @PeterCollingridge's suggestion to filter before the Gaussian, I get only "W". Therefore expected output for several inputs (as suggested by Sᴀᴍ) is needed! From my understanding, you should calculate the probability to be in a class for each feature and then use the product of the probabilities. But you are calculating a product between value and probability!? – NichtJens Feb 20 '19 at 16:43
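
One possible reading of that last comment, as a rough sketch only: for each class, fit a mean and variance per feature on the rows of that class, evaluate the Gaussian density of the test sample (not of the training rows) for each feature, and multiply those densities by the class prior. The names gaussian_pdf, fit_class and predict are hypothetical, and the sketch assumes the label is the last item of every row and that no feature has zero variance within a class.

```python
import math


def gaussian_pdf(value, mean, variance):
    """Normal density at value for the given mean and variance."""
    exponent = -((value - mean) ** 2) / (2 * variance)
    return math.exp(exponent) / math.sqrt(2 * math.pi * variance)


def fit_class(rows, label):
    """Return the prior and one (mean, variance) pair per feature for a label."""
    members = [row[:-1] for row in rows if row[-1] == label]
    stats = []
    for column in zip(*members):  # one column per feature
        mean = sum(column) / len(column)
        variance = sum((v - mean) ** 2 for v in column) / len(column)
        stats.append((mean, variance))
    return len(members) / len(rows), stats


def predict(rows, sample):
    """Pick the label whose prior times product of densities is largest."""
    scores = {}
    for label in sorted({row[-1] for row in rows}):
        prior, stats = fit_class(rows, label)
        score = prior
        for value, (mean, variance) in zip(sample, stats):
            score *= gaussian_pdf(value, mean, variance)
        scores[label] = score
    return max(scores, key=scores.get)


# Training data copied from the question.
train_set = [
    [170, 57, 32, 'W'], [192, 95, 28, 'M'], [150, 45, 30, 'W'],
    [170, 65, 29, 'M'], [175, 78, 35, 'M'], [185, 90, 32, 'M'],
    [170, 65, 28, 'W'], [155, 48, 31, 'W'], [160, 55, 30, 'W'],
    [182, 80, 30, 'M'], [175, 69, 28, 'W'], [180, 80, 27, 'M'],
    [160, 50, 31, 'W'], [175, 72, 30, 'M'],
]
print(predict(train_set, (175, 70, 35)))
```

Whether this matches what the assignment expected is not known from the question, so treat it as one interpretation rather than the reference solution.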

2 Answers


Your question's title does not reflect what you are actually asking. This question might be a better fit for http://codereview.stackexchange.com/...

But, on first glance:

NichtJens

As NichtJens mentioned, try to follow the PEP8 guide: use lower case variable names with underscores to separate words, and add spaces around operators and after commas.

Also try to use consistent and meaningful variable names. For example, why do you have MenArr and WomenList? You have a variable mean, but its value is a sum. You have temporary variables with names func1, func2, etc.
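
As an illustration of the naming point, here is a sketch of how calculateGuassian might read with descriptive names. The name gaussian_pdf and its argument order are only suggestions, and it uses math.pi where the question hard-codes 3.14, so the results differ slightly from the original:

```python
import math


def gaussian_pdf(value, mean, variance):
    """Density of a normal distribution with the given mean and variance."""
    normalizer = 1 / math.sqrt(2 * math.pi * variance)
    exponent = -((value - mean) ** 2) / (2 * variance)
    return normalizer * math.exp(exponent)
```

Each intermediate name now says what it holds, so the formula can be checked against a textbook definition at a glance.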

To make your for loops more Pythonic, loop through the items in a list rather than creating an index and then looking up the items.

So:

gauss_value = []
for x in range(len(lst)):
    value = calculate_guassian(mean, variance, lst[x][parameter1])
    gauss_value.append(value)
return gauss_value

You can do:

gauss_value = []
for item in lst:
    value = calculate_guassian(mean, variance, item[parameter1])
    gauss_value.append(value)
return gauss_value

But even better you can use a list comprehension:

gauss_value = [calculate_guassian(mean, variance, item[parameter1]) for item in lst]

You can use this to simplify a lot of your code, e.g. arrayMultiply could be:

def list_multiply(list_1, list_2):
    return [a * b for a, b in zip(list_1, list_2)]
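
The same simplification probably applies to finalProduct. A sketch, assuming the standard library is allowed: functools.reduce with operator.mul works on any Python 3, and math.prod does the same job on Python 3.8 and later. The name product is just a suggestion.

```python
import math
import operator
from functools import reduce


def product(values):
    """Multiply all items together; an empty input gives 1."""
    return reduce(operator.mul, values, 1)


# math.prod is the built-in equivalent on Python 3.8+.
print(product([2, 3, 4]), math.prod([2, 3, 4]))  # 24 24
```

This would also give the operator import in the question something to do.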

My version of getMean would filter the data first. I'm not sure if it's correct to use the unfiltered data for the calculateGuassian part:

def get_mean(values, index, label):
    filtered_values = [value[index] for value in values if value[3] == label]
    n = len(filtered_values)

    mean = sum(filtered_values) / n
    summed_squared_difference = sum((val - mean) ** 2 for val in filtered_values)
    variance = summed_squared_difference / n

    return [calculateGuassian(mean, variance, item[index]) for item in values]
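
If standard-library modules are allowed, the mean and variance steps could also be delegated to the statistics module. This is only a sketch: mean_and_variance is a hypothetical helper name, and it uses statistics.pvariance (the population variance, dividing by n) because that matches the division by n above.

```python
import statistics


def mean_and_variance(rows, index, label):
    """Mean and population variance of one column, for rows with this label."""
    values = [row[index] for row in rows if row[3] == label]
    mean = statistics.mean(values)
    # Passing mu avoids recomputing the mean inside pvariance.
    return mean, statistics.pvariance(values, mu=mean)


rows = [[1, 0, 0, 'M'], [3, 0, 0, 'M'], [10, 0, 0, 'W']]
print(mean_and_variance(rows, 0, 'M'))
```

If the sample variance (dividing by n - 1) is what the assignment wants, statistics.variance would be the function to use instead.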

You can also greatly reduce the amount of code required to set up the initial lists (here product stands for a renamed finalProduct):

men_values = [product(get_mean(trainSet, i, 'M')) for i in range(3)]
women_values = [product(get_mean(trainSet, i, 'W')) for i in range(3)]

You could reduce the repetition further by having a function that takes 'M' or 'W' as a parameter and returns the relevant list.
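
A minimal sketch of such a function, under a few assumptions: class_values is a hypothetical name, gaussian is a compact stand-in for calculateGuassian (using math.pi), get_mean is the filtered version from this answer, and math.prod needs Python 3.8 or later. Only the first four training rows are used to keep the example short.

```python
import math


def gaussian(mean, variance, value):
    """Normal density; same formula as calculateGuassian, but with math.pi."""
    exponent = -((value - mean) ** 2) / (2 * variance)
    return math.exp(exponent) / math.sqrt(2 * math.pi * variance)


def get_mean(values, index, label):
    """Fit on the rows with this label, then evaluate the density on all rows."""
    filtered = [row[index] for row in values if row[3] == label]
    mean = sum(filtered) / len(filtered)
    variance = sum((v - mean) ** 2 for v in filtered) / len(filtered)
    return [gaussian(mean, variance, row[index]) for row in values]


def class_values(train_set, label, n_features=3):
    """One product of densities per feature column for the given label."""
    return [math.prod(get_mean(train_set, i, label)) for i in range(n_features)]


train_set = [[170, 57, 32, 'W'], [192, 95, 28, 'M'],
             [150, 45, 30, 'W'], [170, 65, 29, 'M']]
values_by_label = {label: class_values(train_set, label) for label in ('M', 'W')}
print(values_by_label)
```

Keeping the results in a dictionary keyed by label means adding a third class would not require any new variables.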

Peter Collingridge
  • All very good points! However, I'd say NumPy should be used for `list_multiply()` etc. Even though he says he wasn't allowed to use NumPy during class, he also says asking this question was to go beyond the class work. – NichtJens Feb 15 '19 at 13:46
  • This is perfect. This was exactly what I was looking for. Now using these points I can code better in future assignments. Thank You – Sushant Kulkarni Feb 15 '19 at 16:04