I'm trying to compute the mean, median and mode of a column in a DataFrame. I need to know how to write the source assignment so that it reads the column from the DataFrame; my attempt with ":" below does not work.

source = [df.'DMC]

import pandas as pd
import nltk

df.head(4)
# This is the print out of the dataframe 
# When I came up with this code, the source was
# source=[3,4,6,4,7,2,6,7,...]
# But now I need to get the data from a dataFrame. 
#   X   Y   month   day   DMC    RH
# 0 7   5   3       fri   26.2   94.3
# 1 7   4   10      tue   90.6   35.4
# 2 6   6   12      mon   56.8   99.2
# this is just a sample

#This is the code to find the mean median and mode

source = [df:'DMC']  # This is where I need your help.
def meanmedianmode (source):
    mmm = {'mean': Mean(source), 'median': Median(source), 'mode':
            Mode(source) }
def Mean (source):
    mean = reduce(lambda x,y: x+y, numbers)/len(source)
    return mean

def Median(source):
    median = numpy.median(source)
    return(median)

def Mode (source):
    mode = statistics.mode(source)
    return mode
    return mmm
print("mean median mode" + str(meanmedianmode(source)))
Anubhav Singh
2tan2ten

1 Answer

To answer your specific question: to select a single column of a pandas DataFrame, you can use either the syntax

source = df.DMC 

or

source = df['DMC']

However, you don't have to go to the trouble of implementing your own functions for finding the mean, median and mode. pandas already includes methods for all three; see the computations / descriptive statistics section of the pandas documentation. The solution is as simple as

In [6]: df = pd.DataFrame({'X':[7,7,6], 'DMC':[26.2, 90.6, 56.8]})

In [7]: df
Out[7]:
    DMC  X
0  26.2  7
1  90.6  7
2  56.8  6

In [8]: df.DMC.mean()
Out[8]: 57.86666666666667

In [9]: df.DMC.median()
Out[9]: 56.8

In [10]: df.DMC.mode()
Out[10]:
0    26.2
1    56.8
2    90.6
dtype: float64
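
If you still want the three values collected in one dictionary, as in the question, the built-in methods can be wrapped in a small helper. This is only a minimal sketch: the function name summarize_column is hypothetical, and keeping the mode as a list is one possible way to deal with ties, not the only one.

```python
import pandas as pd


def summarize_column(df, column):
    """Return the mean, median and mode(s) of one DataFrame column as a dict."""
    # Bracket syntax works for any column name; df.DMC only works when the
    # name is a valid Python identifier that does not clash with an existing
    # DataFrame attribute.
    series = df[column]
    return {
        "mean": series.mean(),
        "median": series.median(),
        # Series.mode() returns a Series because several values can tie,
        # so every modal value is kept in a list.
        "mode": series.mode().tolist(),
    }


# Same sample frame as in the session above.
df = pd.DataFrame({"X": [7, 7, 6], "DMC": [26.2, 90.6, 56.8]})
stats = summarize_column(df, "DMC")
print("mean median mode " + str(stats))
```

With this sample every DMC value occurs exactly once, so the "mode" entry contains all three values, matching Out[10] above.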
Unni
  • Thank you Unni for the quick and correct response with both syntaxes. Previously, when using source=[3,4,6,4,7,2,6,7,...], the output was mean median mode{'mean': 7.533333333333333, 'median': 8.0, 'mode': 11}. Now the output is unusual: "mean medianmodeNone". The None seems to say that no actual mean, median or mode is produced, just the text. I am using import pandas, import statistics, and import numpy. Can you help me, or should I start a new question on Stack Overflow? – 2tan2ten Jun 10 '19 at 23:43
  • How are you using mean and median? This should work `mmm = {'mean': df.DMC.mean(), 'median': df.DMC.median()}`. Mode is tricky since there could be multiple values. You should think about how you want to include that in your solution. – Unni Jun 11 '19 at 00:12
  • Thank you Unni. I got the same result with the adjusted code: "meanmedianmodeNone". I also deleted the "source = df1['DMC']" statement and still got None, "meanmedianmodeNone". It sounds like the data is not getting into the code. – 2tan2ten Jun 11 '19 at 00:23
  • I don't quite get the issue. Would you mind updating this question or starting a new one if you think it is different from this? – Unni Jun 11 '19 at 02:02
  • Unni, Thank you for all your effort. I will start a new question. – 2tan2ten Jun 11 '19 at 02:20
  • @2tan2ten: I just noticed how you are printing them. You are not returning `mmm` from the `meanmedianmode` function, so it returns None. – Unni Jun 11 '19 at 08:19
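
For reference, the last comment can be illustrated with a reworked version of the question's functions. This is a sketch under stated assumptions, not the asker's final code: the helper names are lowercased, reduce is imported from functools (where it lives on Python 3), the undefined name numbers is replaced by source, and the DMC values are hypothetical sample data chosen so that a single mode exists.

```python
from functools import reduce
import statistics

import numpy
import pandas as pd


def mean(source):
    # Sum the values of `source` (the original summed an undefined `numbers`).
    return reduce(lambda x, y: x + y, source) / len(source)


def median(source):
    return numpy.median(source)


def mode(source):
    return statistics.mode(source)


def meanmedianmode(source):
    mmm = {"mean": mean(source), "median": median(source), "mode": mode(source)}
    # Without this return the function implicitly returns None, which is
    # what produced the "None" in the printed output.
    return mmm


# Hypothetical sample values, not the asker's data.
df = pd.DataFrame({"DMC": [26.2, 90.6, 56.8, 26.2]})
source = df["DMC"].tolist()  # plain Python list of the column's values
result = meanmedianmode(source)
print("mean median mode " + str(result))
```

In the question, the return mmm statement sat after return mode inside Mode, where it could never run, so meanmedianmode built the dictionary and then discarded it.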