How to use vectors to find mean and standard deviation

Question

This is the assignment:

Write two functions that calculate the mean and standard deviation. Your functions shall implement the following prototypes:
double mean(vector<double>x);
double sd(vector<double>x);
b. Place these functions in a file named “statfun.cpp”.

c. Place their function definitions in a file named “statfun.h”.

Write a main() function in a file named “lab1.cpp”.

Prompt the user to input 10 floating-point values and store them in a vector v.

Print vector v on a single line with each element separated by a space.

Call your functions mean(v) and sd(v) ...

I know how to code the formula for mean, but I'm not sure how to code the formula for standard deviation using vectors. I'm even less sure of how to do this with different files involved. I'm fully aware my code is garbage, but there are so many things I'm not sure of, I don't know what to tackle first.

Edit: Updated the code

//statfun.h
#include <iostream> 
#include <vector>
#ifndef STATFUN_H
#define STATFUN_H
using namespace std;

double mean(vector<double> v);

double sd(vector<double> v);

#endif

//statfun.cpp
#include <iostream>
#include <cmath>
#include <vector>
#include "statfun.h"
#ifndef STATFUN_CPP
#define STATFUN_CPP
using namespace std;

double mean(const vector<double> v) {

    double result;

    double sumVal = 0.0;            //Calculating sum of all values
for (int i = 0; i < v.size(); ++i) {
    sumVal = sumVal + v.at(i);
}

result = sumVal / v.size(); //Calculating mean

return result; 
}

double sd(const vector<double> v) {
    double total = 0;
    for (int i = 0; i < 10; ++i) {      //Calcuating standard deviation
        double mean_value = mean(v);
        int length = v.size()
        total = total + (val - mean_value)*(val - mean_value);
}

    return sqrt(total / length);
}

#endif 

//lab1.cpp
#include "statfun.cpp"
#include <iomanip>
using namespace std;

vector<double> v;

int main() {

    cout << "Enter 10 numbers: " << endl;

    float userInput = 0;
    for (int i = 0; i < 10; ++i) {
        cin >> userInput;
        v.push_back(userInput);
    }

    for (int i = 0; i < 10; ++i) {
        cout << v.at(i) << " ";
    }

    cout << endl;

cout.precision(3);
cout << mean(v) << " " << sd(v) << endl;

cout.precision(5);
cout << scientific << mean(v) << " " << sd(v) << endl;

    return 0;
}

Welcome to Stack Overflow! It looks like you're trying to program C++ in a functional style, but the syntaxes are completely different. I recommend you read a bit about [function definitions in C++](https://www.tutorialspoint.com/cplusplus/cpp_functions.htm) before getting into [how header files work](https://www.learncpp.com/cpp-tutorial/header-files/). — Niayesh Isky, Mar 24 '19 at 04:12
... and then, [completely forget that `using namespace std;` exists in C++](https://stackoverflow.com/questions/1452721/why-is-using-namespace-std-considered-bad-practice), or risk wasting time on mysterious compilation errors. — Sam Varshavchik, Mar 24 '19 at 04:20
Welcome to Stack Overflow! I recommend reading about [range for](https://en.cppreference.com/w/cpp/language/range-for), the [std algorithm library](https://en.cppreference.com/w/cpp/algorithm) and [range-v3](https://github.com/ericniebler/range-v3). Then you will have a good idea of how to do these. — JiaHao Xu, Mar 24 '19 at 04:21
You may also want to look at the [c++ language](https://en.cppreference.com/w/cpp/language) reference to learn some basic syntax of `c++`, such as what declarations and definitions are and the difference between them, how to define/declare a function, what a reference to a variable is, how to write a header, etc. Learn things from the bottom up, don't try to make 2 knives the first time you use `c++` or any other language. — JiaHao Xu, Mar 24 '19 at 04:34
In my class, we are only allowed to code using namespace std, because we haven't "learned" how to do it any other way. We haven't done anything with range loops either, and I don't think we ever will, as we are moving onto MATLAB soon. — I'mReallyTired, Mar 24 '19 at 16:35

JiaHao Xu · Answer 1 · 2019-03-24T06:25:15.177

Your code contains several mistakes and has a lot of room for improvement.

Let me go through them one by one.

The header

Since the same header can end up being included more than once in a single source file, every header file needs an include guard to prevent the side effects of that.

// statfun.h
#ifndef STATFUN_H   // names containing a double underscore are reserved in C++
#define STATFUN_H

# include <vector>

double mean(const std::vector<double>&);
double sd(const std::vector<double>&);

#endif

BTW, a function declaration may omit the parameter names.

Reference

The second mistake is that you didn't pass the vector by reference. In C++, an object is passed by value by default, which means it is copied.

Note: This is just like R, except that C++ has no language-level copy-on-write semantics. A user-defined class can implement copy-on-write itself, but std::vector copies all of its elements whenever it is copied.

So to avoid a costly copy, pass a reference instead.

double mean(const std::vector<double>&);

Here I used a const lvalue reference (const &), since mean does not modify the vector passed in.
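
To make the difference visible, here is a minimal sketch. The two helper names are made up for illustration and are not part of the assignment; each one reports whether the vector it received uses the same underlying buffer as the caller's vector.

```cpp
#include <vector>

// Hypothetical helper: v is a copy of the caller's vector,
// so it owns a separate buffer.
bool shares_buffer_by_value(std::vector<double> v, const double *original)
{
    return v.data() == original;
}

// Hypothetical helper: v refers to the caller's vector,
// so no copy is made and the buffer is the same.
bool shares_buffer_by_ref(const std::vector<double> &v, const double *original)
{
    return v.data() == original;
}
```

Given std::vector<double> v{1.0, 2.0, 3.0}, calling shares_buffer_by_value(v, v.data()) yields false, while shares_buffer_by_ref(v, v.data()) yields true.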

Function blocks.

In c++, a function is defined as below:

return_type func_name(type1 arg1 /* , type2 arg2, ... */)
{
    // The function body goes here:
}

So

// statfun.cpp
// c++11
#include "statfun.h"
#include <cmath>

double mean(const std::vector<double> &v)
{
    double sum = 0;

    for (auto &each: v)
        sum += each;

    return sum / v.size();
}

double sd(const std::vector<double> &v)
{
    double square_sum_of_difference = 0;
    double mean_var = mean(v);
    auto len = v.size();

    double tmp;
    for (auto &each: v) {
        tmp = each - mean_var;
        square_sum_of_difference += tmp * tmp;
    }

    return std::sqrt(square_sum_of_difference / (len - 1));
}
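
Note that sd above divides by len - 1, which gives the sample standard deviation. Dividing by len instead gives the population standard deviation. The assignment text quoted in the question does not say which one is expected, so check with your instructor. With \bar{x} denoting the mean and n the number of elements, the two definitions are:

```latex
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2}
\qquad\text{(sample)}
\qquad\qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2}
\qquad\text{(population)}
```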

Compile-time variable type deduction

As you might have noticed in the code above, I used auto len = v.size(), which relies on a C++11 language feature: auto.

Since C++11, the compiler can deduce the type of a variable from its initializer at compile time. So instead of defining the variable as std::vector<double>::size_type len = v.size(), we can now write auto len = v.size().
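
A small sketch of what the compiler deduces; count_elements is a made-up name used only for illustration. The static_assert lines are checked at compile time, so the file only compiles if the deduced types are what the comments claim.

```cpp
#include <type_traits>
#include <vector>

// Hypothetical helper, only here to show what auto deduces.
std::vector<double>::size_type count_elements(const std::vector<double> &v)
{
    auto len = v.size();  // type taken from the initializer: size_type
    auto zero = 0;        // int, because the literal 0 is an int
    auto zero_d = 0.0;    // double, because the literal 0.0 is a double

    static_assert(std::is_same<decltype(len), std::vector<double>::size_type>::value,
                  "len has the vector's size_type");
    static_assert(std::is_same<decltype(zero), int>::value, "zero is an int");
    static_assert(std::is_same<decltype(zero_d), double>::value, "zero_d is a double");

    (void)zero;    // silence unused-variable warnings
    (void)zero_d;
    return len;
}
```

One thing to watch out for: auto sum = 0; would make sum an int, so a running total of doubles should be initialized with 0.0 or declared as double explicitly.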

range-for loop

If you have learnt Python, you already know this kind of loop from its for statement. Since C++11, C++ can do the same with a range-based for loop:

for (auto &each: v) {
    // Loop body
}

where v can be a std::vector or any other standard container.
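
As a sketch of the "any other container" part, the same loop can be written once as a template and reused. sum_all is a hypothetical name, not something from the assignment.

```cpp
#include <array>
#include <list>
#include <vector>

// Hypothetical helper: adds up the elements of any container
// that a range-based for loop can iterate over.
template <typename Container>
double sum_all(const Container &c)
{
    double sum = 0.0;
    for (const auto &each : c)  // const reference: no copy, no modification
        sum += each;
    return sum;
}
```

For example, sum_all(std::vector<double>{1, 2, 3}), sum_all(std::list<double>{0.5, 0.5}) and sum_all(std::array<double, 2>{{1.5, 2.5}}) all compile against the same function body.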

IO error check

Last but not least, you didn't check whether any of the I/O operations you performed on std::cout or std::cin succeeded.

When using std::cout or std::cin, you either have to check the stream state (for example with std::cin.fail()) after every I/O operation, or use the following code:

std::cout.exceptions(std::ios_base::failbit | std::ios_base::badbit);
std::cin.exceptions(std::ios_base::failbit | std::ios_base::badbit);

This makes std::cout and std::cin throw an exception when an I/O operation fails.

I personally prefer not to handle this exception and to let it terminate the program, since there is nothing useful you can do to clean up and resume the control flow of the program.
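
For comparison, the manual alternative (checking the stream state after each read) could look like the sketch below. read_values is a hypothetical helper; it takes any std::istream so that it works with std::cin as well as with a std::istringstream.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical helper: reads exactly n doubles from the stream,
// or throws if a read fails.
std::vector<double> read_values(std::istream &in, std::size_t n)
{
    std::vector<double> v;
    v.reserve(n);

    double x;
    while (v.size() < n) {
        // The stream converts to false on end of input or non-numeric input.
        if (!(in >> x))
            throw std::runtime_error("failed to read a number");
        v.push_back(x);
    }
    return v;
}
```

Calling read_values(std::cin, 10) reads the ten values the assignment asks for. With a std::istringstream holding "1 x" and n = 2, the second read fails and the function throws.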

Below is the last piece of code:

// lab1.cpp
// c++11
#include "statfun.h"
#include <iostream>

auto get_use_input() -> std::vector<double>
{
    std::vector<double> v;
    v.reserve(10);

    double userInput;
    for (int i = 0; i != 10; ++i) {
        std::cout << "Please enter the " << i + 1 << " number: ";
        std::cin >> userInput;
        std::cout << std::endl;

        v.push_back(userInput);
    }

    return v;
}

void print_vec(const std::vector<double> &v)
{
    std::cout << "Vector: ";

    for (auto &each: v)
        std::cout << each << " ";

    std::cout << std::endl;
}

int main() {
    // Configure std::cout and std::cin to throw if io fails.
    std::cout.exceptions(std::ios_base::failbit | std::ios_base::badbit);
    std::cin.exceptions(std::ios_base::failbit | std::ios_base::badbit);

    /*
     * With "-O3" or [c++17, copy elision](https://en.cppreference.com/w/cpp/language/copy_elision), 
     * the cost of initializing an object using the return value of another function is nearly zero.
     */
    std::vector<double> v = get_use_input();

    print_vec(v);

    std::cout.precision(3);
    std::cout << "mean: " << mean(v) << " sd: " << sd(v) << std::endl;

    std::cout.precision(5);
    std::cout <<std::scientific << "mean: " << mean(v) << " sd: " << sd(v) << std::endl;

    return 0;
}

To build this program, you must have a c++ compiler that supports c++11 and pass -std=c++11 to the compiler.

PS: You can also use -std=c++14 or -std=c++17.

A simple Makefile to build the program:

# Flags for the C++ compiler, picked up by make's built-in .cpp -> .o rule
CXXFLAGS := -std=c++11

# The two lines below are the flags I used for clang++-8
# CXXFLAGS := -std=c++17 -Ofast -pipe -flto
# LDFLAGS := -flto -pipe -Wl,--icf=all,-O2,--as-needed,--strip-all,--plugin-opt=O3

# Note: recipe lines must start with a tab character, not spaces.
lab1: lab1.o statfun.o
	$(CXX) $(LDFLAGS) $^ -o $@

statfun.o: statfun.h
lab1.o: statfun.h

.PHONY: clean
clean:
	rm -f lab1.o statfun.o lab1

jsonV · Answer 2 · 2019-03-24T06:00:36.250

I believe your first issue is in understanding the file structure of your stats assignment. Tackle this first: read up on how headers work and on how to call functions defined in other files.

The .cpp files contain the implementation of the logic; the .h files are headers that declare objects and functions. When you include a file at the top of your code, think of it as having all the code from that file pasted above the current file.

Example:

statfun.h

double mean(vector<double> v);
// other **declaration** stuff....

lab1.cpp at the top of the file

#include "statfun.h" // equivalent to copy/pasting 'double mean(vector<double> v); and other declarations' into your lab1.cpp
// This is to help with cleanliness of your file structure. 
// You'll thank yourself when projects become bigger.

Note: your lab1.cpp includes statfun.cpp, which in turn includes statfun.h, so lab1.cpp gets statfun.h implicitly and does not have to include it again. Typically, though, you include the header and not the .cpp file. The #ifndef include guard protects you against the same header being included more than once.

b. statfun.cpp should be the place where you code all of your logic for the mean and standard deviation. example:

statfun.cpp

double mean(vector<double> v) {
    double result = 0.0;
    // Your mean calculation logic here.
    return result;
}

double sd(vector<double> x) {
    double result = 0.0;
    // Your standard deviation calculation logic here.
    return result;
}

c. lab1.cpp will be compiled to produce the runnable binary. As the entry point of your program, it should contain an int main() function. This main function needs to:

1. Ask for user input (search the web for how to read from standard input).
2. Store the input in a vector (still inside main).
3. Use cout to print the vector to standard output, one element at a time, e.g. 'cout << element << " ";' (still inside main).
4. Call the functions you wrote in statfun.cpp (notably mean() and sd()). Maybe store their return values in variables to use later. Since you call the statfun functions here, lab1.cpp must include statfun.h so that the compiler knows their declarations; the linker then finds their definitions in statfun.cpp.

With the file structure sorted out, here is a simple way to calculate the standard deviation in pseudocode:

statfun.madeuplanguage

type sd(vector<type> values) {
    type total = 0;
    type mean_value = mean(values);
    for val in values {
        total += (val - mean_value)^2;
    }
    total /= values.length;
    return sqrt(total);
}
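
One possible C++ rendering of the pseudocode above, offered as a sketch rather than the expected solution. It keeps the by-value parameters of the assignment's prototypes, divides by the number of values as the pseudocode does (population standard deviation), uses a plain index loop, and assumes the vector is not empty. Note that ^ is bitwise XOR in C++, not exponentiation, so the square is written as a multiplication.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Assumes values is not empty.
double mean(std::vector<double> values)
{
    double total = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i)
        total += values.at(i);
    return total / values.size();
}

// Population standard deviation: divides by the number of values.
double sd(std::vector<double> values)
{
    double total = 0.0;
    double mean_value = mean(values);  // computed once, outside the loop
    for (std::size_t i = 0; i < values.size(); ++i) {
        double diff = values.at(i) - mean_value;
        total += diff * diff;          // squared difference
    }
    total /= values.size();
    return std::sqrt(total);
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and the squared differences sum to 32, so this sd returns sqrt(32 / 8) = 2.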

This in mind, I would structure the lab1.cpp as follows.

lab1.cpp

int main() {
    vector<double> v;
    // take input and store in v.
    // std out - v

    double mean_val = mean(v);    
    double std_dev = sd(v);

    // std out - mean_val and std_dev
}

If you have any questions about implementing the above pseudocode in C++, great! It's your assignment/class, so take care to search the web for how to do very specific things in C++ (e.g. iterating over a vector, squaring, taking a square root, etc.). Good luck learning.

You're missing squaring the difference, which the OP actually has correct. He was just missing the summation step. — Ben Voigt, Mar 24 '19 at 05:19
Thanks, edited that in. I also seemed to have missed using mean_value within the summation step. I've added some additional information to hopefully clarify things. — jsonV, Mar 24 '19 at 05:40
