
This is the original matrix if it helps. I am trying to do part A.

[Image: the original matrix]

I am completely new to R, like 6 days new. Our instructor just gave us these questions with almost no R explanation. Is there a way to fill the matrix in the image in a more automatic way, rather than doing everything one by one? So far, all I have done is:

x_and_y_val <- matrix(c(3,4,2,2,1,3,-1,1,0, 0),ncol=2,byrow=TRUE)
colnames(x_and_y_val) <- c("x","y")
rownames(x_and_y_val) <- c("","", "", "", "")
x_and_y_val <- as.table(x_and_y_val)
# original matrix with x and y
x_and_y_val

#find the averages of Xs and Ys (not written yet)
# x_avg <- ...
# y_avg <- ...

I will then find their averages, compute all the values and eventually create a new 5x6 matrix. But I have a strong feeling that there is a better, more efficient way. If you know one, can you explain it to me, or point me to a source that does?

Stay safe and healthy!

jay.sf

3 Answers


Beginning to learn R can feel daunting for sure! There are two popular styles of writing R: base R and the tidyverse. The differences between them are too much to cover in a SO post, so I'll put a link to Hadley Wickham's textbook on writing tidyverse code. I'll provide both solutions here:

Base:

To make the data frame, you can use this:

df <- data.frame(
  x = c(3,2,1,-1,0),
  y = c(4,2,3,1,0)
)

   x y
1  3 4
2  2 2
3  1 3
4 -1 1
5  0 0

Then to get the averages, you can use this:

x_avg <- mean(df$x)
y_avg <- mean(df$y)
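If you want both averages in one call, colMeans() returns every column mean of a numeric data frame, and base transform() can add the remaining columns in a single step, which is the "more automatic" route the question asks about. A minimal sketch, assuming the x and y values from the question; the new column names (x_dev, x_dev_sq, y_dev, xy_dev) are hypothetical, chosen only for this example:

```r
df <- data.frame(
  x = c(3, 2, 1, -1, 0),
  y = c(4, 2, 3, 1, 0)
)

# both column means at once, as a named numeric vector
avgs <- colMeans(df)

# add all deviation columns in one call (hypothetical column names)
full <- transform(df,
  x_dev    = x - mean(x),
  x_dev_sq = (x - mean(x))^2,
  y_dev    = y - mean(y),
  xy_dev   = (x - mean(x)) * (y - mean(y))
)
full
```

full has 5 rows and 6 columns, and colSums(full) gives the column totals.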

Tidyverse:

To make the data frame, you can use this:

library(tidyverse)  # provides tibble(), %>%, summarize() and across()

df <- tibble(
  x = c(3,2,1,-1,0),
  y = c(4,2,3,1,0)
)

# A tibble: 5 x 2
      x     y
  <dbl> <dbl>
1     3     4
2     2     2
3     1     3
4    -1     1
5     0     0

Then to get the averages, you can use this:

df %>%
  summarize(across(everything(), 
                   .fns = mean,
                   .names = "mean_{.col}"))

# A tibble: 1 x 2
  mean_x mean_y
   <dbl>  <dbl>
1      1      2
latlio
  • Thank you so much. This is very helpful. I think I'll eventually have to learn tidyverse but at the moment base feels more natural. Whoever you are, I hope you have a great day or night depending on the time. – Yüksel Polat Akbıyık Jan 10 '21 at 17:32
  • No worries, if this answer helped, please feel free to accept it! – latlio Jan 10 '21 at 17:38

I think you may want to stick with matrices, which is perfectly fine (and faster) in your case, because you are only working with numbers. Data frames are needed if your data also contain strings or other types.
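To illustrate why a matrix only suits purely numeric data: a matrix holds a single type, so adding one string column turns every entry into a string, while a data frame keeps one type per column. A small sketch; the labels vector is made up for the example:

```r
x <- c(3, 2, 1, -1, 0)
labels <- c("a", "b", "c", "d", "e")  # made-up string column for illustration

# cbind() on vectors builds a matrix; the numbers are coerced to character
mixed_matrix <- cbind(x, labels)
typeof(mixed_matrix)     # "character"

# a data frame keeps x numeric next to the string column
mixed_df <- data.frame(x, labels)
sapply(mixed_df, class)  # x is "numeric"; labels is "character" in R >= 4.0
```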

I would build up the initial matrix column-wise and use the dimnames= argument. dimnames= expects a list of row names and column names. Since we don't really need row names here, we can pass NULL to keep the default. For the column names we should use valid names.
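As a quick illustration of dimnames= with both parts supplied, and of what a "valid name" means: make.names() converts a header such as x-x_mean into a syntactically valid one, although a matrix still accepts the original string when you index with it. The row labels r1 and r2 are made up for the example:

```r
# dimnames= takes list(row names, column names); labels are illustrative
m2 <- matrix(1:4, ncol = 2,
             dimnames = list(c("r1", "r2"), c("x", "x-x_mean")))

make.names("x-x_mean")  # "x.x_mean", a syntactically valid version
m2[, "x-x_mean"]        # indexing by string works with the original header
```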

m <- matrix(c(3, 2, 1, -1, 0,
              4, 2, 3, 1, 0), ncol=2, dimnames=list(NULL, c("x", "y")))

m
#       x y
# [1,]  3 4
# [2,]  2 2
# [3,]  1 3
# [4,] -1 1
# [5,]  0 0

Based on the initial matrix we can do the calculations and add the results as columns using cbind.

x.dm <- m[,"x"] - mean(m[,"x"])  ## "dm" because of de-meaned
x.dm.sq <- x.dm^2
y.dm <- m[,"y"] - mean(m[,"y"])
y.dm.sq <- y.dm^2
p.dm <- x.dm*y.dm  ## product of the de-meaned x and y
m <- cbind(m, x.dm, x.dm.sq, y.dm, y.dm.sq, p.dm)
m
#       x y x.dm x.dm.sq y.dm y.dm.sq p.dm
# [1,]  3 4    2       4    2       4    4
# [2,]  2 2    1       1    0       0    0
# [3,]  1 3    0       0    1       1    0
# [4,] -1 1   -2       4   -1       1    2
# [5,]  0 0   -1       1   -2       4    2

For the column sums we may use colSums.

colSums(m)
#       x       y    x.dm x.dm.sq    y.dm y.dm.sq    p.dm 
#       5      10       0      10       0      10       8 

Note: Here both means (1 and 2) are whole numbers, so the de-meaned columns sum to exactly zero. When a mean is not a whole number, floating-point rounding can leave a tiny value such as 2.220446e-16 instead of zero (see explanation), and R then prints the whole vector in scientific notation. zapsmall will convert such values to zero.

zapsmall(colSums(m))
#       x       y    x.dm x.dm.sq    y.dm y.dm.sq    p.dm 
#       5      10       0      10       0      10       8 
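A minimal standalone example of the rounding effect that zapsmall cleans up. The numbers are arbitrary, picked only because 0.1 + 0.2 - 0.3 is not exactly zero in double precision:

```r
v <- c(total = 10, should_be_zero = 0.1 + 0.2 - 0.3)
v            # the second entry prints as a tiny number (around 5.6e-17)
zapsmall(v)  # values tiny relative to the largest entry are rounded to 0
```

zapsmall rounds relative to the largest absolute value in the vector, which is why it is applied to the whole colSums result rather than to one entry at a time.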
jay.sf
0

Here's another version (simpler for a beginner, IMHO):

# input the base vectors x and y
x <- c(3,2,1,-1,0)
y <- c(4,2,3,1,0)

# calculate their means
x_mean <- mean(x)
y_mean <- mean(y)

# determine the remaining columns of the table
x-x_mean
(x-x_mean)^2
y-y_mean
(x-x_mean)*(y-y_mean)

# combine everything into one matrix
# (named tab rather than table to avoid confusion with base R's table() function)
tab <- cbind(x,
             y,
             x-x_mean,
             (x-x_mean)^2,
             y-y_mean,
             (x-x_mean)*(y-y_mean))
# specify the header
colnames(tab) <- c("x","y","x-x_mean","(x-x_mean)^2","y-y_mean","(x-x_mean)(y-y_mean)")

tab

      x y x-x_mean (x-x_mean)^2 y-y_mean (x-x_mean)(y-y_mean)
[1,]  3 4        2            4        2                    4
[2,]  2 2        1            1        0                    0
[3,]  1 3        0            0        1                    0
[4,] -1 1       -2            4       -1                    2
[5,]  0 0       -1            1       -2                    2
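The steps in this answer can be wrapped in a small function so the same table is built for any x and y. A minimal sketch: deviation_table is a hypothetical name, the headers follow this answer, and the extra Sum row of column totals uses colSums:

```r
# hypothetical helper: builds the 6-column table for numeric vectors x and y
deviation_table <- function(x, y) {
  x_dev <- x - mean(x)
  y_dev <- y - mean(y)
  tab <- cbind(x, y, x_dev, x_dev^2, y_dev, x_dev * y_dev)
  colnames(tab) <- c("x", "y", "x-x_mean", "(x-x_mean)^2",
                     "y-y_mean", "(x-x_mean)(y-y_mean)")
  tab
}

tab <- deviation_table(x = c(3, 2, 1, -1, 0), y = c(4, 2, 3, 1, 0))

# append a row of column totals
with_totals <- rbind(tab, Sum = colSums(tab))
with_totals
```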
Alessio