I am trying to run an iterative for loop to calculate correlations for levels of a factor variable. I have 16 rows of data for each of 32 teams in my data set. I want to correlate year with points for each of the teams individually. I can do this one by one but want to get better at looping.

correlate <- data %>%
  select(Team, Year, Points_Game) %>% 
  filter(Team == "ARI") %>% 
  select(Year, Points_Game)

cor(correlate)

I made an object "teams" by:

teams <- levels(data$Team)

A little help with using [i] to iterate over all 32 teams and get each team's correlation of year and points would be greatly appreciated!

Jeff Henderson
  • Please share a sample of your data using `dput()` (not `str`, `head`, or a picture/screenshot) so others can help. See more here https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example?rq=1 – Tung Aug 07 '18 at 16:33
  • I'm trying to figure out how to properly use dput. It's too long to post in the question...sorry I'm new to this. – Jeff Henderson Aug 07 '18 at 16:52

2 Answers

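library(dplyr)

# dummy data: 32 teams, 10 seasons each
# each = 10 keeps a team's rows together, so every team gets Years 2000-2009 exactly once
data <- data.frame(
  Team = rep(paste0("T", 1:32), each = 10),
  Year = rep(2000:2009, times = 32),
  Points_Game = rnorm(320, 100, 10)
)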

# find correlation of Year and Points_Game for each team
# r - correlation coefficient
correlate <- data %>%
                group_by(Team) %>% 
                summarise(r = cor(Year, Points_Game))
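
For comparison, the explicit `teams[i]` loop that the question asks about could be sketched in base R as follows. This is a minimal sketch, not the asker's real data: the data frame is simulated, `Team` is assumed to be a factor (so `levels()` returns the team names), and the names `r` and `one_team` are made up for illustration.

```r
# Simulated data (assumption): 32 teams, 10 seasons each
set.seed(1)
data <- data.frame(
  Team = factor(rep(paste0("T", 1:32), each = 10)),
  Year = rep(2000:2009, times = 32),
  Points_Game = rnorm(320, 100, 10)
)

teams <- levels(data$Team)

# Pre-allocate a named numeric vector with one slot per team
r <- setNames(numeric(length(teams)), teams)

for (i in seq_along(teams)) {
  # Keep only the rows for the i-th team
  one_team <- data[data$Team == teams[i], ]
  # Pearson correlation of Year and Points_Game for that team
  r[i] <- cor(one_team$Year, one_team$Points_Game)
}

head(r)
```

The loop fills `r` with one correlation per team, which should match the `group_by()` / `summarise()` result; the grouped version is usually shorter and less error-prone.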
Aleksandr

The data.table way:

library(data.table)

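# dummy data: 32 teams, 10 seasons each (same shape as @Aleksandr's)
# each = 10 keeps a team's rows together, so every team gets Years 2000-2009 exactly once
dat <- data.table(
  Team = rep(paste0("T", 1:32), each = 10),
  Year = rep(2000:2009, times = 32),
  Points_Game = rnorm(320, 100, 10)
)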

# find correlation of Year and Points_Game for each Team
result <- dat[ , .(r = cor(Year, Points_Game)), by = Team]
DanY
  • Is there a way to do this and produce p-values? Also, can this be put into a correlation matrix? – Con Des Sep 06 '21 at 09:20
  • @ConDes - to your first question: yes. The t-statistic for the Pearson correlation coefficient can be computed as t = r / sqrt((1 - r^2) / (n - 2)), with n - 2 degrees of freedom, where r is the correlation and n is the number of observations used. Given the t-statistic, you can find the p-value using `pt()`. – DanY Sep 08 '21 at 03:42
  • @ConDes - to your second question: I don't understand. What defines each row and column of the matrix you imagine? – DanY Sep 08 '21 at 03:43
  • I have 3 variables that I want to correlate. But I've been asked to do this separately for different factor levels (17 x 3) to be precise. It need not all be one matrix but I don't want to have to split the data into 51 dataframes to be able to run the correlations. Does that make sense? Sorry I didn't see your response until now. I can get the correlations using dplyr summarise but no p-values and it's in long format. – Con Des Sep 08 '21 at 08:33
  • @ConDes pairwise correlations for 3 variables (say A, B, and C) is just 3 total correlations: AB, AC, and BC. So just do the above thing three times. If this isn't clear, feel free to post a new question on SO (with example data and your specific coding problems), tag me, and I'll be sure to help. Good luck! – DanY Sep 09 '21 at 04:01
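
The p-value calculation described in the comments can be sketched in base R as below. This is only an illustration under stated assumptions: the data are simulated, there are no missing values (so `n` is simply `length(x)`), the p-value is two-sided, and `cor_with_p` is a hypothetical helper name, not a function from any package.

```r
# Simulated data (assumption): 32 teams, 10 seasons each
set.seed(1)
dat <- data.frame(
  Team = rep(paste0("T", 1:32), each = 10),
  Year = rep(2000:2009, times = 32),
  Points_Game = rnorm(320, 100, 10)
)

# Hypothetical helper: Pearson r and its two-sided p-value
cor_with_p <- function(x, y) {
  n <- length(x)                            # assumes no missing values
  r <- cor(x, y)
  t_stat <- r / sqrt((1 - r^2) / (n - 2))   # t-statistic with n - 2 df
  p <- 2 * pt(-abs(t_stat), df = n - 2)     # two-sided p-value
  c(r = r, p = p)
}

# Apply the helper to each team and collect one row per team
by_team <- split(dat, dat$Team)
result <- t(sapply(by_team, function(d) cor_with_p(d$Year, d$Points_Game)))
result <- data.frame(Team = rownames(result), result, row.names = NULL)

# Cross-check one team against stats::cor.test()
check <- cor.test(by_team$T1$Year, by_team$T1$Points_Game)
all.equal(result$p[result$Team == "T1"], check$p.value)
```

With three variables, the same helper could be called once per pair (AB, AC, BC) inside the per-group function, which avoids splitting the data into separate data frames by hand.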