32

I use the following call to calculate the correlation between two variables in my dataset:

cor( var1, var2, method = "method")

But I would like to create a correlation matrix of 4 different variables. What's the easiest way to do this?

Gavin Simpson

5 Answers

41

Use the same function (cor) on a whole data frame or numeric matrix, e.g.:

> cor(VADeaths)
             Rural Male Rural Female Urban Male Urban Female
Rural Male    1.0000000    0.9979869  0.9841907    0.9934646
Rural Female  0.9979869    1.0000000  0.9739053    0.9867310
Urban Male    0.9841907    0.9739053  1.0000000    0.9918262
Urban Female  0.9934646    0.9867310  0.9918262    1.0000000

Or, on a data frame that also holds non-numeric columns (for example factors), keep only the numeric columns first, with something like the following:

> cor(mtcars[,unlist(lapply(mtcars, is.numeric))])
            mpg        cyl       disp         hp        drat         wt        qsec         vs          am       gear        carb
mpg   1.0000000 -0.8521620 -0.8475514 -0.7761684  0.68117191 -0.8676594  0.41868403  0.6640389  0.59983243  0.4802848 -0.55092507
cyl  -0.8521620  1.0000000  0.9020329  0.8324475 -0.69993811  0.7824958 -0.59124207 -0.8108118 -0.52260705 -0.4926866  0.52698829
disp -0.8475514  0.9020329  1.0000000  0.7909486 -0.71021393  0.8879799 -0.43369788 -0.7104159 -0.59122704 -0.5555692  0.39497686
hp   -0.7761684  0.8324475  0.7909486  1.0000000 -0.44875912  0.6587479 -0.70822339 -0.7230967 -0.24320426 -0.1257043  0.74981247
drat  0.6811719 -0.6999381 -0.7102139 -0.4487591  1.00000000 -0.7124406  0.09120476  0.4402785  0.71271113  0.6996101 -0.09078980
wt   -0.8676594  0.7824958  0.8879799  0.6587479 -0.71244065  1.0000000 -0.17471588 -0.5549157 -0.69249526 -0.5832870  0.42760594
qsec  0.4186840 -0.5912421 -0.4336979 -0.7082234  0.09120476 -0.1747159  1.00000000  0.7445354 -0.22986086 -0.2126822 -0.65624923
vs    0.6640389 -0.8108118 -0.7104159 -0.7230967  0.44027846 -0.5549157  0.74453544  1.0000000  0.16834512  0.2060233 -0.56960714
am    0.5998324 -0.5226070 -0.5912270 -0.2432043  0.71271113 -0.6924953 -0.22986086  0.1683451  1.00000000  0.7940588  0.05753435
gear  0.4802848 -0.4926866 -0.5555692 -0.1257043  0.69961013 -0.5832870 -0.21268223  0.2060233  0.79405876  1.0000000  0.27407284
carb -0.5509251  0.5269883  0.3949769  0.7498125 -0.09078980  0.4276059 -0.65624923 -0.5696071  0.05753435  0.2740728  1.00000000
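
For the four-variable case in the question, a minimal sketch along the same lines is shown below. The column names are only an illustration (taken from mtcars); the method and use arguments are the ones documented for cor in base R.

```r
# Pick the four columns of interest by name (these names are just an example).
vars <- c("mpg", "cyl", "disp", "hp")

# Pearson is the default; "spearman" and "kendall" are the other built-in methods.
cm_pearson  <- cor(mtcars[, vars])
cm_spearman <- cor(mtcars[, vars], method = "spearman")

# With missing values in the data, use = "pairwise.complete.obs" drops NAs pair by pair.
# mtcars has no NAs, so this gives the same matrix as cm_pearson.
cm_pairwise <- cor(mtcars[, vars], use = "pairwise.complete.obs")

round(cm_pearson, 2)
```

Selecting columns by name rather than by position keeps the call readable when the data frame later gains or loses columns.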
daroczig
  • Since this graph is necessarily symmetric, it would be better to show the column in the listed order, and the row in the order: UF, UM, RF, RM. The graph then can be limited to those entries above and to the left of the X=Y line. You need only calculate 3/8ths the correlations listed above. – Slartibartfast Mar 27 '11 at 01:16
  • What graph? If you mean the correlation matrix above, `cor` actually only computes the lower triangle then obtains the upper triangle by transposing, look at the source files :) – Sacha Epskamp Mar 27 '11 at 01:29
  • `lapply`ing a type check for a boolean filter. very sweet. – d8aninja Apr 10 '18 at 00:43
  • What if one wants to add the significance of the correlation coefficient of multiple variables? i.e. add stars to the `cor(mtcars[,unlist(lapply(mtcars, is.numeric))])` in your answer above – Jia Gao May 30 '18 at 02:07
  • Then I'd open a new question, @JasonGoal :) But see eg the correlation example at http://rapport-package.info/#Correlation – daroczig May 30 '18 at 08:01
22

If you would like to combine the matrix with some visualisations, I can recommend the following (I am using the built-in iris dataset):

library(psych)
pairs.panels(iris[1:4])  # select columns 1-4

The PerformanceAnalytics package does basically the same but includes significance indicators by default.

library(PerformanceAnalytics)
chart.Correlation(iris[1:4])

Correlation Chart

Or this nice and simple visualisation:

library(corrplot)
x <- cor(iris[1:4])
corrplot(x, type="upper", order="hclust")

corrplot

Bonono
dorien
  • What would you suggest if my variables are ordinal, with few levels? – skan Aug 02 '16 at 17:28
  • You can give them a numerical label (only if they are truly ordinal and the number would mean something). But I suspect they will be harder to correlate. – dorien Aug 02 '16 at 19:09
  • I mean the Pearson correlation wouldn't be good, and maybe Spearman wouldn't be either. – skan Aug 02 '16 at 23:47
11

See the corr.test function in the psych package:

> corr.test(mtcars[1:4])
Call:corr.test(x = mtcars[1:4])
Correlation matrix 
       mpg   cyl  disp    hp
mpg   1.00 -0.85 -0.85 -0.78
cyl  -0.85  1.00  0.90  0.83
disp -0.85  0.90  1.00  0.79
hp   -0.78  0.83  0.79  1.00
Sample Size 
     mpg cyl disp hp
mpg   32  32   32 32
cyl   32  32   32 32
disp  32  32   32 32
hp    32  32   32 32
Probability value 
     mpg cyl disp hp
mpg    0   0    0  0
cyl    0   0    0  0
disp   0   0    0  0
hp     0   0    0  0
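
The zeros in the probability table above are rounded values, not exact zeros. As a rough cross-check that needs no extra package, the sketch below builds an unadjusted p-value matrix with base R's cor.test; the names vars, x and p_mat are made up for this illustration and are not part of psych.

```r
vars <- c("mpg", "cyl", "disp", "hp")
x <- mtcars[, vars]

# Start from a zero matrix; the diagonal (a variable against itself) stays 0.
p_mat <- diag(0, length(vars))
dimnames(p_mat) <- list(vars, vars)

# Run cor.test() once per pair of columns and store the p-value symmetrically.
for (pair in combn(vars, 2, simplify = FALSE)) {
  p <- cor.test(x[[pair[1]]], x[[pair[2]]])$p.value
  p_mat[pair[1], pair[2]] <- p
  p_mat[pair[2], pair[1]] <- p
}

# signif() keeps very small p-values visible instead of rounding them to 0.
signif(p_mat, 3)
```

These are raw pairwise p-values with no correction for multiple testing, so treat them as a sanity check rather than a replacement for the output above.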

And yet another shameless self-advert: https://gist.github.com/887249

aL3xa
  • The `pairs.panels` function will also give you the correlations, along with the distributions and regression lines of the variables. – richiemorrisroe Mar 27 '11 at 08:57
  • @richiemorrisroe, that's right, but `pairs` produces a graph, not a table. There's also the `corrgram` package if you prefer a graphical over a tabular data summary. – aL3xa Mar 27 '11 at 12:39
4

You might want to look at Quick-R, which has a lot of nice little tutorials on how you can do basic statistics in R. For example on correlations:

http://www.statmethods.net/stats/correlations.html

Sacha Epskamp
1

You can also calculate correlations for all variables but exclude selected ones, for example:

mtcars <- data.frame(mtcars)
# here we exclude gear and carb variables
cors <- cor(subset(mtcars, select = c(-gear,-carb)))

Also, to calculate the correlation between each variable and one particular column, you can use sapply():

# sapply calls the correlation function once per column of mtcars, each time paired with mtcars$mpg
cors2 <- sapply(mtcars, cor, y=mtcars$mpg)
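
The result of that sapply() call is a named numeric vector, one entry per column. A small sketch of how it could be ranked by strength of association with mpg follows; the name ranked is only used for this illustration.

```r
# One correlation per column of mtcars, each against mtcars$mpg.
cors2 <- sapply(mtcars, cor, y = mtcars$mpg)

# Drop the trivial correlation of mpg with itself.
cors2 <- cors2[names(cors2) != "mpg"]

# Order by absolute value so strong negative correlations rank high too.
ranked <- cors2[order(abs(cors2), decreasing = TRUE)]

round(ranked, 2)
```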
USER_1