
I am trying to compute a partial correlation in R. I have the two data sets that I want to compare and currently only one controlled variable (this will change in the future).

I have looked online to try to work this out myself, but I find the terminology on the websites I have looked at difficult to understand. Can someone please explain how I would go about doing this and perhaps provide a simple example?

Data is in the following form:

```
                Project.Name Bugs.Project Changes.Project Orgs.Project
1     platform_external_svox            4             161            2
3 platform_packages_apps_Nfc           13             223            2
5      platform_system_media           36             307            2
7     platform_external_mtpd            2              30            2
9            platform_bionic           42            1061            4
```

I want the correlation between Bugs.Project and Orgs.Project with Changes.Project as a controlled variable. I have installed the ppcor package, since it looks like it has the functionality I need, but I am unsure how to use it. How do I put my data into a matrix and use the pcor function?

This is what I've been trying:

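```r
library(ppcor)

# Columns 2, 4 and 3 of projRelateBugsOrgs; they keep their original
# names (Bugs.Project, Orgs.Project, Changes.Project), as the output shows
y.data <- data.frame(
  bpp = c(projRelateBugsOrgs[2]),
  opp = c(projRelateBugsOrgs[4]),
  cpp = c(projRelateBugsOrgs[3])
)

test <- pcor(y.data)
```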

I just used an example I found and tried to use my data in place of theirs. I don't understand my output.

It looks like this:

```
$estimate
                Bugs.Project Orgs.Project Changes.Project
Bugs.Project       1.0000000    0.3935535       0.9749296
Orgs.Project       0.3935535    1.0000000      -0.1800788
Changes.Project    0.9749296   -0.1800788       1.0000000

$p.value
                Bugs.Project Orgs.Project Changes.Project
Bugs.Project     0.00000e+00  2.09795e-07       0.0000000
Orgs.Project     2.09795e-07  0.00000e+00       0.0264442
Changes.Project  0.00000e+00  2.64442e-02       0.0000000

$statistic
                Bugs.Project Orgs.Project Changes.Project
Bugs.Project        0.000000     5.190442       53.122165
Orgs.Project        5.190442     0.000000       -2.219625
Changes.Project    53.122165    -2.219625        0.000000

$n
[1] 150

$gp
[1] 1

$method
[1] "pearson"
```

I think I want something from the $estimate table, but I'm not exactly sure what it's telling me.

user1897691
  • Could I get a reason why this was down-voted? I can provide more information if it's needed. Tell me what you need. – user1897691 Jan 10 '13 at 05:03
  • Maybe look here. http://r.789695.n4.nabble.com/Partial-correlations-and-p-values-td908641.html You probably got down voted for not providing any example data and not providing any code that suggests you tried to answer the question yourself. If I were trying to do a partial correlation in R I would look up an example in almost any statistics book so I knew the answer and then attempt to write code for it and post that code here if I needed help. (I did not down-vote you. I only have 2 down votes to my credit in 11 months.) – Mark Miller Jan 10 '13 at 05:07
  • I would go further to add that it isn't clear you know what you want to do. As it stands it will be closed quickly as not a real question. Read [this](http://stackoverflow.com/questions/how-to-ask) for general tips and [here](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for specific tips on how to ask good `R` questions – mnel Jan 10 '13 at 05:11
  • Can you post your output and describe what you don't understand? – mnel Jan 10 '13 at 05:27
  • It would really help if you actually provided a small set of example data, as suggested earlier. In my experience people here would be far more likely to try to help and less likely to vote to close your post. – Mark Miller Jan 10 '13 at 05:27
  • In the event that your post is closed soon, try obtaining a copy of Crawley's 'The R Book'. There surely must be R code in that book for partial correlation. – Mark Miller Jan 10 '13 at 05:33
  • I've added sample data, what I'm trying, and the output that is confusing me. – user1897691 Jan 10 '13 at 05:34
  • I'm thinking it's a matrix with the correlations of the row and column that the element is in. Is this correct? For example, is the correlation between bugs and organizations with the controlled variable of changes 0.3935535? – user1897691 Jan 10 '13 at 05:36

1 Answer


Reading from the Value section of help('pcor'):

Value

estimate a matrix of the partial correlation coefficient between two variables

p.value a matrix of the p value of the test

statistic a matrix of the value of the test statistic

n the number of samples

gp the number of given variables

method the correlation method used
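To see how statistic, p.value, n and gp relate to estimate, the sketch below recomputes them for the three partial correlations in the question's output (values copied from $estimate, with n = 150 and gp = 1). The statistic formula r * sqrt((n - 2 - gp) / (1 - r^2)) is my assumption about what ppcor computes, not something quoted from its help page; it does reproduce the printed $statistic values. The printed p-values appear to correspond to a two-sided standard normal tail rather than a t distribution on n - 2 - gp degrees of freedom, so treat that detail as possibly version-specific.

```r
# Partial correlations copied from the $estimate matrix in the question
r <- c(bugs_orgs = 0.3935535, bugs_changes = 0.9749296, orgs_changes = -0.1800788)
n  <- 150  # $n: number of samples
gp <- 1    # $gp: number of given (controlling) variables

# Assumed test statistic: r scaled by sqrt(residual df / (1 - r^2))
statistic <- r * sqrt((n - 2 - gp) / (1 - r^2))

# Two-sided p-value against a standard normal reference distribution
p_normal <- 2 * pnorm(-abs(statistic))

# For comparison: the same statistic referred to a t distribution
p_t <- 2 * pt(-abs(statistic), df = n - 2 - gp)

print(signif(cbind(statistic, p_normal, p_t), 6))
```

The printed p-values (2.09795e-07 and 0.0264442) line up with p_normal; p_t comes out somewhat larger for the same statistics.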

The Details section says:

Details

Partial correlation is the correlation of two variables while controlling for a third or more other variables.
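With a single controlling variable, "controlling for" can be made concrete: regress each of the two variables on the controlling variable, then correlate the residuals. The sketch below uses simulated data, where x, y and z are hypothetical stand-ins for something like Bugs.Project, Orgs.Project and Changes.Project, and checks the residual method against the standard closed-form first-order formula. Both are textbook definitions, not ppcor's own code.

```r
set.seed(42)

# Simulated (hypothetical) data: z drives both x and y,
# and x and y are otherwise unrelated
n <- 200
z <- rnorm(n)
x <- 0.8 * z + rnorm(n)
y <- 0.5 * z + rnorm(n)

# 1) Residual method: remove the linear effect of z from x and from y,
#    then correlate what is left over
rx <- resid(lm(x ~ z))
ry <- resid(lm(y ~ z))
r_resid <- cor(rx, ry)

# 2) Closed-form first-order partial correlation from the pairwise correlations
r_xy <- cor(x, y)
r_xz <- cor(x, z)
r_yz <- cor(y, z)
r_formula <- (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))

print(c(zero_order = r_xy, partial_resid = r_resid, partial_formula = r_formula))
```

Because x and y are related only through z in this simulation, the two partial values should agree with each other and sit near zero, while the zero-order correlation r_xy does not.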

For your result:

```
$estimate
                Bugs.Project Orgs.Project Changes.Project
Bugs.Project       1.0000000    0.3935535       0.9749296
Orgs.Project       0.3935535    1.0000000      -0.1800788
Changes.Project    0.9749296   -0.1800788       1.0000000
```

The partial correlation of Changes.Project and Orgs.Project is -0.1800788. This is the correlation of Changes.Project and Orgs.Project controlling for Bugs.Project.

The partial correlation of Changes.Project and Bugs.Project is 0.9749296. This is the correlation of Changes.Project and Bugs.Project controlling for Orgs.Project.

The partial correlation of Orgs.Project and Bugs.Project is 0.3935535. This is the correlation of Orgs.Project and Bugs.Project controlling for Changes.Project.

You could get the same information (if you are only interested in this third case) from:

```r
pcor.test(y.data$Orgs.Project, y.data$Bugs.Project, y.data$Changes.Project)
```
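More generally, each off-diagonal entry of $estimate is the partial correlation of its row variable and its column variable, controlling for all of the remaining columns. With three columns that leaves exactly one controlling variable per pair, which is why each entry above controls for a different variable. Below is a minimal base-R sketch of building such a matrix from the inverse of the correlation matrix, using a hypothetical helper partial_cor_matrix() and only the five sample rows from the question, so the numbers will not match the 150-row output. I have not checked ppcor's internals, but for Pearson correlations this should agree with pcor()$estimate.

```r
# The five sample rows shown in the question
y.data <- data.frame(
  Bugs.Project    = c(4, 13, 36, 2, 42),
  Orgs.Project    = c(2, 2, 2, 2, 4),
  Changes.Project = c(161, 223, 307, 30, 1061)
)

# Hypothetical helper: partial correlation of every pair of columns,
# each pair controlling for all of the other columns
partial_cor_matrix <- function(df) {
  P  <- solve(cor(df))                        # inverse correlation (precision) matrix
  pc <- -P / sqrt(outer(diag(P), diag(P)))    # -P[i, j] / sqrt(P[i, i] * P[j, j])
  diag(pc) <- 1                               # report 1 on the diagonal, as pcor() does
  pc
}

pc <- partial_cor_matrix(y.data)
print(pc)

# Rows and columns are named, so one pair can be looked up directly:
# Bugs.Project vs Orgs.Project, controlling for Changes.Project
print(pc["Bugs.Project", "Orgs.Project"])
```

The same lookup by name works on the real result, e.g. test$estimate["Bugs.Project", "Orgs.Project"].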
mnel
  • First off, thank you for suggesting the help manual. I didn't know that was in R. Also, what I'm attempting is to find the correlation between the number of bugs in a project and the number of organizations that worked on that project. The number of changes on a project is a factor to consider since more changes usually means more bugs. What I mean by controlled variable is just that it needs to be taken into account but it's not what I'm looking for, if that makes sense? – user1897691 Jan 10 '13 at 05:56
  • I worked that out in the end. Hopefully my edit is now clearer. – mnel Jan 10 '13 at 05:58
  • @mnel Sorry to add to a closed question, but this is just a quick addition: I am also using pcor and pcor.test. When I use just one controlling variable I get the same correlation values from pcor and pcor.test, but when I use two controlling variables with pcor.test(x,y,z=c(z1,z2)), I get a different result from the pcor output. Could you possibly explain this, please? Is it wrong to use the pcor.test function for more than one controlling variable? – rg255 Mar 06 '13 at 10:42
  • It can be useful to compare the output of this with similar results for the zero-order (i.e., non-partial) correlations with, e.g., `Hmisc::rcorr(y.data)` – wes Jan 02 '23 at 21:05
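On rg255's question about two controlling variables: c(z1, z2) does not give pcor.test() two controls. It concatenates them into one vector twice as long as x and y, which is a likely cause of the mismatch with pcor(). As far as I know, the ppcor help page describes z as a vector, matrix or data frame, so the controls should be passed as columns, e.g. data.frame(z1, z2). The sketch below uses hypothetical simulated data to show the length problem and computes the intended quantity, the partial correlation of x and y controlling for both z1 and z2, by regressing on both. The ppcor call only runs if the package is installed.

```r
set.seed(7)

# Hypothetical data with two controlling variables, z1 and z2
n  <- 100
z1 <- rnorm(n)
z2 <- rnorm(n)
x  <- z1 - z2 + rnorm(n)
y  <- z1 + z2 + rnorm(n)

# c() concatenates the controls end to end: one vector of length 2 * n,
# not a two-column set of controlling variables
print(length(c(z1, z2)))

# Intended quantity: correlation of x and y after removing the linear
# effects of both z1 and z2 from each
rx <- resid(lm(x ~ z1 + z2))
ry <- resid(lm(y ~ z1 + z2))
r_partial <- cor(rx, ry)
print(r_partial)

# If ppcor is installed, pass the controls as columns of a data frame
if (requireNamespace("ppcor", quietly = TRUE)) {
  print(ppcor::pcor.test(x, y, data.frame(z1, z2))$estimate)
}
```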