I am trying to run a correlation between specific elements from my variables in R, but I can't find a way to select the elements.

My data frame looks like this: [enter image description here](https://i.stack.imgur.com/DGZnu.png)

And I'm trying to select "survived = 1", "sex=female", "pclass=2", and "age=10".

Any ideas about how I can get this to work?

ninna
  • Are you trying to subset? or trying to get correlation of subset columns? – zx8754 Apr 13 '21 at 13:48
  • Provide example data as text, see https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – zx8754 Apr 13 '21 at 13:49
  • @zx8754 I tried to subset but I am trying to find an easier way to get the correlation rather than creating a new data frame. If all else fails I guess I could go down that road – ninna Apr 13 '21 at 13:50
  • It would be easier to help if you create a small reproducible example along with expected output. Read about [how to give a reproducible example](http://stackoverflow.com/questions/5963269). – Ronak Shah Apr 14 '21 at 00:06

1 Answer

For the future, as @Ronak Shah said in the comments, please create a minimal reproducible example (MRE) and post your failed attempts; that helps the community help you. I'm not sure I've understood your question, but there are many ways to subset a data frame in R. In base R, the most straightforward are:

set.seed(55)  # for reproducibility
# simulate the data.frame you posted
df <- data.frame(
  id       = 1:10,
  survived = sample(c(0, 1), 10, replace = TRUE),
  pclass   = sample(1:3, 10, replace = TRUE),
  sex      = sample(c("M", "F"), 10, replace = TRUE),
  age      = round(runif(10, 10, 15)),
  sibsp    = sample(1:3, 10, replace = TRUE),
  parch    = sample(0:2, 10, replace = TRUE)
)

# subset with subset()
new.df <- subset(df, survived == 1 & sex == "F" & pclass == 2 & age == 10)
# same result with the [ operator (assuming no NA in these columns)
new.df2 <- df[df$survived == 1 & df$sex == "F" & df$pclass == 2 & df$age == 10, ]
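
One caveat on the two approaches above: they can differ when a filtered column contains missing values, which may matter here if your age column has any NA. The sketch below uses a made-up toy data frame (the name toy and its values are mine, not your data) to show that subset() drops rows where the condition is NA, while plain [ indexing returns an all-NA row for them. Wrapping the condition in which() is one way to make [ behave like subset().

```r
# Toy data with one missing age (hypothetical values, for illustration only)
toy <- data.frame(
  survived = c(1, 1, 0, 1),
  age      = c(10, NA, 10, 12)
)

# The condition evaluates to TRUE, NA, FALSE, FALSE for the four rows
by_subset  <- subset(toy, survived == 1 & age == 10)
by_bracket <- toy[toy$survived == 1 & toy$age == 10, ]
by_which   <- toy[which(toy$survived == 1 & toy$age == 10), ]

nrow(by_subset)   # 1: the NA condition is treated as FALSE
nrow(by_bracket)  # 2: the NA condition yields an extra all-NA row
nrow(by_which)    # 1: which() keeps only the TRUE positions
```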

As you can see, creating a new data frame is a very simple solution; you can then run a correlation analysis on the rows you are interested in. Your question doesn't say exactly what you want to compute, but a simple correlation matrix can be obtained with:

cor(new.df)

but this throws an error, because cor() accepts only numeric input (a numeric vector, matrix, or data frame) and "sex" is a character column. Dropping that column first avoids the error:

# drop the non-numeric "sex" column by name rather than by position
new.df.for.cor <- new.df[, setdiff(names(new.df), "sex")]
cor(new.df.for.cor)
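
Since you mentioned in the comments that you would rather not create a new data frame, here is a minimal sketch of two ways to compute a correlation on the filtered rows directly. It assumes you want the correlation between sibsp and parch, which is only my guess at the pair of variables. The data frame sim is a larger simulated stand-in (5000 rows, so the filter is very unlikely to come back empty), and the names sim, keep, r_index and r_with are illustrative.

```r
set.seed(55)  # for reproducibility
n <- 5000     # assumption: many rows, so the filtered group is not empty

# Simulated stand-in for the posted data (not the real data)
sim <- data.frame(
  survived = sample(c(0, 1), n, replace = TRUE),
  pclass   = sample(1:3, n, replace = TRUE),
  sex      = sample(c("M", "F"), n, replace = TRUE),
  age      = round(runif(n, 10, 15)),
  sibsp    = sample(1:3, n, replace = TRUE),
  parch    = sample(0:2, n, replace = TRUE)
)

# Option 1: build a logical index once and reuse it on each vector
keep <- sim$survived == 1 & sim$sex == "F" & sim$pclass == 2 & sim$age == 10
r_index <- cor(sim$sibsp[keep], sim$parch[keep])

# Option 2: subset inline and evaluate cor() inside it, storing nothing else
r_with <- with(
  subset(sim, survived == 1 & sex == "F" & pclass == 2 & age == 10),
  cor(sibsp, parch)
)

sum(keep)  # number of rows that pass the filter
r_index    # same value as r_with
```

Note that the columns you filter on (survived, pclass, age) are constant within the selected rows, so including them in a correlation matrix gives NA for those entries, along with a warning that the standard deviation is zero. Only the columns that still vary after filtering are meaningful to correlate.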

For a better answer, please edit the question to add more information about your data and the output you would like to get.

Elia