
I created an R data frame by reading an xlsx file, like so:

 a <- read.xlsx("Global_Manifest.xlsx", sheetIndex = 1, header = TRUE)  # xlsx package; header = TRUE reads column names
 a <- a[a$visit.1 == "SCR" & a$processed.data.available == 1, ]
 a$sampleName <- paste(a$best.response, a$subject, a$visit.1, "VAF=", a$AF)

Each value of a$sampleName looks something like this: "TM 700-666 SCR VAF= 0.46"

However, once my analysis is done, I would like to match a$sampleName back to the manifest to get the a$gender information for each of the results. Gender is one of the column headers in the Global_Manifest.xlsx file.

The idea is to visualise the results as a stacked barplot to see the difference between the result profiles of male and female subjects.

Could anyone suggest an easy way to split the a$sampleName object, match the pieces against a$best.response, a$subject, a$visit.1 and a$AF and, if they all match, get the a$gender?

user44552
    When asking for help, you should include a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data in the question itself. This makes it easier to help you. It's unclear why you think you need to deparse the sampleName value when you have all the other columns in `a` already. What exactly is the next step you want to carry out? Give the desired output. – MrFlick Mar 08 '17 at 21:24

1 Answer


Without data, I cannot test this to see if I am giving you what I believe you are asking for. However, this should work:

After you create the new column, using your code above, make sure that you keep all of the old columns together. Then create a grouped table using dplyr's group_by():

library(dplyr)
b <- group_by(a, best.response, subject, visit.1, AF, sampleName, gender)

This gives you a table grouped on the columns you want to evaluate together, with gender included so that male and female subjects fall into separate groups. Once you have that table, you can work with it like any other data frame.

If you de-duplicate b you will have a single row for each sampleName and gender, so you can drop the grouping, select those two columns and keep the unique rows:

b <- ungroup(b)
unique(b[, c("sampleName", "gender")])

This should return a subset with just the sampleName compound key you created and the gender associated with it. If you actually want to count how many of each there are, then instead of taking the unique rows and subsetting, pipe the group_by() statement into summarize():

b <- group_by(a, best.response, subject, visit.1, AF, sampleName, gender) %>%
  summarize(count = n())
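
For the matching step itself, here is a minimal base R sketch on made-up data, since I don't have the manifest. The `results` data frame and its `call` column are hypothetical stand-ins for whatever your analysis returns; the only assumption is that the output still carries the `sampleName` strings. Because `sampleName` was built from columns of `a`, it probably doesn't need to be split apart at all: it can be used directly as a lookup key.

```r
# Toy stand-in for the manifest after the sampleName column has been added.
a <- data.frame(
  best.response = c("TM", "PR", "SD"),
  subject       = c("700-666", "700-123", "700-999"),
  visit.1       = "SCR",
  AF            = c(0.46, 0.12, 0.30),
  gender        = c("M", "F", "F"),
  stringsAsFactors = FALSE
)
a$sampleName <- paste(a$best.response, a$subject, a$visit.1, "VAF=", a$AF)

# One row per sampleName/gender pair; stop if a key maps to two genders.
lookup <- unique(a[, c("sampleName", "gender")])
stopifnot(!anyDuplicated(lookup$sampleName))

# Hypothetical analysis output that only carries sampleName.
results <- data.frame(
  sampleName = c(a$sampleName[3], a$sampleName[1], a$sampleName[1], a$sampleName[2]),
  call       = c("gain", "loss", "gain", "gain"),
  stringsAsFactors = FALSE
)

# Find each result's row in the lookup table and pull the gender from it.
results$gender <- lookup$gender[match(results$sampleName, lookup$sampleName)]

# Rows = result category, columns = gender: the shape barplot() stacks.
counts <- table(results$call, results$gender)
```

Passing `counts` to `barplot(counts, legend.text = TRUE)` would then draw one bar per gender, stacked by result category.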
sconfluentus