We were given two data frames to import.

The first contains gene expression data for 17 patients (non-numerical).

The second contains their gene ID and their treatment group.

These data sets first have to be combined, and then we have to calculate the mean expression value for each treatment group.

I'm struggling to work out how to calculate the mean and associate it with a particular treatment group.

Apologies if this does not make sense.

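# Expression data: tab-separated, with a header row
patients <- read.table("GSE4922-GPL96_log2Mas5Sc500-N17.tab", sep = "\t", header = TRUE)
patients
attach(patients)
ProbeSetID

# Patient IDs and treatment groups
patientpID <- read.table("Patient-Groups-N17.tab", sep = "\t", header = TRUE)
patientpID
attach(patientpID)
PatientID

# Combine the two data frames
mergeddata <- merge(patientpID, patients)

# Attempt at a per-group mean (this is the part that does not work)
grouping(TreatmentGroup)
    sum(avg_pID = mean("ProbeSetID"))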

This is what I have so far, but I need to find the mean of the Probe Set ID and then group it by treatment group.

  • Hi Jill, can you show us some example data, so that we can get an idea of the data structures of your `patients` and `patientID`? – Stephan Nov 03 '22 at 13:04
  • My guess is this is a dupe of [summarize by group](https://stackoverflow.com/q/11562656/3358272). Some advice: `attach` is discouraged in most (every?) reputable tutorial and advanced usage of R, it can easily get you into trouble (despite its apparent convenience); I recommend getting used to `patients$ProbeSetID` and similar `$`-referencing (or `with`, `within`, `transform`, or even shifting to the tidyverse if desired) – r2evans Nov 03 '22 at 13:11
  • Hi Jill - I'd strongly recommend **against** using `attach`. There are plenty of functions that make it easy to work with data in data frames without excessive typing, but when you `attach` you risk data becoming out of sync, and can introduce confusion between the attached variables and the columns in the data frame. – Gregor Thomas Nov 03 '22 at 13:12
  • Greetings! Usually it is helpful to provide a minimally reproducible dataset for questions here so people can troubleshoot your problems (rather than a table or screenshot for example). One way of doing this is by using the `dput` function on the data you are using and pasting the output into your question. You can find out how to use it here: https://youtu.be/3EID3P1oisg – Shawn Hemelstrand Nov 16 '22 at 07:21
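
Following the comments above, here is a rough base R sketch of a "summarize by group" approach without `attach`. The real files were not shown, so the layout is an assumption: the expression table is taken to have a `ProbeSetID` column plus one numeric column per patient, and the groups table is taken to have `PatientID` and `TreatmentGroup` columns. The toy values, the patient names `P1` to `P4`, and the `Expression` column name are made up for illustration.

```r
# Toy stand-ins for the two files (hypothetical values and patient names)
patients <- data.frame(
  ProbeSetID = c("probe_a", "probe_b"),
  P1 = c(1, 10),
  P2 = c(3, 20),
  P3 = c(5, 30),
  P4 = c(7, 40)
)
patientpID <- data.frame(
  PatientID      = c("P1", "P2", "P3", "P4"),
  TreatmentGroup = c("A", "A", "B", "B")
)

# Reshape to long format: one row per probe/patient pair.
# stack() returns the columns `values` and `ind` (the source column name).
stacked <- stack(patients[, -1])
long <- data.frame(
  ProbeSetID = rep(patients$ProbeSetID, times = ncol(patients) - 1),
  PatientID  = as.character(stacked$ind),
  Expression = stacked$values
)

# Attach each patient's treatment group via the shared PatientID column
merged <- merge(long, patientpID, by = "PatientID")

# Mean expression per treatment group (over all probes)
by_group <- aggregate(Expression ~ TreatmentGroup, data = merged, FUN = mean)

# Mean expression per probe within each treatment group
by_probe_group <- aggregate(Expression ~ ProbeSetID + TreatmentGroup,
                            data = merged, FUN = mean)

print(by_group)
print(by_probe_group)
```

If the real expression columns are read in as text rather than numbers, they would need converting (for example with `as.numeric`) before `mean` can be applied. Note also that `merge` needs a shared key column: with no column in common, it returns every combination of rows from the two data frames.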