I am new to R and have been trying to adapt published code that analyzes gene expression and genetic mutation status to predict outcomes in breast cancer patients.

The original code was published in Nature Communications for acute myeloid leukemia data sets and can be downloaded from: http://www.nature.com/ncomms/2015/150109/ncomms6901/full/ncomms6901.html

I am following the code in Supplementary Data 4.

I am unable to replicate their results, because I get an error when the data frames are combined.

I am able to load all of my data from cBioPortal using the following code:

library(cgdsr)  ## provides CGDS(), getCancerStudies(), getCaseLists(), getProfileData()
mycgds <- CGDS("http://www.cbioportal.org/public-portal/")
brca_tcga <- getCancerStudies(mycgds)[15,1] ## 15 for BRCA
cases <- getCaseLists(mycgds,brca_tcga)[8,1]  ## 8 for RNA expression z scores
## query the genes in chunks of up to 500; "g" is a list with one data.frame per chunk
g <- lapply(split(as.numeric(entrez), seq_along(entrez) %/% 500),
            function(genes) getProfileData(mycgds, genes, getGeneticProfiles(mycgds,brca_tcga)[2,1], cases))
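For readers unfamiliar with the chunking idiom in the last line, here is a minimal sketch of how `split()` with `seq_along(x) %/% n` groups a vector. The IDs and the chunk size of 5 are made-up stand-ins, not the real `entrez` vector or the 500 used above. Note that integer division of 1-based indices makes the first chunk one element short; the `ceiling()` variant is shown only as a possible alternative.

```r
# Hypothetical stand-in for the `entrez` vector (made-up IDs).
ids <- as.character(101:112)
chunk_size <- 5

# Same idiom as in the question: group by integer division of the index.
chunks <- split(as.numeric(ids), seq_along(ids) %/% chunk_size)
chunk_lengths <- lengths(chunks)
print(chunk_lengths)

# Possible alternative that fills every chunk except the last one.
even_chunks <- split(as.numeric(ids), ceiling(seq_along(ids) / chunk_size))
even_lengths <- lengths(even_chunks)
print(even_lengths)
```

Each element of `chunks` would then be passed to the query function in turn, which is why `g` ends up as a list of data frames rather than a single data frame.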

Then I try to run the following code:

g <- do.call("cbind", g)

which yields an error:

> g <- do.call("cbind", g)
Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 173, 0
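The message "173, 0" suggests that at least one element of the list has 173 rows while another has none. A minimal sketch of how that situation can be reproduced and diagnosed, using made-up toy data frames (`ok_chunk`, `empty_chunk`, and `g_toy` are hypothetical names, not objects from the original code):

```r
# Hypothetical stand-ins for two chunks returned by a query:
# one with data, one that came back with zero rows.
ok_chunk <- data.frame(GENE_A = c(0.5, -1.2, 0.3),
                       row.names = c("S1", "S2", "S3"))
empty_chunk <- data.frame(GENE_B = numeric(0))
g_toy <- list(ok_chunk, empty_chunk)

# Row count of every chunk; a 0 marks a chunk that will break cbind().
row_counts <- sapply(g_toy, nrow)
print(row_counts)

# cbind() on chunks with different row counts fails; capture the message.
result <- tryCatch(do.call("cbind", g_toy),
                   error = function(e) conditionMessage(e))
print(result)
```

Running `sapply(g, nrow)` on the real list should show which chunk or chunks came back empty.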

I have tried to follow related threads, but some of them are over my head. I am not sure whether something went wrong in constructing the data.frame, or where to begin fixing this issue. Any assistance would be appreciated, as would a pointer to a good document explaining what's going on.

I can print my data by calling g:

                WDR38 WDR63 WDR86 ZBED9 ZCWPW2 ZNF283 ZNF300P1 ZNF418 ZNF600
TCGA.AB.2803.03    NA    NA    NA    NA     NA     NA       NA     NA     NA
TCGA.AB.2805.03    NA    NA    NA    NA     NA     NA       NA     NA     NA
TCGA.AB.2806.03    NA    NA    NA    NA     NA     NA       NA     NA     NA
TCGA.AB.2807.03    NA    NA    NA    NA     NA     NA       NA     NA     NA
TCGA.AB.2808.03    NA    NA    NA    NA     NA     NA       NA     NA     NA

This is only a small excerpt, but I am unable to get past the next step of the code.

:-(

Thank you all for any assistance or education you may provide!

Coug Rae
  • What is your vector `entrez` referenced in your code? – Sam Firke Mar 26 '15 at 01:05
  • Hi Sam, the vector `entrez` holds the Entrez gene IDs for the genes identified from a microarray, so they can readily be identified and linked to gene mutation data later in the code. Here is the code that creates it: `entrez <- unique(AnnotationDbi::select(hgu133plus2.db, keys = keys(hgu133plus2.db), columns = c("ENTREZID"))$ENTREZID)` – Coug Rae Mar 26 '15 at 01:08
  • Have you checked if the loaded data is correct? Try running `head()` after each loading and subsetting to make sure all the data is correct. – Molx Mar 26 '15 at 01:14
  • Hi @Molx, I ran `head(g)` after making the data.frame and got the following: `TCGA.C8.A1HM.01 0.6361 1.0200 -0.6011 0.7024 0.8667 0.7231 NA`, `TCGA.C8.A1HN.01 0.3320 0.0669 -0.3567 0.5700 3.3222 0.2932 NA`, `TCGA.E2.A14N.01 1.5758 -1.2924 0.3895 2.6431 -0.9575 -0.8534 NA`. This data appears to be correct. – Coug Rae Mar 26 '15 at 01:26
  • Sorry @SamFirke, I didn't see how to tag people for replies until just a moment ago :-( – Coug Rae Mar 26 '15 at 01:27
  • @CougRae it's not easy to read data inline, but it seems that it is different from what you posted in the question, since it has numbers instead of all NAs. – Molx Mar 26 '15 at 01:34
  • I was able to resolve the issue using `rbind.fill` from Hadley Wickham's plyr package. The link is here: http://stackoverflow.com/questions/7962267/cbind-a-df-with-an-empty-df-cbind-fill – Coug Rae Mar 26 '15 at 17:09
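
The last comment resolved the problem with `rbind.fill` from plyr. As a different, base-R option (not the approach used in that comment), one could drop the zero-row chunks before calling `cbind()`. The sketch below uses made-up toy data frames and assumes that all remaining chunks contain the same samples in the same row order; the names `chunk_a`, `chunk_b`, `chunk_c`, and `g_toy` are hypothetical.

```r
# Hypothetical toy chunks: two with data, one with zero rows.
samples <- c("S1", "S2", "S3")
chunk_a <- data.frame(GENE_A = c(0.5, -1.2, 0.3), row.names = samples)
chunk_b <- data.frame(GENE_B = numeric(0))
chunk_c <- data.frame(GENE_C = c(1.1, 0.0, -0.7), row.names = samples)
g_toy <- list(chunk_a, chunk_b, chunk_c)

# Keep only chunks that actually returned rows.
non_empty <- Filter(function(chunk) nrow(chunk) > 0, g_toy)

# Assumption: the remaining chunks share identical row names and order.
combined <- do.call("cbind", non_empty)
print(dim(combined))
print(names(combined))
```

With this approach the genes from empty chunks are simply absent from the result, so it may be worth recording which chunks were dropped.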
