
I have a matrix, extracted with R, that contains many NA values (values that were not found in the text during extraction).

I also have a reference CSV file that contains all possible combinations of the same data.

This is part of my matrix:

Data matrix:

       OMIM   GENES_SYMBOL         GENES        CHROMOSOME
1      (NA)       (arlts1)    (perforin)              (NA)
2      (NA)          (mtr)          (NA)              (NA)
3    (325410)         (NA)          (NA)              (NA)
4      (NA)         (t341c)          (NA)              (5)

This is what the CSV matrix looks like:

Dictionary matrix:

  OMIM      GENES_SYMBOL     GENES                               CHROMOSOME
 "612367"   "alpqtl2"  anorexia nervosa,a 1"                        1
 "606788"   "arlts1"    basal cell carcinoma, susceptibility to,      3
 "325410"   "bcc1"     bone mineral density qtl 3                    10

I want to map the first matrix against the second one to fill in all the equivalent values and get rid of the NAs. The problem is that the matrices do not have the same length (the second is much longer than the first) and their rows are not in the same order: the first row of the data matrix may be row 500 of the dictionary matrix.

I wrote this code, but it works only when the two matrices have the same length. Otherwise it returns only two columns of the data matrix:

genemap<- data.table::fread("GeneMap - Copy.csv",sep="\t")

fun <- function(rowi,genemap) {
  res <- apply(as.data.frame(genemap),1,function(x) {length(na.omit(match(na.omit(rowi),x)))})
  IND <- which(  max(datamatrix) == datamatrixs  )[1]

  rowi[is.na(rowi)] <- unlist(genemap[IND,])[is.na(rowi)]
  return(rowi)
}

as.data.frame(t(apply(datamatrix, 1, fun, genemap)))
        OMIM   GENES_SYMBOL       
1      (NA)       (arlts1)   
2      (NA)          (mtr)         
3    (325410)         (NA)         
4      (NA)        ( t341c)    

Any suggestions on how to modify the code?

  • Are you sure these are matrices? They look like data frames to me. R only allows one data type in a matrix (e.g., all numeric data), while data frames can hold multiple data types (numeric, character, etc., like your data). – Jan Boyer Aug 27 '18 at 18:44
  • I extracted the numbers as text (character). If the solution works with data frames, I can convert them; the data type is not a problem, I only need the content. @JanBoyer – aida abidi Aug 27 '18 at 19:13
  • You can identify the class of your objects using `class()`. Regarding your code, a number of things are not clear, such as what the object `rowi` is. You also use `datamatrix` and `datamatrixs` inside your function `fun` without defining them. You should have a look at [questions](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right#1300618) on how to join two data frames. You could either join on each column (omim, genename, genesymbol) separately and then combine the results, or construct a more complex SQL query using the `sqldf` package. – Lamia Aug 27 '18 at 22:45
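
The per-column lookup suggested in the last comment could be sketched roughly as follows, in base R. This is only a sketch under several assumptions: both objects are data frames of character columns (a matrix would first need `as.data.frame()`), the values have already been cleaned so that they are directly comparable (no surrounding parentheses or quotes), and the toy rows below are illustrative stand-ins for the real data. The helper name `fill_from_dictionary` and its `keys` argument are hypothetical, not part of the original code.

```r
# Toy data modelled on the question; values are illustrative only.
datamatrix <- data.frame(
  OMIM         = c(NA, NA, "325410", NA),
  GENES_SYMBOL = c("arlts1", "mtr", NA, "t341c"),
  GENES        = c("perforin", NA, NA, NA),
  CHROMOSOME   = c(NA, NA, NA, "5"),
  stringsAsFactors = FALSE
)
genemap <- data.frame(
  OMIM         = c("612367", "606788", "325410"),
  GENES_SYMBOL = c("alpqtl2", "arlts1", "bcc1"),
  GENES        = c("anorexia nervosa",
                   "basal cell carcinoma, susceptibility to",
                   "bone mineral density qtl 3"),
  CHROMOSOME   = c("1", "3", "10"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look each row up in the dictionary by any key column,
# then fill only the cells that are NA in the data.
fill_from_dictionary <- function(data, dict,
                                 keys = c("OMIM", "GENES_SYMBOL", "GENES")) {
  # idx[i] = dictionary row matched by data row i (NA if no key matched).
  idx <- rep(NA_integer_, nrow(data))
  for (key in keys) {
    # incomparables = NA stops an NA in the data from matching an NA in the dictionary.
    hit <- match(data[[key]], dict[[key]], incomparables = NA)
    # Keep an earlier key's match; only fill rows that are still unmatched.
    idx[is.na(idx)] <- hit[is.na(idx)]
  }
  for (col in intersect(names(data), names(dict))) {
    fill <- is.na(data[[col]]) & !is.na(idx)
    data[[col]][fill] <- dict[[col]][idx[fill]]
  }
  data
}

filled <- fill_from_dictionary(datamatrix, genemap)
print(filled)
```

Because the lookup uses `match()` rather than row positions, the two tables can have different lengths and different row orders. Rows with no matching key in the dictionary (here `mtr` and `t341c`) keep their NA values, and existing non-NA cells are never overwritten.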
