
I'm using the CATALYST package to analyze CyTOF data.

One of the commands requires a table in the format of a binary matrix.

I made the table in a .csv file and read it into R, which returned a data frame:

#DEBARCODING
#read in the debarcoding sample key
key <- read.csv('10.9.20.5choose1.barcode_key_tube1.csv', check.names = FALSE)
key

The output was a data.frame that looked like this:

             89 104 106 108 110
1   B6_CD451  1   0   0   0   0
2 Balb_CD451  0   1   0   0   0
3   STING_B6  0   0   1   0   0
4 STING_Balb  0   0   0   1   0
5      A960F  0   0   0   0   1

How do I convert this into a binary matrix/table that looks exactly the same?

I tried as.matrix:

#The debarcoding scheme should be a binary table with sample IDs as row and numeric barcode masses as column names:
as.matrix(key)

and my output was:

> as.matrix(key)
                  89  104 106 108 110
[1,] "B6_CD451"   "1" "0" "0" "0" "0"
[2,] "Balb_CD451" "0" "1" "0" "0" "0"
[3,] "STING_B6"   "0" "0" "1" "0" "0"
[4,] "STING_Balb" "0" "0" "0" "1" "0"
[5,] "A960F"      "0" "0" "0" "0" "1"

Every value was wrapped in quotation marks.

> dput(key)
structure(list(c("B6_CD451", "Balb_CD451", "STING_B6", "STING_Balb", 
"A960F"), `89` = c(1L, 0L, 0L, 0L, 0L), `104` = c(0L, 1L, 0L, 
0L, 0L), `106` = c(0L, 0L, 1L, 0L, 0L), `108` = c(0L, 0L, 0L, 
1L, 0L), `110` = c(0L, 0L, 0L, 0L, 1L)), class = "data.frame", row.names = c(NA, 
-5L))
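A minimal sketch of why the quotes appear, rebuilding `key` from the `dput()` output above: the unnamed first column holds character sample IDs, and since a matrix can hold only one type, `as.matrix()` coerces the integer columns to character as well. The name `m` is just illustrative.

```r
# Rebuild the data frame exactly as shown by dput(key)
key <- structure(list(c("B6_CD451", "Balb_CD451", "STING_B6", "STING_Balb",
  "A960F"), `89` = c(1L, 0L, 0L, 0L, 0L), `104` = c(0L, 1L, 0L,
  0L, 0L), `106` = c(0L, 0L, 1L, 0L, 0L), `108` = c(0L, 0L, 0L,
  1L, 0L), `110` = c(0L, 0L, 0L, 0L, 1L)), class = "data.frame",
  row.names = c(NA, -5L))

# The first (unnamed) column is character; the barcode columns are integer
sapply(key, class)

# A matrix has a single type, so everything is coerced to character
m <- as.matrix(key)
typeof(m)   # "character"
```

Because the row names are automatic, `as.matrix()` also drops them, which is why the output shows `[1,]`, `[2,]`, ... instead of sample IDs.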
Ronak Shah

2 Answers


Starting with a data frame:

dat <- structure(list(`89` = c(1L, 0L, 0L, 0L, 0L), `104` = c(0L, 1L, 0L, 0L, 0L), `106` = c(0L, 0L, 1L, 0L, 0L), `108` = c(0L, 0L, 0L, 1L, 0L), `110` = c(0L, 0L, 0L, 0L, 1L)), class = "data.frame", row.names = c("B6_CD451", "Balb_CD451", "STING_B6", "STING_Balb", "A960F"))
dat
#            89 104 106 108 110
# B6_CD451    1   0   0   0   0
# Balb_CD451  0   1   0   0   0
# STING_B6    0   0   1   0   0
# STING_Balb  0   0   0   1   0
# A960F       0   0   0   0   1

Converting to a matrix of integers:

m_int <- as.matrix(dat)
m_int
#            89 104 106 108 110
# B6_CD451    1   0   0   0   0
# Balb_CD451  0   1   0   0   0
# STING_B6    0   0   1   0   0
# STING_Balb  0   0   0   1   0
# A960F       0   0   0   0   1
str(m_int)
#  int [1:5, 1:5] 1 0 0 0 0 0 1 0 0 0 ...
#  - attr(*, "dimnames")=List of 2
#   ..$ : chr [1:5] "B6_CD451" "Balb_CD451" "STING_B6" "STING_Balb" ...
#   ..$ : chr [1:5] "89" "104" "106" "108" ...

Converting to a logical matrix:

m_lgl <- as.matrix(dat) > 0
m_lgl
#               89   104   106   108   110
# B6_CD451    TRUE FALSE FALSE FALSE FALSE
# Balb_CD451 FALSE  TRUE FALSE FALSE FALSE
# STING_B6   FALSE FALSE  TRUE FALSE FALSE
# STING_Balb FALSE FALSE FALSE  TRUE FALSE
# A960F      FALSE FALSE FALSE FALSE  TRUE

It's not easy to infer what data you actually have, as the default printed representation of a data.frame can be ambiguous. However, if your columns instead contain strings, then your as.matrix output might look like:

as.matrix(dat)
#            89  104 106 108 110
# B6_CD451   "1" "0" "0" "0" "0"
# Balb_CD451 "0" "1" "0" "0" "0"
# STING_B6   "0" "0" "1" "0" "0"
# STING_Balb "0" "0" "0" "1" "0"
# A960F      "0" "0" "0" "0" "1"

str(dat)
# 'data.frame': 5 obs. of  5 variables:
#  $ 89 : chr  "1" "0" "0" "0" ...
#  $ 104: chr  "0" "1" "0" "0" ...
#  $ 106: chr  "0" "0" "1" "0" ...
#  $ 108: chr  "0" "0" "0" "1" ...
#  $ 110: chr  "0" "0" "0" "0" ...

In that case, starting with the character version of dat, you can simply do

as.matrix(dat) != "0"
#               89   104   106   108   110
# B6_CD451    TRUE FALSE FALSE FALSE FALSE
# Balb_CD451 FALSE  TRUE FALSE FALSE FALSE
# STING_B6   FALSE FALSE  TRUE FALSE FALSE
# STING_Balb FALSE FALSE FALSE  TRUE FALSE
# A960F      FALSE FALSE FALSE FALSE  TRUE
+(as.matrix(dat) != "0")
#            89 104 106 108 110
# B6_CD451    1   0   0   0   0
# Balb_CD451  0   1   0   0   0
# STING_B6    0   0   1   0   0
# STING_Balb  0   0   0   1   0
# A960F       0   0   0   0   1

Use whichever class (logical or integer) you really need.
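If you would rather get integers back directly from a character matrix, another option is to change its storage mode in place, which keeps the dimensions and row/column names. This is a sketch on a hypothetical all-character `dat_chr`, built to match the `str(dat)` output above.

```r
# Hypothetical all-character version of dat, matching the str() output above
dat_chr <- data.frame(
  `89`  = c("1", "0", "0", "0", "0"),
  `104` = c("0", "1", "0", "0", "0"),
  `106` = c("0", "0", "1", "0", "0"),
  `108` = c("0", "0", "0", "1", "0"),
  `110` = c("0", "0", "0", "0", "1"),
  row.names = c("B6_CD451", "Balb_CD451", "STING_B6", "STING_Balb", "A960F"),
  check.names = FALSE,       # keep "89", "104", ... as column names
  stringsAsFactors = FALSE
)

m <- as.matrix(dat_chr)        # character matrix with quoted values
storage.mode(m) <- "integer"   # "1"/"0" become 1L/0L; dim and dimnames are kept
str(m)
```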

r2evans
  • the as.matrix function is adding quotation marks around every value. – Akshay Iyer Jan 11 '21 at 02:14
  • Please provide an unambiguous version of your data by editing your question and adding the output from `dput(x)`. – r2evans Jan 11 '21 at 02:15
  • See if my edited answer addresses your issue. If not, then please listen to my request for real data using `dput`. – r2evans Jan 11 '21 at 02:22
  • It did not fix it. I've added the dput output of my real data to the bottom of my question. Thank you. – Akshay Iyer Jan 11 '21 at 02:30
  • Yes, *that* is why unambiguous sample data is required to help in many situations. Try `rn <- dat[,1]; m <- as.matrix(dat[,-1]); rownames(m) <- rn;` – r2evans Jan 11 '21 at 02:42
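Applied to the `dput()` data from the question (called `key` there, `dat` in the comment), the one-liner from the last comment can be sketched as follows. The final check that every value is 0 or 1 is an addition of this sketch, not part of the comment.

```r
# Data frame as shown by dput(key) in the question
key <- structure(list(c("B6_CD451", "Balb_CD451", "STING_B6", "STING_Balb",
  "A960F"), `89` = c(1L, 0L, 0L, 0L, 0L), `104` = c(0L, 1L, 0L,
  0L, 0L), `106` = c(0L, 0L, 1L, 0L, 0L), `108` = c(0L, 0L, 0L,
  1L, 0L), `110` = c(0L, 0L, 0L, 0L, 1L)), class = "data.frame",
  row.names = c(NA, -5L))

rn <- key[, 1]               # sample IDs from the unnamed first column
m <- as.matrix(key[, -1])    # only integer columns remain, so an integer matrix
rownames(m) <- rn

stopifnot(all(m %in% c(0L, 1L)))   # sanity check: the table is strictly binary
m
```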

The first column has a blank column name. You can use it as the row names and then delete the first column.

#Optional: skip this step if you don't need the sample IDs as row names.
rownames(key) <- key[[1]]
key[[1]] <- NULL
mat <- as.matrix(key)
mat
#           89 104 106 108 110
#B6_CD451    1   0   0   0   0
#Balb_CD451  0   1   0   0   0
#STING_B6    0   0   1   0   0
#STING_Balb  0   0   0   1   0
#A960F       0   0   0   0   1
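Alternatively, the sample IDs can be turned into row names while the file is read, using the `row.names` argument of `read.csv()`. This sketch assumes the CSV stores the sample IDs in its first column under a blank header (consistent with the unnamed first column in the `dput()` output); the inline `csv_text` is a hypothetical stand-in for the real file.

```r
# Hypothetical CSV contents mirroring the question's barcode key
csv_text <- ",89,104,106,108,110
B6_CD451,1,0,0,0,0
Balb_CD451,0,1,0,0,0
STING_B6,0,0,1,0,0
STING_Balb,0,0,0,1,0
A960F,0,0,0,0,1"

# row.names = 1 uses the first column as row names; check.names = FALSE keeps "89", ...
key <- read.csv(text = csv_text, check.names = FALSE, row.names = 1)
mat <- as.matrix(key)   # all remaining columns are integer, so no quotes
mat
```

With the real file, the same call would be `read.csv('10.9.20.5choose1.barcode_key_tube1.csv', check.names = FALSE, row.names = 1)`.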
Ronak Shah