4

I have been asked to write R output to two binary files: an index file and a main data file. The data file will contain one matrix/block for each id in the index file. I have read about writing binary files in R, but I am not sure how to specify the layout so that I end up with this structure.

Also, can we specify a short integer in R? The requester said he wants the numbers to be short integers (two bytes), and I don't know what that means.

I appreciate any input! Thanks

vanilli
    A quick search using `[r] binary file` on StackOverflow reveals the following very similar question: http://stackoverflow.com/q/1635278/602276 – Andrie Aug 08 '11 at 23:19
  • 1
    As @mdsumner writes, you can specify how to write integers of size 2, but your problem statement is quite vague. Is the matrix data integers or are the ids integers? Or perhaps the ids are strings? – Tommy Aug 09 '11 at 00:52
  • Welcome to StackOverflow! If one of the answers here is what you need, you should mark it as accepted. Otherwise, update your question to clarify what you need. You should also upvote answers (and questions) you like. Just click on the score in the upper left! – Tommy Aug 09 '11 at 02:43

2 Answers

4

Since you didn't specify the problem very clearly, I made some assumptions in the sample code below. Given a list of matrices, it saves them to a .bin file and creates an .idx file with offsets. You can then load them back in again given an index. The 2-byte size you mentioned isn't used - it saves the matrix data as 8-byte doubles or 4-byte integers (but you could change that).

Here's how it's used:

mtx <- list(matrix(1:12,4), matrix(sin(1:12),4))
saveMatrixList("c:/foo", mtx)

loadMatrix("c:/foo", 1)
loadMatrix("c:/foo", 2)

...and here are the functions:

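saveMatrixList <- function(baseName, mtxList) {
    # Index file: one 4-byte integer offset per matrix
    idxName <- paste(baseName, ".idx", sep="")
    idxCon <- file(idxName, 'wb')
    on.exit(close(idxCon))

    # Data file: dims, type name, then the values of each matrix
    dataName <- paste(baseName, ".bin", sep="")
    con <- file(dataName, 'wb')
    on.exit(close(con), add=TRUE)  # add=TRUE so idxCon is still closed too

    # The first matrix starts at offset 0
    writeBin(0L, idxCon)

    for (m in mtxList) {
        writeBin(dim(m), con)      # two 4-byte integers: rows, cols
        writeBin(typeof(m), con)   # e.g. "integer" or "double", zero-terminated
        writeBin(c(m), con)        # values in column-major order
        flush(con)

        # Position after this matrix = start of the next one
        offset <- as.integer(seek(con))
        cat('offset', offset, '\n')
        writeBin(offset, idxCon)
    }

    flush(idxCon)
}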

loadMatrix <- function(baseName = "data", index) {
    idxName <- paste(baseName, ".idx", sep="")
    idxCon <- file(idxName, 'rb')
    on.exit(close(idxCon))

    dataName <- paste(baseName, ".bin", sep="")
    con <- file(dataName, 'rb')
    on.exit(close(con), add=TRUE)  # add=TRUE so idxCon is still closed too

    # Each index entry is a 4-byte integer; entry i holds the start of matrix i
    seek(idxCon, (index-1)*4)
    offset <- readBin(idxCon, 'integer')

    # Jump to the matrix, then read dims, type name and values
    seek(con, offset)
    d <- readBin(con, 'integer', 2)
    type <- readBin(con, 'character', 1)
    structure(readBin(con, type, prod(d)), dim=d)
}
Tommy
2

See help(writeBin): the size = 2 argument sets the number of bytes written for each element (i.e. a two-byte integer). But if you don't know what this means, you will probably need a lot more information from your requester.
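As a minimal sketch of what that looks like in practice, the snippet below writes an integer vector as two-byte values and reads it back. The sample values, the temporary file, and the choice of little-endian byte order are assumptions made for illustration; the requester would need to confirm the byte order and whether the values are signed. A signed two-byte integer can only hold values from -32768 to 32767, so the sketch checks the range before writing.

```r
# Sample data: the extremes of a signed 16-bit integer plus a few values between
x <- c(-32768L, -1L, 0L, 1L, 32767L)

# Refuse to write anything that does not fit in two bytes
stopifnot(all(x >= -32768L & x <= 32767L))

path <- tempfile(fileext = ".bin")   # scratch file, for illustration only

# Write each element as a 2-byte integer (byte order is an assumption)
con <- file(path, "wb")
writeBin(x, con, size = 2L, endian = "little")
close(con)

file_bytes <- file.size(path)        # 2 bytes per element

# Read the values back with the same size, signedness and byte order
con <- file(path, "rb")
y <- readBin(con, what = "integer", n = length(x), size = 2L,
             signed = TRUE, endian = "little")
close(con)
```

The reader has to use the same size and endian settings as the writer, so those two choices are effectively part of the file format and worth agreeing on up front.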

mdsumner