I am reading a tab-separated file that contains a matrix, using the following code.

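infile <- file("line.txt", "r")
matrix <- readLines(infile)   # one character string per line of the file
close(infile)                 # release the connection once the lines are read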

I cannot use read.table() because the number of columns is not the same in every row.

Input data:

position    SNP rs11828013  rs7931369   rs567411332 rs184532784 rs7931583   rs555937772 rs9651750   rs9651751   rs9651752   rs73530502
71278426    rs11828013  rs11828013
71278461    rs7931369   -   rs7931369
71278482    rs567411332 -   -   rs567411332
71278519    rs184532784 -   -   -   rs184532784
71278580    rs7931583   -   1.000   -   -   rs7931583
71278733    rs555937772 -   -   -   -   -   rs555937772
71278792    rs9651750   -   1.000   -   -   1.000   -   rs9651750
71278828    rs9651751   -   1.000   -   -   1.000   -   1.000   rs9651751
71278915    rs9651752   -   1.000   -   -   1.000   -   1.000   1.000   rs9651752
71279052    rs73530502  -   0.116   -   -   0.116   -   0.116   0.116   0.116   rs73530502
logicstar
  • Use read.delim? Hard to debug without seeing what your original data looks like. – Heroka Nov 24 '15 at 15:40
  • I don't know that anyone could provide a sensible answer to this without knowing how you want to handle the difference in the number of columns between rows. – Benjamin Nov 24 '15 at 15:43
  • Also, if you really wanted, you could write some variant of a loop that would use `read.table` with the `skip` and `nrows` arguments to read one line at a time and store each result in a list element. But again, what do you do from there? – Benjamin Nov 24 '15 at 15:44
  • I have edited the question and added the input table. I just want to read it and process it into a matrix. – logicstar Nov 24 '15 at 15:45
  • But - as Benjamin said - how exactly should your matrix look? Are the '-'s columns? – Heroka Nov 24 '15 at 15:50
  • My matrix is a lower triangular matrix with a header, as shown in the example above. – logicstar Nov 24 '15 at 15:53
  • please see http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. Nobody is going to download some sketchy file. You should include sample data and desired output in the text of your question. Help us help you. – C8H10N4O2 Nov 24 '15 at 16:39
  • @C8H10N4O2 I have corrected and re-edited my question. Thanks for your advice. – logicstar Nov 24 '15 at 16:44

2 Answers

With:

read.table(file="line.txt", na.strings = "-", 
           header=TRUE, stringsAsFactors=FALSE, fill=TRUE)

where "line.txt" is the name you gave to your tab-delimited text file.

Use fill=TRUE to allow for incomplete lines, from ?read.table:

fill logical. If TRUE then in case the rows have unequal length, blank fields are implicitly added

na.strings a character vector of strings which are to be interpreted as NA values. Blank fields are also considered to be missing values in logical, integer, numeric and complex fields.
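One caveat worth knowing from the same help page: with fill=TRUE, read.table works out the number of columns from the first five lines of input unless col.names is supplied. The sample in the question is fine because its header row is already the widest line. The following is only a sketch of how one might check this with count.fields(); the text in `txt` and the `V1`..`V5` column names are made up for illustration and are not from the question.

```r
# Hypothetical sample: the widest row (5 fields) only appears on line 7,
# after the five lines read.table inspects to decide the column count.
txt <- "a\tb\n1\t2\n3\t4\n5\t6\n7\t8\n9\t10\n11\t12\t13\t14\t15"

# count.fields() reports the number of fields on every line
con <- textConnection(txt)
n_fields <- count.fields(con, sep = "\t")
close(con)
max_fields <- max(n_fields)

# Supplying col.names of the full width lets fill=TRUE pad every row to it
y <- read.table(text = txt, sep = "\t", header = FALSE, fill = TRUE,
                col.names = paste0("V", seq_len(max_fields)))
dim(y)
```

If max(n_fields) is larger than the number of fields on the header line, that would be one possible source of a "more columns than column names" error.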

To use your sample input, instead of using file="line.txt", I am simply doing:

x <- 
read.table(text='
position    SNP rs11828013  rs7931369   rs567411332 rs184532784 rs7931583   rs555937772 rs9651750   rs9651751   rs9651752   rs73530502
71278426    rs11828013  rs11828013
71278461    rs7931369   -   rs7931369
71278482    rs567411332 -   -   rs567411332
71278519    rs184532784 -   -   -   rs184532784
71278580    rs7931583   -   1.000   -   -   rs7931583
71278733    rs555937772 -   -   -   -   -   rs555937772
71278792    rs9651750   -   1.000   -   -   1.000   -   rs9651750
71278828    rs9651751   -   1.000   -   -   1.000   -   1.000   rs9651751
71278915    rs9651752   -   1.000   -   -   1.000   -   1.000   1.000   rs9651752
71279052    rs73530502  -   0.116   -   -   0.116   -   0.116   0.116   0.116   rs73530502
',na.strings='-', header=TRUE, stringsAsFactors = FALSE, fill=TRUE)

To turn this back into a lower-triangular matrix, you can then do:

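x[,1] <- NULL                      # drop the position column
snp_names <- x[,1]                 # the SNP IDs become the row names
# The diagonal holds SNP IDs, so as.numeric() turns them into NA (with a warning)
x <- sapply(x[,-1], as.numeric)
rownames(x) <- snp_names
x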

which returns the matrix:

            rs11828013 rs7931369 rs567411332 rs184532784 rs7931583 rs555937772 rs9651750 rs9651751 rs9651752 rs73530502
rs11828013          NA        NA          NA          NA        NA          NA        NA        NA        NA         NA
rs7931369           NA        NA          NA          NA        NA          NA        NA        NA        NA         NA
rs567411332         NA        NA          NA          NA        NA          NA        NA        NA        NA         NA
rs184532784         NA        NA          NA          NA        NA          NA        NA        NA        NA         NA
rs7931583           NA     1.000          NA          NA        NA          NA        NA        NA        NA         NA
rs555937772         NA        NA          NA          NA        NA          NA        NA        NA        NA         NA
rs9651750           NA     1.000          NA          NA     1.000          NA        NA        NA        NA         NA
rs9651751           NA     1.000          NA          NA     1.000          NA     1.000        NA        NA         NA
rs9651752           NA     1.000          NA          NA     1.000          NA     1.000     1.000        NA         NA
rs73530502          NA     0.116          NA          NA     0.116          NA     0.116     0.116     0.116         NA
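A quick way to confirm that a result like this really is strictly lower triangular is to test the cells on and above the diagonal. This is a minimal sketch; `m` and the names `snpA`, `snpB`, `snpC` are a small made-up stand-in for the matrix `x` above.

```r
# Hypothetical 3x3 stand-in for the matrix `x` built above (filled by column)
m <- matrix(c(NA, 1.000, 0.116,
              NA, NA,    0.116,
              NA, NA,    NA),
            nrow = 3,
            dimnames = list(c("snpA", "snpB", "snpC"),
                            c("snpA", "snpB", "snpC")))

# upper.tri(..., diag = TRUE) selects the diagonal and everything above it;
# for a strictly lower-triangular result all of those cells should be NA
is_lower <- all(is.na(m[upper.tri(m, diag = TRUE)]))
is_lower

# The values below the diagonal, in column-major order
vals <- m[lower.tri(m)]
vals
```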
C8H10N4O2
  • Wow. I clearly haven't read the documentation enough. I've never noticed `fill`. – Benjamin Nov 24 '15 at 16:16
  • @logicstar What is your desired output? Please edit your question! – C8H10N4O2 Nov 24 '15 at 16:23
  • @C8H10N4O2 I mean, you have shown it with file="position SNP ....", but when I read the same data from a file I don't get any output. Code: `read.table("sample",na.strings = "-", header=TRUE, stringsAsFactors=FALSE, fill=TRUE)` gives `Error in read.table("sample", na.strings = "-", header = TRUE, stringsAsFactors = FALSE, : more columns than column names`. File: http://s000.tinyupload.com/index.php?file_id=80676663964480355634 Can you please show me the same results? – logicstar Nov 24 '15 at 16:25
  • @C8H10N4O2 I have edited my question. It is the same data, but this time it is provided as a file instead of as text. – logicstar Nov 24 '15 at 16:34
  • @logicstar I have done my best to answer a very unclear question. I doubt the name of your text file is "sample". It might be "sample.txt" or something. In your question, the file was called "line.txt", so I used that file name. – C8H10N4O2 Nov 24 '15 at 16:56
  • Sorry, I renamed the file, but it is still the same file. I want to read from the file instead of reading the sample text. I am not very good at R, so please help me. – logicstar Nov 24 '15 at 17:01

Add sep="\t" to the read.table() call:

data <- read.table(file="line.txt", na.strings = "-", sep = "\t",
                   header=TRUE, stringsAsFactors=FALSE, fill=TRUE)
sinalpha
  • It's not necessary in this case. From `?read.table`: `If sep = "" (the default for read.table) the separator is ‘white space’, that is one or more spaces, tabs, newlines or carriage returns.` – C8H10N4O2 Dec 02 '15 at 22:17
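
To illustrate the difference discussed here: the two settings only diverge when a field is genuinely empty. In the question's data every gap is filled with "-", so both settings see the same fields. This is a minimal sketch with made-up input (`txt` is hypothetical, not the asker's file):

```r
# Hypothetical two-line input whose middle field on the data row is empty
txt <- "a\tb\tc\n1\t\t3"

# Default sep = "": consecutive tabs collapse into a single separator,
# so the data row is counted as having fewer fields than the header
con <- textConnection(txt)
n_whitespace <- count.fields(con)
close(con)

# sep = "\t": every tab delimits a field, so the empty field is kept
con <- textConnection(txt)
n_tab <- count.fields(con, sep = "\t")
close(con)

n_whitespace
n_tab
```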