
I have a one-column dataset in which blank lines separate blocks of data. Each block is what I would plot with gnuplot (if each blank line were doubled) like this:

plot "datafile" i n

where n is the index of the block.

How could I import this data in R so that I get, e.g., a two-index matrix in which the first index is the row index and the second is the block index? (The number of rows is always the same.)

  • Is all of your data numeric? – Max Candocia May 06 '15 at 15:00
  • can you give a reproducible example? https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Henk May 06 '15 at 15:07
  • Yes, it's numeric. There are, let's say, 300 numbers, made of three columns of 100 numbers each, stacked on top of one another and separated by a blank line (so the file has 302 rows). I don't know how to attach the file –  May 06 '15 at 15:18
  • Each block has the same number of numbers, always? – zx8754 May 06 '15 at 15:24

2 Answers


Try this example:

# dummy data (seed fixed so that the printed output is reproducible)
set.seed(1)
m <- matrix(c(runif(3), NA, runif(2), NA, runif(2), NA, runif(3)), ncol = 1)
m
#             [,1]
#  [1,] 0.26550866
#  [2,] 0.37212390
#  [3,] 0.57285336
#  [4,]         NA
#  [5,] 0.90820779
#  [6,] 0.20168193
#  [7,]         NA
#  [8,] 0.89838968
#  [9,] 0.94467527
# [10,]         NA
# [11,] 0.66079779
# [12,] 0.62911404
# [13,] 0.06178627

# interval boundaries: 0, the positions of the blank (NA) rows, and the last row
ix <- c(0, which(is.na(m[, 1])), nrow(m))

# assign a block number to every row (each NA row is counted with the block it closes)
m <- cbind(m, rep(seq_along(diff(ix)), diff(ix)))

# exclude blank rows
m[!is.na(m[, 1]), ]
#             [,1] [,2]
#  [1,] 0.26550866    1
#  [2,] 0.37212390    1
#  [3,] 0.57285336    1
#  [4,] 0.90820779    2
#  [5,] 0.20168193    2
#  [6,] 0.89838968    3
#  [7,] 0.94467527    3
#  [8,] 0.66079779    4
#  [9,] 0.62911404    4
# [10,] 0.06178627    4
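The example above starts from an in-memory matrix. For an actual file, one possible approach (a minimal sketch, not part of the original answer) is to read the lines as text, treat empty lines as block separators, and bind the blocks column-wise. The temporary file and the names `path`, `lines`, `is_blank`, `blocks`, and `mat` are made up for illustration, and the sketch assumes that every block has the same number of rows, as stated in the question.

```r
# Hypothetical example file: 3 blocks of 4 numbers, separated by blank lines.
path <- tempfile(fileext = ".dat")
writeLines(c("1", "2", "3", "4", "",
             "5", "6", "7", "8", "",
             "9", "10", "11", "12"), path)

# Read the file as text so that blank lines are preserved.
lines <- trimws(readLines(path))
is_blank <- !nzchar(lines)

# Each blank line starts a new block: count the blanks seen so far.
block <- (cumsum(is_blank) + 1L)[!is_blank]
values <- as.numeric(lines[!is_blank])

# One vector per block; check the equal-length assumption before binding.
blocks <- split(values, block)
stopifnot(length(unique(lengths(blocks))) == 1L)

# Rows are positions within a block, columns are blocks.
mat <- do.call(cbind, blocks)
mat[2, 3]  # row 2 of block 3: 10
```

With this layout, `mat[, n]` is the n-th block (counting from 1, unlike gnuplot's zero-based index).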
zx8754
  • Thanks! It's very close to what I had in mind. Now my goal is to plot the histogram of the n-th block, but I guess I can manage that now... –  May 06 '15 at 15:41
  • Same idea: `cbind(m,1L+cumsum(is.na(m)))[!is.na(m),]` – Frank May 06 '15 at 15:48

Using @zx8754's example data...

set.seed(1)
m <- matrix(c(runif(3),NA,runif(2),NA,runif(2),NA,runif(3)),ncol=1)

we can make the second column of his result table with cumsum:

cbind(m,1L+cumsum(is.na(m)))[!is.na(m),]

which gives

            [,1] [,2]
 [1,] 0.26550866    1
 [2,] 0.37212390    1
 [3,] 0.57285336    1
 [4,] 0.90820779    2
 [5,] 0.20168193    2
 [6,] 0.89838968    3
 [7,] 0.94467527    3
 [8,] 0.66079779    4
 [9,] 0.62911404    4
[10,] 0.06178627    4
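To get from this long format to individual blocks (for example, for the histogram of the n-th block mentioned in the comments), the same `cumsum` labels can be passed to `split`. This is only a sketch under the assumptions of the example data; the names `block`, `keep`, `blocks`, and `n` are illustrative and do not come from the answer.

```r
set.seed(1)
m <- matrix(c(runif(3), NA, runif(2), NA, runif(2), NA, runif(3)), ncol = 1)

# Block label per row: 1 plus the number of blank (NA) rows seen so far.
block <- 1L + cumsum(is.na(m[, 1]))
keep <- !is.na(m[, 1])

# List with one numeric vector per block.
blocks <- split(m[keep, 1], block[keep])
lengths(blocks)  # block sizes: 3 2 2 3

# Histogram of the n-th block.
n <- 2
hist(blocks[[n]], main = paste("Block", n), xlab = "value")
```

A list is used here because the blocks in the example data have different lengths; when all blocks have the same length, `do.call(cbind, blocks)` would give a rows-by-blocks matrix instead.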
Frank
  • @zx8754 I'm guessing it didn't work because you already had the overwritten `m` (not your original `m`). – Frank May 06 '15 at 16:01