I have quite a lot of experience with C programming and I am used to thinking in terms of pointers, so I can get good performance when dealing with huge amounts of data. It is not the same with R, which I am still learning.
I have a file with approximately 1 million lines, separated by '\n', and each line contains 1, 2 or more integers separated by ' '. I have been able to put together code which reads the file and puts everything into a list of integer vectors, one per line. Some lines can be empty. I would then like to put the first number of each line, if it exists, into a separate vector, just skipping a line if it is empty, and the remaining numbers into a second vector.
The code I post here is terribly slow (it was still running when I started writing this question, so I killed R); how can I get decent speed? In C this would be done instantly.
graph <- function() {
    x <- scan("result", what = "", sep = "\n")
    y <- strsplit(x, "[[:space:]]+") # split each line on whitespace
    y <- lapply(y, FUN = as.integer) # convert each character vector to an integer vector
    print("here we go")
    first <- c()
    others <- c()
    k <- 1
    for (i in 1:length(y)) {
        if (length(y[[i]]) >= 1) { # [[i]] extracts the vector itself; [i] would return a one-element list
            first[i] <- y[[i]][1]
        }
        if (length(y[[i]]) >= 2) { # guard: without it, 2:1 would run the inner loop on one-number lines
            for (j in 2:length(y[[i]])) {
                others[k] <- y[[i]][j]
                k <- k + 1
            }
        }
    }
}
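For comparison, here is a vectorized sketch of what I believe the same extraction looks like without the explicit loops (the small `y` below is made-up sample data standing in for the parsed file):

```r
# Made-up stand-in for the parsed file: a list of integer vectors, some empty.
y <- list(c(42L, 7L, 31L, 3L), integer(0), c(1L), c(-23L, -34L))

y <- y[lengths(y) > 0]                  # drop the empty lines
first  <- vapply(y, `[`, integer(1), 1) # first number of each remaining line
others <- unlist(lapply(y, `[`, -1))    # everything except the first number

first  # 42 1 -23
others # 7 31 3 -34
```

Growing `first` and `others` one element at a time inside a loop reallocates them repeatedly, which is where the time goes; the vectorized form allocates the results in one pass.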
In a previous version of the code, in which each line had at least one number and I was interested only in the first number of each line, I used this code (I have read everywhere that I should avoid for loops in languages like R):
yy <- rapply(y, function(x) head(x,1))
which takes about 5 seconds, so far far better than the above, but still annoying compared to C.
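The same first-element extraction can also be written with `vapply` instead of `rapply`; this is just a sketch on made-up data, and whether it is actually faster on 1 million lines is an assumption I have not timed:

```r
# Made-up sample data: each line already parsed into an integer vector.
y <- list(c(42L, 7L, 31L), c(23L, 1L), 1L)

first_rapply <- rapply(y, function(v) head(v, 1))  # the approach above
first_vapply <- vapply(y, `[[`, integer(1), 1)     # type-checked alternative

first_rapply # 42 23 1
first_vapply # 42 23 1
```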
EDIT: this is an example of the first lines of my file:
42 7 31 3
23 1 34 5
1
-23 -34 2 2
42 7 31 3 31 4
1