I have a 4.5 MB file (9,223,136 lines) containing the following data:
0 0
0.0147938 3.67598e-07
0.0226194 7.35196e-07
0.0283794 1.10279e-06
0.033576 1.47039e-06
0.0383903 1.83799e-06
0.0424806 2.20559e-06
0.0465545 2.57319e-06
0.0499759 2.94079e-06
Each column holds a value from 0 to 100, representing a percentage. My goal is to draw a plot in ggplot2 to compare the two percentages (e.g. with 20% of column 1, what percentage is reached in column 2). Here is my R script:
library(ggplot2)
dataset=read.table("~/R/datasets/cumul.txt.gz")
p <- ggplot(dataset,aes(V2,V1))
p <- p + geom_line()
p <- p + scale_x_continuous(formatter="percent") + scale_y_continuous(formatter="percent")
p <- p + theme_bw()
ggsave("~/R/grafs/cumul.png")
The problem is that every time I run this, R runs out of memory and gives the error: "Cannot allocate vector of size 128.0 Mb". I'm running 32-bit R on a Linux machine and I have about 4 GB of free memory.
I thought of a workaround: reduce the precision of these values (by rounding them) and eliminate the duplicate lines, so that the dataset has fewer lines. Could you give me some advice on how to do this?
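To make the idea concrete, here is a minimal sketch of the rounding-and-deduplicating step in base R. The helper name reduce_points and the digits value are hypothetical choices of mine, and the sample data frame just reuses the first few rows shown above rather than the real file. I have not confirmed that this is enough to avoid the memory error.

```r
# Hypothetical helper (not part of my original script): round every column
# to `digits` decimal places, then drop the rows that became identical.
reduce_points <- function(df, digits = 3) {
  rounded <- round(df, digits)  # round() works on all-numeric data frames
  unique(rounded)               # keep only the first copy of each row
}

# Small stand-in for the real data, using the first rows of the file.
sample_data <- data.frame(
  V1 = c(0, 0.0147938, 0.0226194, 0.0283794, 0.033576),
  V2 = c(0, 3.67598e-07, 7.35196e-07, 1.10279e-06, 1.47039e-06)
)

# digits = 2 is an arbitrary illustration value; the right precision
# depends on how much detail the final plot needs.
reduced <- reduce_points(sample_data, digits = 2)
print(reduced)
```

One thing I am unsure about: rounding to a fixed number of decimal places flattens the very small values in the second column to zero, so rounding to significant digits with signif() might be a better fit for that column.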