
I am running the command below on a data set with about 5,572,457 rows. The `tempField$Lat` column contains comma-separated values, and I want to split them into separate columns of a data frame. The logic runs fine on 50,000 rows, which is the sample I am using for testing, but on the full data set it never finishes.

```r
tempLat <- read.table(text = as.character(tempField$Lat),
                      sep = ",", header = FALSE, fill = TRUE)
```
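A minimal sketch of the same call on toy data, assuming the hypothetical `tempField` below. It adds the `quote = ""` suggestion from the comments plus `comment.char = ""`, because by default `read.table()` treats `'` and `"` as quote characters and `#` as a comment marker, so one stray character can merge or truncate lines. It also passes `col.names` sized from the real maximum field count, because with `fill = TRUE` the number of columns is determined from the first five lines only. These are guesses at likely causes, not a confirmed diagnosis of the hang, and the sketch does nothing about raw speed on millions of rows.

```r
# Hypothetical stand-in for tempField: ragged comma-separated strings,
# one of which contains an apostrophe.
tempField <- data.frame(
  Lat = c("12.5,77.1", "13.0,80.2,19.9", "O'Hare,41.9"),
  stringsAsFactors = FALSE
)

lat_text <- as.character(tempField$Lat)

# With fill = TRUE, read.table() sizes the table from the first five lines,
# so count the real maximum number of fields (commas + 1) up front.
n_fields <- max(nchar(lat_text) - nchar(gsub(",", "", lat_text, fixed = TRUE)) + 1)

tempLat <- read.table(
  text = lat_text,
  sep = ",",
  header = FALSE,
  fill = TRUE,
  quote = "",         # treat ' and " as ordinary characters
  comment.char = "",  # treat # as an ordinary character
  col.names = paste0("V", seq_len(n_fields)),
  stringsAsFactors = FALSE
)

str(tempLat)
```

Without `quote = ""`, the apostrophe in `O'Hare` would open a quoted field that never closes, which is the kind of problem a small test sample can easily miss.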
Ankit Solanki
  • Are you getting an error? Or is it just taking a long time? – thelatemail Aug 12 '13 at 03:46
  • I am not getting any error. I ran the job for about 24 hours; memory usage did not change, and the process kept running but produced no results. – Ankit Solanki Aug 12 '13 at 03:50
  • Are there any restrictions on the use of `read.table` that you are aware of? – Ankit Solanki Aug 12 '13 at 03:51
  • As far as I can tell, there's nothing wrong with your `read.table()` code. You might want to try `read.table(text = as.character(tempField$Lat), sep = ",", header = FALSE, fill = TRUE, quote = "")`, courtesy of [this R-help post](https://stat.ethz.ch/pipermail/r-help/2007-January/123755.html) – thelatemail Aug 12 '13 at 03:53
  • I tried this, but I still don't see any change in memory usage. The job has been running for the past 5 minutes. I expected it to be complete by now, since it is just a simple pull and split. – Ankit Solanki Aug 12 '13 at 04:05
  • It really is difficult to know what is going wrong - this could just be an issue of size - you might want to review the answers here: http://stackoverflow.com/questions/1727772/quickly-reading-very-large-tables-as-dataframes-in-r - in particular the `fread` function from the `data.table` package. – thelatemail Aug 12 '13 at 04:11
  • Have you tried a manual approach of using `strsplit` and then something like `do.call(rbind.data.frame, ...)`? If the data are unbalanced after you use `strsplit`, you might need to use something like `rbind.fill` from "plyr". You might also want to try adding a `blank.lines.skip = FALSE` into there in case any row in your `tempField` dataset is empty. – A5C1D2H2I1M1N2O1R2T1 Aug 12 '13 at 06:41
  • You can also try breaking up the process and reading in a certain number of lines at a time to see if you can identify where the problem emerges. – A5C1D2H2I1M1N2O1R2T1 Aug 12 '13 at 06:42
  • I tried using `strsplit`, but it didn't work. Since the task was urgent, I used a Python script to process the data and imported the processed data into R. I will keep looking for a better solution. I don't think the job should fail because of the volume, since this data is a sample of what I expect to receive in a month or so. – Ankit Solanki Aug 12 '13 at 09:41
  • What happens if you use a different tool such as `scan` ? – Carl Witthoft Aug 12 '13 at 11:16
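The `strsplit` route suggested in the comments can be sketched in base R without `rbind.data.frame` or "plyr": split once, then pull field *i* out of every row with `vapply()`, which yields `NA` wherever a row is shorter. The input vector is a hypothetical stand-in for `tempField$Lat`, and the per-column `vapply()` extraction is a swapped-in technique, not the exact approach the comment describes; it has not been benchmarked on 5.5 million rows.

```r
# Hypothetical stand-in for tempField$Lat: rows with different field counts.
lat_text <- c("12.5,77.1", "13.0,80.2,19.9", "41.9")

# Split every string on commas in one vectorised call; fixed = TRUE skips regex.
pieces <- strsplit(lat_text, ",", fixed = TRUE)

# The widest row determines how many columns are needed.
n_fields <- max(lengths(pieces))

# For each column i, take element i of every row; indexing past the end of a
# character vector returns NA, which pads the shorter rows.
cols <- lapply(seq_len(n_fields), function(i) vapply(pieces, `[`, character(1), i))
names(cols) <- paste0("V", seq_len(n_fields))

tempLat <- as.data.frame(cols, stringsAsFactors = FALSE)

# Convert each column from character to numeric where possible.
tempLat[] <- lapply(tempLat, type.convert, as.is = TRUE)

str(tempLat)
```

One difference from `read.table()`: `strsplit()` drops a trailing empty field, so `"1,2,"` splits into two pieces rather than three.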

0 Answers