I am trying to read a random sample of rows from a large data set using this script:

library(sqldf)
Mydata <- read.csv.sql("mydata.csv", sql = "select * from file order by random() limit 50000")

but I get this warning message, and the output is empty:

 closing unused connection 3 (mydata.csv) 

Does anyone have any idea about what I am missing?

MFR
  • What exactly do you do with `Mydata` afterward? That error just means you are constantly opening new connections and not closing them. That doesn't necessarily have any impact on what's being returned. It would be better if you made a more [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) so we can see what you are really doing. – MrFlick Aug 31 '16 at 01:22
  • Thanks @MrFlick. The data is extremely big and I cannot load it into R. I decided to select a random sample of rows and then do my analysis on this sample. I used this script, but it gives me a dataset with zero observations. – MFR Aug 31 '16 at 01:26
  • @Eddie, I have tried your code above on a random csv file and it works as expected. It produced the desired output. Perhaps try the same code on a different csv and let us know if your output is still empty. – jav Aug 31 '16 at 02:18
  • Thanks @jav. I also tried different datasets and it worked for them. I couldn't find out what is different about this particular dataset (apart from it being much bigger). – MFR Aug 31 '16 at 04:25
  • @Eddie, I may have an idea what's wrong. A few weeks ago, I had a problem with data not being imported into R using sqldf, and the reason was the line endings being Mac line endings. Please try `Mydata <- read.csv.sql("mydata.csv", sql = "select * from file order by random() limit 100", eol = "\r")`. – jav Sep 02 '16 at 01:15
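
One way to check the line-ending hypothesis from the last comment before re-running the import is to look at the raw bytes at the start of the file. The following is a minimal sketch in base R, not a confirmed fix for this dataset: `detect_eol` is a hypothetical helper name, the 64 KB read size is an arbitrary assumption, and the temporary file only stands in for `mydata.csv`.

```r
# Hypothetical helper: guess the line terminator from the first bytes of a file.
# Only the first `n` bytes are read, so this stays cheap even for very large files.
detect_eol <- function(path, n = 65536L) {
  bytes <- readBin(path, what = "raw", n = n)
  has_cr <- any(bytes == as.raw(0x0d))  # carriage return, "\r"
  has_lf <- any(bytes == as.raw(0x0a))  # line feed, "\n"
  if (has_cr && has_lf) {
    "\r\n"   # Windows-style endings
  } else if (has_cr) {
    "\r"     # classic Mac endings, as suspected in the comment above
  } else {
    "\n"     # Unix endings (also the fallback when no terminator is found)
  }
}

# Demo on a small temporary file written with "\r" endings (stand-in for mydata.csv).
tmp <- tempfile(fileext = ".csv")
writeBin(charToRaw("a,b\r1,2\r3,4\r"), tmp)
eol <- detect_eol(tmp)
stopifnot(identical(eol, "\r"))
```

If the helper reports `"\r"` for the real file, the detected value can be passed on through the `eol` argument that the comment above already uses, for example `read.csv.sql("mydata.csv", sql = "select * from file order by random() limit 50000", eol = detect_eol("mydata.csv"))`. A file with mixed endings would need a closer look than this sketch provides.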
