How to read a csv only for the first time running the script?

Question

My code:

if (data == null) {
   data <- read.csv(file)
}

I'm reading from a big data file, so it would be nice to read it only once.

You need `is.null`, and if the data is big, try `fread`, i.e. `library(data.table); fread(file)` — akrun, Feb 21 '18 at 03:49
Thank you! I tried using fread, but am running into an error. — R.Kim, Feb 21 '18 at 03:52
Error in fread("temp_prec_small.csv.bz2", stringsAsFactors = FALSE) : embedded nul in string: '\xe7w\xd73\xb0(\x96\x8f\x92'Ȋj\xeb\xd1\xd0&\x82\017\xcdx\xb8\xa3\x9aE\v\x86\xf0\xab\036\xadE\0320\xca#=\xceN\xfb\xca2Oc\xabĒj\x9c!\xc4A\0©"\017&\xc55\xe4\xd3 \xd2\035\0210\xa2\n\t\002\021K]\xc8\031,\026)\001<}\xa7\xd0TX\xac\xb1@\x91I\023\xa9\031t\x85\xaa\024B4' — R.Kim, Feb 21 '18 at 03:53
You said it is a csv file, but the file ending in your error message is `bz2`. — akrun, Feb 21 '18 at 03:54
Figure out what you need from the data file and save that in a smaller file that is faster to read. Or, if you need the entire file, use `saveRDS`/`readRDS` to save it in a compressed binary format that should be smaller than raw text. — avigil, Feb 21 '18 at 03:54
You can check [here](https://stackoverflow.com/questions/25948777/extract-bz2-file-in-r) — akrun, Feb 21 '18 at 03:55
Yes. Is there a way to use fread with a bz2? Or, a different fast reading function for bz2? — R.Kim, Feb 21 '18 at 03:55
I don't think `is.null` is the correct approach here. I think `exists` is more appropriate. — A5C1D2H2I1M1N2O1R2T1, Feb 21 '18 at 04:25

score 1 · Answer 1 · answered Feb 21 '18 at 04:24

You can use exists(), but you should also make sure that the name you use for your dataset is not one that R already defines. data() is a function in R, so exists("data") is TRUE even before you have loaded anything, which makes it a poor name for the data you're loading.
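To make the name clash concrete, here is a small sketch. It assumes a default session (the utils package attached) and no user-defined object called data. The `mode` and `inherits` arguments are part of base R's exists(); using `mode = "list"` to match only data-frame-like objects is my own illustration, not something from the question.

```r
## Assumes a default session with no user-defined object called `data`
found_any  <- exists("data")                    # finds the function utils::data
found_list <- exists("data", mode = "list")     # only matches list-like objects such as a data.frame
found_here <- exists("data", inherits = FALSE)  # only looks in the current environment

c(any = found_any, list = found_list, here = found_here)
```

Only the first check should come back TRUE in that situation, which is why a bare exists("data") never triggers the read.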

Nevertheless, here's how you might approach it:

ls() ## I'm starting with nothing in my workspace
# character(0)

## Here's how you can check if something exists
if (!(exists("data") && is.data.frame(data))) {
  ## Replace print("boo") with whatever you actually want to do--
  ## Read data, load data, whatever
  print("boo")
} else {
  ## If it does exist, you don't really need to do anything
  ## Except proceed with your script
  head(data)
}
# [1] "boo"

Here's what happens if we have a data.frame in our environment (like you've read it in already).

data <- data.frame(V1 = 1:10, V2 = 11:20)

ls()
# [1] "data"

if (!(exists("data") && is.data.frame(data))) {
  print("boo")
} else {
  head(data)
}
#   V1 V2
# 1  1 11
# 2  2 12
# 3  3 13
# 4  4 14
# 5  5 15
# 6  6 16

As others have mentioned, though, you can also look at saving the data in an "RData" or "rds" format that can be loaded quickly.
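A minimal sketch of that idea, assuming you are free to write an "rds" copy next to the CSV. The helper name read_cached and both file paths are hypothetical and only there for illustration; saveRDS(), readRDS() and file.exists() are base R.

```r
## Hypothetical paths and a small stand-in CSV, for illustration only
csv_path <- tempfile(fileext = ".csv")
rds_path <- sub("\\.csv$", ".rds", csv_path)
write.csv(data.frame(V1 = 1:10, V2 = 11:20), csv_path, row.names = FALSE)

## Hypothetical helper: parse the CSV once, then reuse the binary copy
read_cached <- function(csv_path, rds_path) {
  if (file.exists(rds_path)) {
    readRDS(rds_path)            # fast path: cached copy already on disk
  } else {
    df <- read.csv(csv_path)     # slow path: parse the text file
    saveRDS(df, rds_path)        # store it for later runs
    df
  }
}

first  <- read_cached(csv_path, rds_path)  # parses the CSV and writes the cache
second <- read_cached(csv_path, rds_path)  # reads the cache instead
```

Unlike the exists() check, this also survives restarting R, since the cache lives on disk rather than in the workspace.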

score 0 · Answer 2 · answered Feb 21 '18 at 04:29

You can either check whether the data exists or check whether it already has some attribute (for example a class). The existence check can sometimes be tricky depending on your environment and variable names; the attribute check is easy but technically not correct.

## Creating data
test1 <- c(1:5, "6,7", "8,9,10")
file <- tempfile()
writeLines(test1, file)

Solution 1

if (!exists("data")) {
   data <- read.csv(file)
}

Solution 2

## Check an attribute (e.g. the class)
check_class <- try(class(data), silent = TRUE)

## Check if the data existed (i.e. had an attribute or not)
if (inherits(check_class, "try-error")) {
   data1 <- read.csv(file)
}
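For what it's worth, here is a sketch of the existence check with a name that R does not already define. The name big_df, the temporary file and the n_reads counter are all hypothetical; the loop just stands in for running the same script section twice in one session.

```r
## Small stand-in CSV so the sketch is self-contained
csv_file <- tempfile(fileext = ".csv")
writeLines(c("V1,V2", "1,11", "2,12"), csv_file)

n_reads <- 0  # counts how often the file is actually parsed

for (run in 1:2) {
  ## `||` short-circuits, so big_df is only inspected once it exists
  if (!exists("big_df", inherits = FALSE) || !is.data.frame(big_df)) {
    big_df <- read.csv(csv_file)
    n_reads <- n_reads + 1
  }
}

n_reads
```

Assuming big_df was not defined beforehand, the file is parsed on the first pass only and the second pass reuses the object.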

For some reason, solution 1 doesn't work. It doesn't seem to read the csv file — R.Kim, Feb 21 '18 at 19:45
Nvm, I think it was b/c I was checking if "data" exists. Thanks! — R.Kim, Feb 21 '18 at 19:47
