I encountered this issue while using the data.table fwrite() and fread() functions to manage resources in a parallel calculation, but I was also able to recreate the behavior in the sequential example code below. Calling fwrite() throws the following error:

Error in fwrite(dt, csv_path) : Permission denied: 'D:/mypath/test.csv'. Failed to open existing file for writing. Do you have write permission to it? Is this Windows and does another process such as Excel have it open?

The behavior seems to be related to calling fread() immediately beforehand, as commenting out the fread() call makes the error disappear. Depending on your system, you might have to increase the number of iterations, because the error occurs at a different iteration on each run.

Does anyone have an idea why this is happening? Thanks in advance for your assistance!

Example code:

library(data.table)

dt = data.table(a = c(1, 2), b = c("a", "b"))
csv_path = "D:/mypath/test.csv"
fwrite(dt, csv_path)

for(i in 1:10000){
  test = fread(csv_path)
  fwrite(dt, csv_path)
}

System info

R version 4.0.0 (2020-04-24)

Platform: x86_64-w64-mingw32/x64 (64-bit)

Running under: Windows Server x64 (build 14393)

data.table_1.12.8

Janus De Bondt
  • does it help if you put `Sys.sleep(0.001)` in the middle? maybe the Windows file handler is not fast enough to close the file after reading it with fread? – jangorecki May 15 '20 at 16:37
  • With a sleep of 1 ms the error still occurs, though it seems to occur less frequently. I guess increasing the sleep time will eliminate this error, but it will also increase the running time of the calculation. What surprises me is that the code never throws an error when using `read.csv()` in combination with `write.csv()`, which leads me to think it is related to the `data.table` package. – Janus De Bondt May 16 '20 at 09:13
  • is it reproducible on linux? – jangorecki May 16 '20 at 10:03
  • Unfortunately I do not have a Linux system to test on. – Janus De Bondt May 16 '20 at 12:24
  • Works fine for me on OSX. PS the problem as posted is not fully reproducible -- the first `fread(csv_path)` assumes such file exists already. If we switch the order, does the bug still happen? – MichaelChirico May 17 '20 at 16:01
  • @MichaelChirico thanks for highlighting that it was not fully reproducible. I have made an edit to the question. I still get the error after switching the order of `fwrite()` and `fread()`. Did you try increasing the number of iterations (for me, sometimes I need 20,000 before I get the error)? – Janus De Bondt May 18 '20 at 05:34
  • I just let it run for `398378` iterations again on Mac. Must be a Windows thing – MichaelChirico May 18 '20 at 10:27
  • Runs fine on linux (BTRFS). Almost certainly a Windows file system issue (file not closing fast enough), and nothing to do with data.table. – dww May 19 '20 at 02:14
  • @dww, thanks for testing this on Linux. However, how can we explain the fact that there is no error when using `read.csv()` and `write.csv()`? It seems there is at least a `data.table` related point in the root cause of the problem I would say. – Janus De Bondt May 19 '20 at 07:10
  • `write.csv` does a lot of stuff before actually opening a file connection. That stuff takes way more time than those microseconds you seem to care about. – Roland May 19 '20 at 07:44
  • Thanks for your comment, Roland! Could you please elaborate a bit on what exactly `write.csv()` does before opening the file connection? As my use case is writing a very small amount of data, I think the solution will be to use `write.csv()` for now. Do you think it could be useful to report a bug on GitHub? – Janus De Bondt May 20 '20 at 08:41
  • Are you able to check how much `sleep` is enough to make problem disappear? – jangorecki May 22 '20 at 11:05

2 Answers

I tried your code on a Windows machine and was not able to reproduce it.

I believe the issue is related to the Windows file handler, which does not seem to be fast enough to close the file connection before it is opened again.

You can try the following code to see whether it is reproducible in base R alone:

x = "a,b\n1,a\n2,b\n"

csv_path = "D:/mypath/test.csv"
file.create(csv_path)
f = file(csv_path, "w")
cat(x, file=f)
close(f)

for (i in 1:10000) {
  f = file(csv_path, "r")
  test = readLines(f)
  close(f)
  f = file(csv_path, "w")
  cat(x, file=f)
  close(f)
}

It could also make sense to see how much Sys.sleep is enough to make the problem disappear.
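
If a fixed Sys.sleep on every iteration costs too much running time, one option is to wait only when a write actually fails. Below is a minimal sketch in base R, assuming the "Permission denied" error is transient and goes away after a short pause. The names `with_retry`, `max_tries` and `wait` are hypothetical and not part of data.table.

```r
# Hypothetical helper (not a data.table function): call f() and, if it
# throws an error, pause and try again, up to max_tries attempts in total.
with_retry = function(f, max_tries = 5, wait = 0.01) {
  stopifnot(max_tries >= 1)
  for (attempt in seq_len(max_tries)) {
    # Capture either the result of f() or the error it raised
    result = tryCatch(
      list(ok = TRUE, value = f()),
      error = function(e) list(ok = FALSE, error = e)
    )
    if (result$ok) return(result$value)
    # Pause a little longer after each failed attempt (linear backoff)
    if (attempt < max_tries) Sys.sleep(wait * attempt)
  }
  # Every attempt failed: re-raise the last error
  stop(result$error)
}
```

In the loop from the question it could wrap the failing call, e.g. `with_retry(function() fwrite(dt, csv_path))`. Whether a few short retries are enough on a given Windows machine has not been tested here.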

jangorecki

Check the number of threads data.table is using with

data.table::getDTthreads()

I was receiving the same fread() error until I reduced this from 96 to 24 with

data.table::setDTthreads(threads = 24)

Other users have reported that threads < 79 works. See "data.table crashes with segfault while grouping with more than 79 threads" #5077.