
I have a CSV file of appointments that's 2284x11. I created a script that goes down a list of users, creates a data frame of all the appointments that user had, and then sends it to a .Rmd file to be exported as a PDF.

When I try to do this all at once, I receive the following error:

Error in knitr::knit_meta_add(old_knit_meta, attr(old_knit_meta, "knit_meta_id")) : long vectors not supported yet: ../../../../R-3.3.2/src/main/memory.c:1668

If I run my script for only a few of the users at a time until I cover the entire list, I do not receive the error. This suggests there's nothing wrong with the individual tables I create for each user (the largest is ~70x7), but that my script is somehow caching all the results in an inefficient way. I followed the advice from:

"long vectors not supported yet" error in Rmd but not in R Script

But turning off those cache settings hasn't helped.

Does anyone know common reasons for accidentally creating long vectors? Once again, the original CSV file is only 2248x11 and less than 250 KB, and most of the transformations I apply are just cleaning the data, subsetting, and some aggregating.

Is there a way to view the kind of data my script is storing in the background that might be causing this error?

Edit: Here is what I think the relevant code would be. I already have a data frame of sessions (including payment towards the host), a data frame of hosts and some personal information, and a data frame of tax values based on location (which affect host payments).

The following R code filters for sessions under a host's name and adds up the values in the Payment column. It sends the resulting table to a .Rmd file to be exported as a pdf.

for (i in 1:nrow(hosts)) {
  #our hosts are paid on one of two monthly cycles
  if (hosts[i,2] == cycle) { 
    #identify sessions under a host's name and that have payment associated
    hostmatch = which(sessions$Host == hosts$Name[i] & sessions$Payment != "") 
    #only continue if host has sessions under their name with associated payments
    if (length(hostmatch) > 0) { 
      hostsessions = sessions[hostmatch,] #filtering for matched sessions
      #just renaming the columns to look better for exported PDF
      colnames(hostsessions) <- c("Host","User","Appointment Date", "Appointment Time", "Appointment Type", "Appointment Status", "Payment to Host") 
      if (all(hostsessions[,7] == "$0.00")) {
        #if all the host's recorded payments were for $0.00, skip to checking the next host
        next 
      }
      else {
        #Payment is recorded as string starting with '$'. This adds up those values for a given host 
        Addup = sum(as.numeric((sub("$","", hostsessions[,7], fixed = TRUE))))
        if (hosts$PayTax[i] == "no") {
          payout = paste("$", format(round(Addup, 2), nsmall = 2), sep="")
          #This appends a final row that shows the host's total payout. Columns 1:5 are NAs and appear as blank cells in the PDF
          hostsessions[nrow(hostsessions)+1, c(6,7)] <- c('Total:', payout)
        }
        else {
          #modified version of previous loop that accounts for tax, which is listed by Province in a separate table
          #Creates three rows at the end that consist largely of NAs
          hosttax <- merge(hosts[i,], tax, by = 'Province') #Only one row, so subsetting with $ returns a single vector
          hostsessions[nrow(hostsessions)+1, c(6,7)] <- c('Subtotal:', paste("$", format(round(Addup, 2), nsmall = 2), sep="")) 
          #Convoluted way to get an output in the format: 'HST (13%): $10.90'
          hostsessions[nrow(hostsessions)+1, c(6,7)] <- c(paste(unlist(hosttax$TaxType), ' (', as.character((hosttax$TaxAmount - 1) * 100), '%)', sep = ''), paste("$", format(round(Addup*(hosttax$TaxAmount - 1), 2), nsmall = 2), sep=""))   
          payout = paste("$", format(round(Addup*hosttax$TaxAmount, 2), nsmall = 2), sep="")
          hostsessions[nrow(hostsessions)+1, c(6,7)] <- c("Total:", payout) 
        }
        #Send to Payments.Rmd to create pdf
        rmarkdown::render(input = "Payments.Rmd", 
                          output_format = "pdf_document",
                          output_file = paste(hosts$Name[i]," Statement ",  date, ".pdf", sep=''),
                          output_dir = "~/")
      }
    }
  }
}
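
A minimal sketch of the payment-summing step in isolation, using made-up values (the `payments` vector here is hypothetical, not the real data). It only illustrates how the "$"-prefixed strings are turned into a total and back into a dollar string:

```r
# Made-up payments in the same "$12.34" string format as the Payment column
payments <- c("$10.00", "$0.00", "$25.50")

# Strip the literal "$" (fixed = TRUE stops it being read as a regex anchor), then sum
addup <- sum(as.numeric(sub("$", "", payments, fixed = TRUE)))

# Format the total back into a dollar string with two decimals
payout <- paste0("$", format(round(addup, 2), nsmall = 2))
print(payout)
# [1] "$35.50"
```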

The .Rmd file is as follows. There are a number of \usepackage statements that I inserted to overcome other errors I was facing, but I don't fully understand why they were needed.

```{r, include = FALSE}
payment <- paste(hosts$Name[i], " Statement: ", date)
```

---
title: "`r payment`"
output: pdf_document
classoption: landscape
header-includes:
  - \usepackage{float}
  - \usepackage[table]{xcolor}
  - \usepackage{graphicx}
  - \usepackage{booktabs}
  - \usepackage{longtable} 
---

```{r, echo = FALSE}
library(knitr)
library(markdown)
library(rmarkdown)
library(kableExtra) #provides kable_styling(), row_spec() and the %>% pipe used below
#I experimented with different ways of setting cache to false. None of them seemed to work
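#cache, warning, message and cache.lazy are knitr chunk options, so they go through opts_chunk$set() rather than options()
knitr::opts_chunk$set(cache = FALSE, warning = FALSE, 
                      message = FALSE, cache.lazy = FALSE)
options(knitr.kable.NA = '')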
#formats the table previously created that showed all the appointments a host had and the payment associated
kable(hostsessions, format = "latex", booktabs = T, longtable = T) %>%
  kable_styling(latex_options = c("striped", "repeat_header"), font_size = 9) %>%
  row_spec(nrow(hostsessions), bold = T)
```
Nicholas Hassan
  • Is there some way you can provide example code and data demonstrating the issue? Your data doesn't seem large enough to trigger the error, but it's hard to know without any access to the data. – Marius May 16 '18 at 23:38
  • You might be able to debug this by setting `options(error = recover)`, then running `rmarkdown::render("yourfile.Rmd")`. Then when the error is triggered R will stop running and allow you to see what is being done in `knit_meta_add`. With any luck, you'll be able to recognize the data in that function. – user2554330 May 17 '18 at 00:24
  • @Marius I've included what I believe the relevant code would be. About halfway through the top-level for loop that cycles through hosts, it would crash and give me the long-vector error. – Nicholas Hassan May 19 '18 at 20:40
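
The debugging suggestion from the comments could be set up roughly as follows. This is only a sketch: `meta_size()` is a hypothetical helper, and it assumes that the metadata knitr collects is what grows between renders, which the `knit_meta_add` call in the error message hints at but does not prove.

```r
# Open a frame browser when an error occurs, as suggested in the comments
# (only useful in an interactive session)
old_opts <- options(error = recover)

# Hypothetical helper: size in bytes of the metadata knitr has collected so far.
# clean = FALSE reads the metadata without resetting it.
meta_size <- function() {
  as.numeric(object.size(knitr::knit_meta(clean = FALSE)))
}

print(meta_size())
# ... call rmarkdown::render("Payments.Rmd", ...) here, then print(meta_size()) again ...

# Restore the previous error handler
options(old_opts)
```

If the printed size climbs with every host, that would point at accumulated knitr metadata rather than at the per-host tables.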
