I have a CSV file of appointments that's 2284x11. I created a script that goes down a list of users, creates a data frame of all the appointments that user had, and then sends it to a .Rmd file to be exported as a PDF.
When I try to do this all at once, I receive the following error:
Error in knitr::knit_meta_add(old_knit_meta, attr(old_knit_meta, "knit_meta_id")) :
long vectors not supported yet: ../../../../R-3.3.2/src/main/memory.c:1668
If I run my script for only a few of the users at a time until I cover the entire list, I do not receive the error. This prompts me to think there's nothing wrong with the individual tables I create for each user (The largest is ~70x7), but that somehow my script is caching all the results in an inefficient way. I followed the advice from:
"long vectors not supported yet" error in Rmd but not in R Script
But turning off those cache settings hasn't helped.
Does anyone know common reasons for accidentally creating long vectors? Once again, the original CSV file is only 2248x11, less than 250 KB, and most of the transformations I apply are just cleaning the data, subsetting, and some aggregating.
Is there a way to view the kind of data my script is storing in the background that might be causing this error?
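To make that question concrete, here is a rough sketch of the kind of check I have in mind. `largest_objects` is a hypothetical helper name, not something from my script. The last part assumes the growth might be in knitr's metadata store, which is only a guess based on `knit_meta_add` appearing in the error; `knitr::knit_meta(clean = FALSE)` reads that store without clearing it.

```r
# Hypothetical helper: sizes (in bytes) of the n largest objects in an environment.
largest_objects <- function(env = globalenv(), n = 10) {
  obj_names <- ls(envir = env)
  if (length(obj_names) == 0) return(numeric(0))
  sizes <- vapply(
    obj_names,
    function(nm) as.numeric(object.size(get(nm, envir = env))),
    numeric(1)
  )
  head(sort(sizes, decreasing = TRUE), n)
}

# Demo on a toy environment so the sketch runs on its own.
demo_env <- new.env()
assign("small", 1:10, envir = demo_env)
assign("big", numeric(1e5), envir = demo_env)
print(largest_objects(demo_env))

# Assumption: the growth is in knitr's metadata store. This only reads it.
if (requireNamespace("knitr", quietly = TRUE)) {
  meta <- knitr::knit_meta(clean = FALSE)
  cat("knit_meta entries:", length(meta), "\n")
}
```

Running both checks inside the loop, right after each `rmarkdown::render()` call, should show whether anything grows from one host to the next.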
Edit: Here is what I think the relevant code would be. I already have a data frame of sessions (including payment towards the host), a data frame of hosts and some personal information, and a data frame of tax values based on location (which affect host payments).
The following R code filters for sessions under a host's name and adds up the values in the Payment column. It sends the resulting table to a .Rmd file to be exported as a pdf.
    for (i in 1:nrow(hosts)) {
      # our hosts are paid on one of two monthly cycles
      if (hosts[i,2] == cycle) {
        # identify sessions under a host's name and that have payment associated
        hostmatch = which(sessions$Host == hosts$Name[i] & sessions$Payment != "")
        # only continue if host has sessions under their name with associated payments
        if (length(hostmatch) > 0) {
          hostsessions = sessions[hostmatch,] # filtering for matched sessions
          # just renaming the columns to look better for exported PDF
          colnames(hostsessions) <- c("Host","User","Appointment Date", "Appointment Time", "Appointment Type", "Appointment Status", "Payment to Host")
          if (all(hostsessions[,7] == "$0.00")) {
            # if all the host's recorded payments were for $0.00, skip to checking the next host
            next
          } else {
            # Payment is recorded as string starting with '$'. This adds up those values for a given host
            Addup = sum(as.numeric((sub("$","", hostsessions$Payment, fixed = TRUE))))
            if (hosts$PayTax[i] == "no") {
              payout = paste("$", format(round(Addup, 2), nsmall = 2), sep="")
              # This appends a final row that shows the host's total payout. Columns 1:5 are NAs and appear as blank cells in the PDF
              hostsessions[nrow(hostsessions)+1, c(6,7)] <- c('Total:', payout)
            } else {
              # modified version of the previous branch that accounts for tax, which is listed by Province in a separate table
              # Creates three rows at the end that consist largely of NAs
              hosttax <- merge(hosts[i,], tax, by = 'Province') # Only one row, so subsetting with $ returns a single vector
              hostsessions[nrow(hostsessions)+1, c(6,7)] <- c('Subtotal:', paste("$", format(round(Addup, 2), nsmall = 2), sep=""))
              # Convoluted way to get an output in the format: 'HST (13%): $10.90'
              hostsessions[nrow(hostsessions)+1, c(6,7)] <- c(paste(unlist(hosttax$TaxType), ' (', as.character((hosttax$TaxAmount - 1) * 100), '%)', sep = ''), paste("$", format(round(Addup*(hosttax$TaxAmount - 1), 2), nsmall = 2), sep=""))
              payout = paste("$", format(round(Addup*hosttax$TaxAmount, 2), nsmall = 2), sep="")
              hostsessions[nrow(hostsessions)+1, c(6,7)] <- c("Total:", payout)
            }
            # Send to Payments.Rmd to create pdf
            rmarkdown::render(input = "Payments.Rmd",
                              output_format = "pdf_document",
                              output_file = paste(hosts$Name[i]," Statement ", date, ".pdf", sep=''),
                              output_dir = "~/")
          }
        }
      }
    }
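For anyone who wants to reproduce the summary-row arithmetic without my data, here is a minimal, self-contained sketch. The helper names `parse_dollars` and `format_dollars` and the toy values are made up for illustration, and it uses `sprintf` where my script uses `format(round(...))`, so rounding could differ in edge cases.

```r
# Hypothetical helpers; not part of the original script.
parse_dollars <- function(x) as.numeric(sub("$", "", x, fixed = TRUE))
format_dollars <- function(x) sprintf("$%.2f", x)

# Toy inputs (assumed): two payments and a 13% tax stored as a multiplier of 1.13,
# mirroring how TaxAmount is used in the loop above.
payments <- c("$60.00", "$40.00")
tax_type <- "HST"
tax_amount <- 1.13

subtotal <- sum(parse_dollars(payments))

# The three rows appended for a host who pays tax: subtotal, tax, total.
summary_rows <- data.frame(
  label = c("Subtotal:",
            sprintf("%s (%.0f%%)", tax_type, (tax_amount - 1) * 100),
            "Total:"),
  amount = format_dollars(c(subtotal,
                            subtotal * (tax_amount - 1),
                            subtotal * tax_amount)),
  stringsAsFactors = FALSE
)
print(summary_rows)
```

Only three short strings are added per host, so on its face nothing in this step should create a large object.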
The .Rmd file is as follows. There are a number of \usepackage statements that I inserted to overcome other errors I was facing, but I don't fully understand why they were needed.
```{r, include = FALSE}
payment <- paste(hosts$Name[i], " Statement: ", date)
```
---
title: "`r payment`"
output: pdf_document
classoption: landscape
header-includes:
- \usepackage{float}
- \usepackage[table]{xcolor}
- \usepackage{graphicx}
- \usepackage{booktabs}
- \usepackage{longtable}
---
```{r, echo = FALSE}
library(knitr)
library(kableExtra) # provides kable_styling(), row_spec(), and %>%
library(markdown)
library(rmarkdown)
#I experimented with different ways of setting cache to false. None of them seemed to work
options(cache = FALSE, warning = FALSE,
        message = FALSE, cache.lazy = FALSE, knitr.kable.NA = '')
#formats the table previously created that showed all the appointments a host had and the payment associated
kable(hostsessions, format = "latex", booktabs = T, longtable = T) %>%
  kable_styling(latex_options = c("striped", "repeat_header"), font_size = 9) %>%
  row_spec(nrow(hostsessions), bold = T)
```