Is there a way to stop arrow from pulling data into R when it cannot find a suitable binding for an expression?
Instead of emitting the warning shown in the example below ("... not supported in Arrow; pulling data into R"), I would like arrow to throw an error.
Is there an option I can set to get this behavior?
I know there is a list of available bindings I can consult in the arrow documentation. However, I would like the behavior described above to be the default, so that I can iterate and experiment quickly without accidentally falling into long computations outside the arrow framework.
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
library(stringr)
sessionInfo()
#> R version 4.1.3 (2022-03-10)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur/Monterey 10.16
#>
#> Matrix products: default
#> BLAS: /opt/R/4.1.3/Resources/lib/libRblas.0.dylib
#> LAPACK: /opt/R/4.1.3/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] stringr_1.4.1 dplyr_1.0.10 arrow_10.0.0
#>
#> loaded via a namespace (and not attached):
#> [1] pillar_1.8.1 compiler_4.1.3 highr_0.9 R.methodsS3_1.8.2
#> [5] R.utils_2.12.0 tools_4.1.3 digest_0.6.29 bit_4.0.4
#> [9] evaluate_0.16 lifecycle_1.0.2 tibble_3.1.8 R.cache_0.16.0
#> [13] pkgconfig_2.0.3 rlang_1.0.6 reprex_2.0.2 DBI_1.1.3
#> [17] cli_3.4.1 rstudioapi_0.14 yaml_2.3.5 xfun_0.32
#> [21] fastmap_1.1.0 withr_2.5.0 styler_1.7.0 knitr_1.40
#> [25] generics_0.1.3 fs_1.5.2 vctrs_0.4.2 bit64_4.0.5
#> [29] tidyselect_1.1.2 glue_1.6.2 R6_2.5.1 fansi_1.0.3
#> [33] rmarkdown_2.16 purrr_0.3.4 magrittr_2.0.3 htmltools_0.5.3
#> [37] assertthat_0.2.1 utf8_1.2.2 stringi_1.7.8 R.oo_1.25.0
df <- tibble(
  date = c("28-Aug-21", "11-Mar-19")
)
df <- arrow::arrow_table(df)
df %>%
  mutate(date = str_remove(date, "\\d{2}$"))
#> Warning: Expression str_remove(date, "\\d{2}$") not supported in Arrow; pulling
#> data into R
#> # A tibble: 2 × 1
#> date
#> <chr>
#> 1 28-Aug-
#> 2 11-Mar-
Created on 2022-11-08 with reprex v2.0.2
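For what it is worth, the only stopgap I can think of uses base R rather than an arrow option: escalate this specific warning to an error with withCallingHandlers(). Below is a minimal sketch, assuming the warning text always contains "not supported in Arrow". The name error_on_pull is a hypothetical helper of my own, not part of arrow, and the demo simulates the warning with warning() so the block runs without arrow installed.

```r
# Hypothetical helper (NOT part of arrow): turn arrow's fallback warning
# into an error, leaving every other warning untouched.
error_on_pull <- function(expr) {
  withCallingHandlers(
    expr,  # evaluated lazily, so warnings are raised inside the handler scope
    warning = function(w) {
      msg <- conditionMessage(w)
      # Assumption: the fallback warning always contains this phrase.
      if (grepl("not supported in Arrow", msg, fixed = TRUE)) {
        stop("Refusing to pull data into R: ", msg, call. = FALSE)
      }
      # Any other warning falls through and is reported as usual.
    }
  )
}

# Demo with a simulated warning that mimics the message above.
result <- tryCatch(
  error_on_pull(
    warning("Expression str_remove(...) not supported in Arrow; pulling data into R")
  ),
  error = function(e) conditionMessage(e)
)
print(result)
```

With arrow loaded, the intended usage would be error_on_pull(df %>% mutate(date = str_remove(date, "\\d{2}$"))). A blunter alternative is options(warn = 2), which makes R treat every warning as an error, not just this one. Neither is the built-in option I am hoping for.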
Many thanks for considering my request.