Are you sure you have a data.frame and not a tibble?
For a one-variable tibble, [, 1] does not extract anything: you get the same one-variable tibble back. Always treat a data.frame or a tibble as a list (not a matrix) and use [[1]] to access the first variable as a vector.
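As a minimal sketch of the difference (the toy objects df and tbl are made up for illustration, and the tibble package is assumed to be installed), all three checks below should be TRUE:

```r
library(tibble)

# Two toy one-column objects (hypothetical names, not from the question)
df  <- data.frame(xcol = c("a", "b"), stringsAsFactors = FALSE)
tbl <- tibble(xcol = c("a", "b"))

# data.frame: [, 1] on a single column drops to a plain vector
df_is_vector <- is.character(df[, 1])

# tibble: [, 1] never drops, so the result is still a tibble
tbl_is_tibble <- is_tibble(tbl[, 1])

# [[1]] returns the column itself for both classes
both_vectors <- is.character(df[[1]]) && is.character(tbl[[1]])

print(c(df_is_vector, tbl_is_tibble, both_vectors))
```

Because [[1]] behaves the same for both classes, it is the safer habit when you do not control which class you receive.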
Timings (with one-tenth of the data):
ord_ch <- rep(replicate(700, paste(sample(letters, 40, TRUE), collapse = "")), 100)
ord_df <- data.frame(xcol = ord_ch, stringsAsFactors = FALSE)
ord_df_fct <- data.frame(xcol = ord_ch, stringsAsFactors = TRUE)
ord_tbl <- tibble::tibble(xcol = ord_ch)
microbenchmark::microbenchmark(
  substr(ord_ch, 23, 36),
  substr(ord_df[, 1], 23, 36),
  substr(ord_df_fct[, 1], 23, 36),
  substr(ord_tbl[, 1], 23, 36),
  times = 10
)
Benchmark result:
Unit: milliseconds
                           expr         min          lq        mean      median          uq         max neval
         substr(ord_ch, 23, 36)    8.807504    8.921520    9.253258    9.321168    9.391774   10.077075    10
    substr(ord_df[, 1], 23, 36)    8.711323    8.775754    9.030802    8.965194    9.167970    9.713614    10
substr(ord_df_fct[, 1], 23, 36)    9.337599    9.544920   10.065594    9.595284   10.016577   12.173109    10
   substr(ord_tbl[, 1], 23, 36) 1433.387037 1446.136184 1456.639754 1453.826835 1460.824234 1494.942769    10
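In the timings above, the tibble call is roughly 150 times slower by median than the other three, because substr() is handed a tibble rather than a character vector. A small sketch of the suggested fix, on a much smaller made-up sample (the names ch, tbl, res_vec and res_tbl are illustrative only): extracting the column with [[1]] first should give the same result as working on the plain vector.

```r
library(tibble)

set.seed(1)  # reproducible toy data; 50 strings of 40 letters each
ch  <- replicate(50, paste(sample(letters, 40, TRUE), collapse = ""))
tbl <- tibble(xcol = ch)

# Reference: substr() on the plain character vector
res_vec <- substr(ch, 23, 36)

# Fix: pull the column out as a vector before calling substr()
res_tbl <- substr(tbl[[1]], 23, 36)

# Should print TRUE: both paths yield the same 14-character substrings
print(identical(res_vec, res_tbl))
```

tbl$xcol works equally well here; [[1]] is used only because it mirrors the positional [, 1] from the question.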