I am currently reading in all csv files in a directory and then rbinding them to create a single data frame.
library(tidyverse)
# combine all logprob data files into one df with rbind
logprobs <-
  list.files(path="logprob_files",
             pattern="*.csv",
             full.names=TRUE) %>%
  map_dfr(read_csv, col_names=c("weight", "token_num", "logsumexp", "p_token"),
          col_types='didd')
and the output is:
> head(logprobs)
# A tibble: 6 x 4
  weight token_num    logsumexp   p_token
   <dbl>     <int>        <dbl>     <dbl>
1   0.00         1 -0.002727356 -7.694870
2   0.01         2 -0.014821058 -7.707247
3   0.02         3 -0.026905438 -7.719624
4   0.03         4 -0.038980089 -7.732001
5   0.04         5 -0.051044584 -7.744378
6   0.05         6 -0.063098471 -7.756755
I would like to add an additional column that is just the file name repeated (eventually I will concatenate this with the token_num column). Is there a way to do this within the existing pipeline?
I should add that while the files are named "logprob{1-20}.csv", each file has a different number of tokens, so I can't just append the filename using rep.
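For reference, one possible approach is sketched below. It relies on the .id argument of map_dfr, which adds a column holding the name of each input element, so naming the vector of paths by file name first gives one file label per row regardless of how many rows each file has. The temporary directory, the two small demo files, and the column name "file" are assumptions made only so the sketch runs on its own; the pattern is written as the regular expression "\\.csv$" because list.files expects a regex rather than a glob.

```r
library(tidyverse)

# Hypothetical demo data: two headerless csv files with different row counts
demo_dir <- file.path(tempdir(), "logprob_files_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_csv(tibble(w = c(0, 0.01), t = 1:2, l = c(-0.1, -0.2), p = c(-7.6, -7.7)),
          file.path(demo_dir, "logprob1.csv"), col_names = FALSE)
write_csv(tibble(w = c(0, 0.01, 0.02), t = 1:3, l = c(-0.1, -0.2, -0.3),
                 p = c(-7.6, -7.7, -7.8)),
          file.path(demo_dir, "logprob2.csv"), col_names = FALSE)

files <- list.files(path = demo_dir, pattern = "\\.csv$", full.names = TRUE)

logprobs <- files %>%
  set_names(basename(files)) %>%  # name each path by its file name
  map_dfr(read_csv,
          col_names = c("weight", "token_num", "logsumexp", "p_token"),
          col_types = "didd",
          .id = "file")           # "file" is an assumed column name

head(logprobs)
```

The new column could then be combined with token_num in a later step, for example with mutate(id = paste(file, token_num, sep = "_")). In purrr 1.0.0 and later, map_dfr is superseded, and map() followed by list_rbind(names_to = "file") should give an equivalent result.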