
I am currently reading all CSV files in a directory and row-binding them into a single data frame:

library(tidyverse)
# combine all logprob data files into one data frame (map_dfr row-binds the results)
logprobs <- 
  list.files(path="logprob_files", 
             pattern="\\.csv$",   # a regex, not a glob: names ending in .csv
             full.names=TRUE) %>%
  map_dfr(read_csv, col_names=c("weight", "token_num", "logsumexp", "p_token"),
          col_types='didd')

and the output is:

> head(logprobs)
# A tibble: 6 x 4
  weight token_num    logsumexp   p_token
   <dbl>     <int>        <dbl>     <dbl>
1   0.00         1 -0.002727356 -7.694870
2   0.01         2 -0.014821058 -7.707247
3   0.02         3 -0.026905438 -7.719624
4   0.03         4 -0.038980089 -7.732001
5   0.04         5 -0.051044584 -7.744378
6   0.05         6 -0.063098471 -7.756755
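About the `pattern` argument in the pipeline: `list.files()` matches it as a regular expression, not as a shell glob. The quick check below uses made-up file names (not the real files) to show how an unescaped dot with no `$` anchor lets other names through, and how `utils::glob2rx()` turns a glob into the equivalent regex.

```r
# Hypothetical file names, chosen only to illustrate pattern matching.
files <- c("logprob1.csv", "logprob2.csv", "notes_csv.txt", "logprob3.csv.bak")

# Unescaped "." matches any character and nothing anchors the end,
# so "notes_csv.txt" and "logprob3.csv.bak" also match.
loose <- files[grepl(".csv", files)]

# Escaped dot plus "$" anchor: only names that really end in ".csv".
strict <- files[grepl("\\.csv$", files)]

# glob2rx() converts a shell-style glob into a regular expression.
from_glob <- utils::glob2rx("*.csv")   # "^.*\\.csv$"
strict_glob <- files[grepl(from_glob, files)]

print(loose)
print(strict)
```

In practice `"*.csv"` often appears to work, but `"\\.csv$"` states the intent exactly.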

I would like to add a column that holds the source file name for every row (eventually I will concatenate it with the token_num column). Is there a way to do this within the existing pipeline?

I should add that while the files are named logprob{1-20}.csv, each file has a different number of tokens (rows), so I can't simply append the file name to the combined result using rep.
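A minimal base-R sketch of why per-file names do not need `rep()` over the combined result, assuming headerless files with the four columns shown above: if the name is attached to each file's data frame before binding, the single value is recycled to however many rows that file has. The demo directory, the three demo files, and the `read_one()` helper are all hypothetical.

```r
# Hypothetical demo: three headerless CSVs with different row counts,
# written to a temporary directory as stand-ins for logprob{1-20}.csv.
demo_dir <- file.path(tempdir(), "logprob_rep_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(demo_dir)
row_counts <- c(3, 1, 2)
for (i in seq_along(row_counts)) {
  k <- seq_len(row_counts[i])
  d <- data.frame(weight = (k - 1) / 100, token_num = k,
                  logsumexp = -k / 100, p_token = -7.7)
  write.table(d, file.path(demo_dir, paste0("logprob", i, ".csv")),
              sep = ",", row.names = FALSE, col.names = FALSE)
}

col_names <- c("weight", "token_num", "logsumexp", "p_token")

# Hypothetical helper: read one file and tag every row with its file name.
# Assigning a length-1 value to a column recycles it to nrow(d) rows.
read_one <- function(path) {
  d <- read.csv(path, header = FALSE, col.names = col_names)
  d$file <- basename(path)
  d
}

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
logprobs <- do.call(rbind, lapply(files, read_one))

# The later concatenation with token_num.
logprobs$file_token <- paste(logprobs$file, logprobs$token_num, sep = "_")
print(logprobs)
```

The same "tag each piece before binding" idea carries over to a `map_dfr()` pipeline by doing the tagging inside the function passed to it.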

Adam_G
  • Perhaps `map_dfr(read_csv, col_names=c("weight", "token_num", "logsumexp", "p_token"), col_types='didd', .id = 'grp')` – akrun Jul 29 '18 at 21:21
  • @akrun - Thanks! That's perfect! @camille - Yes, it looks like one of the answers uses `.id` in that question, but in a slightly different way. Thank you for that, as well. – Adam_G Jul 29 '18 at 21:24
  • Or you can use a custom function: `map_dfr(function(x) { t <- read_csv(x, col_names=c("weight", "token_num", "logsumexp", "p_token"), col_types='didd'); t$token_num <- paste(x, t$token_num); t })` – MKR Jul 29 '18 at 21:43
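The `.id` suggestion in the comments labels rows by each input element's name or, when the input vector is unnamed (as `list.files()` output is), by its position. Because `list.files()` sorts its result alphabetically, that position need not match the number in `logprobN.csv` once there are ten or more files. A minimal sketch, assuming `purrr`, `readr`, and `dplyr` are installed and using hypothetical demo files, that names the vector with `purrr::set_names()` so `.id` carries the file names instead:

```r
library(purrr)  # map_dfr() also needs dplyr installed (it uses bind_rows())
library(readr)

# Hypothetical demo files: headerless CSVs with different row counts.
demo_dir <- file.path(tempdir(), "logprob_id_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(demo_dir)
row_counts <- c("1" = 2, "2" = 1, "10" = 3)
for (num in names(row_counts)) {
  k <- seq_len(row_counts[[num]])
  d <- data.frame(weight = (k - 1) / 100, token_num = k,
                  logsumexp = -k / 100, p_token = -7.7)
  write.table(d, file.path(demo_dir, paste0("logprob", num, ".csv")),
              sep = ",", row.names = FALSE, col.names = FALSE)
}

col_names <- c("weight", "token_num", "logsumexp", "p_token")
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Unnamed input: .id holds the position in `files` ("1", "2", "3"),
# which follows list.files() ordering, not the number in the file name.
by_position <- map_dfr(files, read_csv, col_names = col_names,
                       col_types = "didd", .id = "grp")

# Named input: .id holds the names, here the bare file names.
by_name <- map_dfr(set_names(files, basename(files)), read_csv,
                   col_names = col_names, col_types = "didd", .id = "file")

# Concatenate with token_num, as planned in the question.
by_name$file_token <- paste(by_name$file, by_name$token_num, sep = "_")
print(by_name)
```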
