
I'm creating a huge dataframe containing data from multiple .csv files with prices of different colours per day.

I want to add a column containing the name of the .csv file that each dataframe was created from. For example, if the file is called "GRAY.csv", the resulting dataframe should have a new column "name" in which every observation from that file is "GRAY".

To create the dataframe I am using the map_df() function from the purrr package.

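library(tidyverse)

data_folder <- "data"
# File names only (no directory), e.g. "GRAY.csv"
csv_files <- dir(data_folder, pattern = "[A-Z]\\.csv")

# Read every file and row-bind the results into one tibble
df <- csv_files %>%
  map_df(~ read_csv(file.path(data_folder, .)))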

What I get

df

## # A tibble: 3,175 × 2
##    date       price
##    <date>     <dbl>
##  1 2010-01-04  7.64
##  2 2010-01-05  7.66
##  3 2010-01-06  7.53
##  4 2010-01-07  7.52
##  5 2010-01-04  10.57
##  6 2010-01-05  10.50
##  7 2010-01-06  10.42
##  8 2010-01-07  10.52
##  9 2010-01-04  6.48
## 10 2010-01-05  6.35
## # … with 3,165 more rows

What I want

df

## # A tibble: 3,175 × 3
##    date       price  name
##    <date>     <dbl>  <chr>
##  1 2010-01-04  7.64   GRAY
##  2 2010-01-05  7.66   GRAY
##  3 2010-01-06  7.53   GRAY
##  4 2010-01-07  7.52   GRAY
##  5 2010-01-04  10.57  BLUE
##  6 2010-01-05  10.50  BLUE
##  7 2010-01-06  10.42  BLUE
##  8 2010-01-07  10.52  BLUE
##  9 2010-01-04  6.48   RED
## 10 2010-01-05  6.35   RED
## # … with 3,165 more rows

How do I add this name column?

2 Answers

csv_files %>%
 set_names(fs::path_ext_remove(basename(.))) %>%
 map_df(~ read_csv(file.path(data_folder, .)), .id = 'name')
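
For context, this appears to work because `map_df()` copies the names of its input into the column given by `.id`. Below is a minimal self-contained sketch of that idea. The folder name, the two files (`GRAY.csv`, `BLUE.csv`) and their prices are made up for illustration, and `tools::file_path_sans_ext()` stands in for `fs::path_ext_remove()` only to avoid an extra dependency.

```r
library(purrr)
library(readr)

# Hypothetical sample data written to a temporary folder (not the asker's files)
data_folder <- file.path(tempdir(), "colour_prices_id")
dir.create(data_folder, showWarnings = FALSE)
write_csv(
  data.frame(date = as.Date(c("2010-01-04", "2010-01-05")), price = c(7.64, 7.66)),
  file.path(data_folder, "GRAY.csv")
)
write_csv(
  data.frame(date = as.Date(c("2010-01-04", "2010-01-05")), price = c(10.57, 10.50)),
  file.path(data_folder, "BLUE.csv")
)

csv_files <- dir(data_folder, pattern = "^[A-Z]+\\.csv$")

# Name each element after its file name without the extension
named_files <- set_names(csv_files, tools::file_path_sans_ext(csv_files))

# map_df() stores those names in the column requested via .id
df <- map_df(
  named_files,
  ~ read_csv(file.path(data_folder, .x), show_col_types = FALSE),
  .id = "name"
)
```

Note that the `.id` column is placed first rather than last; if the column order matters, something like `dplyr::relocate(df, name, .after = last_col())` should move it to the end.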
Onyambu

Before combining everything into one data frame, add the name column to each individual data frame, and then row-bind the data frames at the end using purrr::reduce().

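library(tidyverse)
csv_files %>%
  map(
    # Read one file and tag its rows with the file name minus the extension
    ~ read_csv(file.path(data_folder, .x)) %>%
      mutate(name = str_remove(.x, "\\.csv$"))
  ) %>%
  # Stack the per-file data frames one after another
  reduce(bind_rows)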
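
The snippet above relies on `csv_files` and `data_folder` from the question. A self-contained sketch of the same per-file approach follows, assuming two made-up files (`GRAY.csv`, `RED.csv`) in a temporary folder; the folder name and prices are illustrative only. Since `bind_rows()` accepts a list of data frames directly, this version calls it once instead of going through `reduce()`.

```r
library(dplyr)
library(purrr)
library(readr)
library(stringr)

# Hypothetical sample data written to a temporary folder (not the asker's files)
data_folder <- file.path(tempdir(), "colour_prices_mutate")
dir.create(data_folder, showWarnings = FALSE)
write_csv(
  data.frame(date = as.Date(c("2010-01-04", "2010-01-05")), price = c(7.64, 7.66)),
  file.path(data_folder, "GRAY.csv")
)
write_csv(
  data.frame(date = as.Date(c("2010-01-04", "2010-01-05")), price = c(6.48, 6.35)),
  file.path(data_folder, "RED.csv")
)

csv_files <- dir(data_folder, pattern = "^[A-Z]+\\.csv$")

df <- csv_files %>%
  map(function(f) {
    # Read one file, then add its name (minus ".csv") as a new last column
    read_csv(file.path(data_folder, f), show_col_types = FALSE) %>%
      mutate(name = str_remove(f, "\\.csv$"))
  }) %>%
  # bind_rows() stacks every data frame in the list
  bind_rows()
```

Here the `name` column ends up last, matching the layout shown in the question.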
kybazzi