I just got into data analysis, and this is my first post.

I have 12 dataframes in R that I need to first subset from, then add a bunch of columns to. My current solution is to just copy/paste the same code chunk over and over and edit the numbers for each corresponding data frame, but that's obviously inefficient and not sustainable for future practice.

This is how I'm subsetting the data currently:

sub_202212 <- subset(
  cyclistic202212,
  select = c(ride_id, rideable_type, started_at, ended_at, member_casual))

And this is currently how I'm adding columns:

sub_202202$month <- format(as.Date(sub_202202$date), "%m")
sub_202202$day <- format(as.Date(sub_202202$date), "%d")
sub_202202$year <- format(as.Date(sub_202202$date), "%Y")
sub_202202$day_of_week <- format(as.Date(sub_202202$date), "%A")

I'm wondering if there's a "batch process" that can drastically shorten my code.

Better yet, please let me know if there's also a way to subset columns while adding them at the same time (since it's the same addition/subtraction across all DFs).

And if a similar question has been asked (but worded better, which could be why I haven't found it), then please point me in that direction. This is all very new to me, and I still have a lot to learn.

Any insight would be greatly appreciated. :)

edit: sample code

df1 <- data.frame(id = c(1, 2, 3, 4, 5),
                  trips = c(3, 6, 3, 7, 8))
df2 <- data.frame(id = c(6, 7, 8, 9, 10),
                  trips = c(3, 5, 2, 7, 10))
weeelum
  • Hi! Have you tried using `tidyverse` package? – Lucas Feb 21 '23 at 19:06
  • Can you provide some sample data? I can do it using tidyverse – Lucas Feb 21 '23 at 19:07
  • When you have more than one frame to which you apply the same processes, I recommend storing them in a [list of frames](https://stackoverflow.com/a/24376207/3358227) and using `lapply` or similar. – r2evans Feb 21 '23 at 19:09
  • Hi @Lucas! Sure thing - what do you mean provide sample data? Send you some of these .csv files you mean? – weeelum Feb 21 '23 at 19:10
  • You can also use a `for` loop to do it for each data frame – Lucas Feb 21 '23 at 19:10
  • Hi @r2evans. I've come across `lapply` and `for` loops, but I couldn't figure out how to implement either with what I need. – weeelum Feb 21 '23 at 19:12
  • Yes!! If you can post some code that creates a sample data frame, that would be great – Lucas Feb 21 '23 at 19:12
  • @weeelum Do you want to save the changes in another .csv? – Lucas Feb 21 '23 at 19:12
  • if `subs <- list(cyclistic202212, cyclistic202213)`, then `subs2 <- lapply(subs, function(X) { X <- subset(X, select = c(ride_id, rideable_type, ...)); X$month <- format(as.Date(X$date), "%m"); ...; X; })`. – r2evans Feb 21 '23 at 19:17
  • Thanks @r2evans, looking at your example, `lapply` is starting to make more sense to me. Will try that out. – weeelum Feb 21 '23 at 19:26
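
The list-of-frames idea from the comments can be sketched roughly as follows. This is a minimal sketch, not a definitive solution: the two toy frames and the helper name `prep_rides` are made up for illustration, and it assumes the date is derived from `started_at`, since the question's `date` column is not among the selected columns.

```r
# Two toy frames standing in for the monthly data (values are made up).
cyclistic202211 <- data.frame(
  ride_id = c("a1", "a2"),
  rideable_type = c("classic_bike", "electric_bike"),
  started_at = c("2022-11-03 08:15:00", "2022-11-20 17:40:00"),
  ended_at = c("2022-11-03 08:30:00", "2022-11-20 18:05:00"),
  member_casual = c("member", "casual"),
  start_station_name = c("X", "Y")  # extra column that the subset drops
)
cyclistic202212 <- data.frame(
  ride_id = c("b1", "b2"),
  rideable_type = c("classic_bike", "docked_bike"),
  started_at = c("2022-12-01 09:00:00", "2022-12-25 12:00:00"),
  ended_at = c("2022-12-01 09:20:00", "2022-12-25 12:45:00"),
  member_casual = c("casual", "member"),
  start_station_name = c("Y", "Z")
)

# Hypothetical helper: keep the wanted columns, then add the date parts.
prep_rides <- function(df) {
  keep <- c("ride_id", "rideable_type", "started_at", "ended_at", "member_casual")
  df <- df[, keep]
  d <- as.Date(df$started_at)  # assumption: the date comes from started_at
  df$month <- format(d, "%m")
  df$day <- format(d, "%d")
  df$year <- format(d, "%Y")
  df$day_of_week <- format(d, "%A")  # weekday name depends on the locale
  df
}

# Store the frames in a named list and apply the same steps to each one.
rides <- list(cyclistic202211 = cyclistic202211,
              cyclistic202212 = cyclistic202212)
subs <- lapply(rides, prep_rides)
```

Each processed frame is then available by name, for example `subs$cyclistic202212`. If the twelve frames already exist in the workspace, `mget(ls(pattern = "^cyclistic"))` is one way to collect them into a named list without typing each name.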

1 Answer

This code reads all the .csv files in the given path (I'd suggest creating a folder that holds only the files you want to change), makes the changes, and then exports them under the same names, overwriting the originals.


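library(tidyverse)
# full.names = TRUE returns complete paths, so read_csv() finds each file
# regardless of the working directory; "\\.csv$" is a regex for the extension.
for (i in list.files(path = "your path", pattern = "\\.csv$", full.names = TRUE)) {
  df <- read_csv(i)
  df %>%
    # assumes each file already has a `date` column, as in the question
    mutate(month = format(as.Date(date), "%m"),
           day = format(as.Date(date), "%d"),
           year = format(as.Date(date), "%Y"),
           day_of_week = format(as.Date(date), "%A")) %>%
    select(ride_id, rideable_type, started_at, ended_at, member_casual, month,
           day, year, day_of_week) %>%
    write_csv(i)  # overwrites the original file
}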
Lucas
  • Hi, I just added some sample code, but I'll try out your solution for now. Also, ultimately I want to put all these separate .csv files into one Excel file, as separate tabs/sheets. – weeelum Feb 21 '23 at 19:28
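
For the one-workbook, one-sheet-per-frame goal mentioned in the last comment, one possible approach is the `writexl` package, whose `write_xlsx()` accepts a named list of data frames and writes each element to its own sheet. This is a hedged sketch that assumes `writexl` is installed; it reuses the sample frames from the question, and the output file name `trips.xlsx` is made up.

```r
library(writexl)  # assumption: install.packages("writexl") has been run

# Sample frames from the question.
df1 <- data.frame(id = c(1, 2, 3, 4, 5),
                  trips = c(3, 6, 3, 7, 8))
df2 <- data.frame(id = c(6, 7, 8, 9, 10),
                  trips = c(3, 5, 2, 7, 10))

# The list names become the sheet names in the workbook.
sheets <- list(df1 = df1, df2 = df2)

# Written to a temporary folder here; point out_path at any location you like.
out_path <- file.path(tempdir(), "trips.xlsx")
write_xlsx(sheets, path = out_path)
```

A named list produced by `lapply` can be passed to `write_xlsx()` in the same way, so the processing step and the export step can share one list.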