
I load my data with a list.files() pipeline that concatenates multiple .csv files into a single data frame. However, I then have to run a second line that finds and removes rows meeting a certain criterion, and a third line that combines the date and time columns into a Date_time column. What I'm working with is below:

library(dplyr)  # provides %>% and bind_rows()
library(readr)  # provides read_csv()

#Identifies all csv files in a folder and combines them into one continuous data frame.
exports_15E <- list.files(path = "Z:/fishproj/Cambodia Dai project/Analytic/TS2020_Data_Exports",
                          pattern = "^15E.*FullWaterColumn_100m.sv.csv", full.names = TRUE) %>%
  lapply(read_csv) %>%
  bind_rows()
#Removes zero values from the data frame
exports_15E <- exports_15E[exports_15E$Layer != 0, ]
#Combines date and time columns into one column (%S keeps the seconds).
exports_15E$Date_time <- as.POSIXct(paste(as.Date(as.character(exports_15E$Date_M), "%Y%m%d"), exports_15E$Time_M, sep = " "),
                                    format = "%Y-%m-%d %H:%M:%S", tz = "Asia/Bangkok")
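A small check of why the `format` string matters: the times in this data carry seconds (e.g. "18:22:22"), but a format ending in "%H:%M" tells `strptime` to stop after the minutes and silently ignore the rest, so every timestamp is truncated to :00. The sketch below uses a value from the first example row; the one-step `direct` parse of `Date_M` and `Time_M` is an optional simplification, not something from the question.

```r
# Combined date/time text built from the first example row
combined_text <- "2020-12-01 18:22:22"

# "%H:%M" stops after the minutes; strptime ignores the trailing ":22"
no_seconds <- as.POSIXct(combined_text, format = "%Y-%m-%d %H:%M", tz = "Asia/Bangkok")

# "%H:%M:%S" parses the seconds as well
with_seconds <- as.POSIXct(combined_text, format = "%Y-%m-%d %H:%M:%S", tz = "Asia/Bangkok")

# Optional: parse Date_M (stored as a number) and Time_M directly,
# skipping the as.Date() round trip
direct <- as.POSIXct(paste(20201201, "18:22:22"), format = "%Y%m%d %H:%M:%S", tz = "Asia/Bangkok")

format(no_seconds, "%H:%M:%S")    # "18:22:00"
format(with_seconds, "%H:%M:%S")  # "18:22:22"
```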

What I'm trying to do is combine those three steps, i.e. write the code so that a single statement runs all of the above and produces one data frame.

Example data frame:

     Layer Time_M  Date_M    Sv_mean
 1     1 18:22:22 20201201   -60.2
 2     1 19:22:23 20201201   -59.8
 3     1 20:22:23 20201201   -60.3
 4     1 21:22:23 20201201   -62.3
 5     1 22:22:23 20201201   -64.2
 6     0 00:00:00 00000000    9999
 7     0 00:00:00 00000000    9999
 8     0 00:00:00 00000000    9999
 9     0 00:00:00 00000000    9999
10     0 00:00:00 00000000    9999
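As the comments below point out, the table is easier to work with as code. A reprex of the example rows might look like this; the values are copied from the table, and storing `Date_M` as a number (so the "00000000" placeholders become 0) is an assumption based on how `read_csv()` would likely parse that column.

```r
# The example data above, rebuilt as code (values copied from the table)
example_15E <- data.frame(
  Layer   = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0),
  Time_M  = c("18:22:22", "19:22:23", "20:22:23", "21:22:23", "22:22:23",
              rep("00:00:00", 5)),
  # Assumption: Date_M is numeric, as read_csv() would likely read it
  Date_M  = c(rep(20201201, 5), rep(0, 5)),
  Sv_mean = c(-60.2, -59.8, -60.3, -62.3, -64.2, rep(9999, 5))
)

# On the real data, dput(head(exports_15E, 10)) prints equivalent code to paste into a question
```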

Desired output:

         Layer Time_M  Date_M   Sv_mean       Date_time
     1     1 18:22:22 20201201   -60.2   2020-12-01 18:22:22 
     2     1 19:22:23 20201201   -59.8   2020-12-01 19:22:23
     3     1 20:22:23 20201201   -60.3   2020-12-01 20:22:23
     4     1 21:22:23 20201201   -62.3   2020-12-01 21:22:23
     5     1 22:22:23 20201201   -64.2   2020-12-01 22:22:23

The code should build this from my .csv files (which is simple enough), then remove the rows where Layer is 0, and combine the "Date_M" and "Time_M" columns into a "Date_time" column. The code I have works and does all of this; I just want it to run as a single statement.

GuedesBF
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Aug 13 '21 at 20:56
  • What do you mean by "write the syntax in a way that it will just run all of the above and create a single data frame out of it"? Do you want a single pipe (%>%) chain? – Arnaud Feldmann Aug 13 '21 at 21:18
  • Yes, I've tried doing "%>%" chains but I can't seem to do it right. – Jackson A Swan Aug 13 '21 at 21:25
  • Please note that the data you provided is not an actual reprex. You should provide code that creates the data, not a formatted table. This is also the desired output, not the raw data. – GuedesBF Aug 13 '21 at 21:28

1 Answer


I would use a single pipe (%>%) chain. I think dplyr and purrr are the most helpful here. In particular, these packages provide map_dfr() from purrr (apply a function to each element, then row-bind the resulting data frames) and the straightforward filter() from dplyr, which make data wrangling a fluid process. Something like this may work (it's hard to tell without a proper reproducible example).

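library(dplyr)
library(purrr)
library(readr)  # read_csv() comes from readr, not dplyr or purrr

exports_15E <- list.files(path = "Z:/fishproj/Cambodia Dai project/Analytic/TS2020_Data_Exports",
                          pattern = "^15E.*FullWaterColumn_100m.sv.csv", full.names = TRUE) %>%
  map_dfr(read_csv) %>%   # read every file and row-bind them into one data frame
  filter(Layer != 0) %>%  # drop the placeholder rows where Layer is 0
  # %S keeps the seconds of Time_M
  mutate(Date_time = as.POSIXct(paste(as.Date(as.character(Date_M), "%Y%m%d"), Time_M, sep = " "),
                                format = "%Y-%m-%d %H:%M:%S", tz = "Asia/Bangkok"))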
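To try the chain without access to the Z:/ drive, here is a self-contained sketch that writes two small files into a temporary folder and runs the same three steps on them. The folder, the file names, and the `col_types` choice (reading `Date_M` and `Time_M` as text so they can be parsed together with one format string) are assumptions for the demo, not part of the answer above. The dots in the pattern are escaped here so they only match literal dots.

```r
library(dplyr)
library(purrr)
library(readr)

# Hypothetical demo folder with files whose names match the question's pattern
demo_dir <- tempfile("demo_15E_")
dir.create(demo_dir)
sample_rows <- data.frame(
  Layer   = c(1, 1, 0),
  Time_M  = c("18:22:22", "19:22:23", "00:00:00"),
  Date_M  = c("20201201", "20201201", "00000000"),
  Sv_mean = c(-60.2, -59.8, 9999)
)
write_csv(sample_rows, file.path(demo_dir, "15E_part1_FullWaterColumn_100m.sv.csv"))
write_csv(sample_rows, file.path(demo_dir, "15E_part2_FullWaterColumn_100m.sv.csv"))
write_csv(sample_rows, file.path(demo_dir, "unrelated.csv"))  # not matched by the pattern

exports_demo <- list.files(demo_dir, pattern = "^15E.*FullWaterColumn_100m\\.sv\\.csv$",
                           full.names = TRUE) %>%
  # Extra arguments are passed on to read_csv(); Date_M and Time_M are kept as text
  map_dfr(read_csv, col_types = cols(Date_M = col_character(), Time_M = col_character())) %>%
  filter(Layer != 0) %>%
  mutate(Date_time = as.POSIXct(paste(Date_M, Time_M),
                                format = "%Y%m%d %H:%M:%S", tz = "Asia/Bangkok"))

exports_demo
```

In purrr 1.0.0 and later, map_dfr() is marked as superseded; `map(read_csv) %>% list_rbind()` is the suggested replacement, although map_dfr() still works.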
GuedesBF
  • @JacksonASwan, check out the tidyverse. Once you get used to it, you never go back to base R for these simple data wrangling tasks. dplyr and purrr are awesome. https://www.tidyverse.org – GuedesBF Aug 13 '21 at 21:32
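For comparison, the same three steps can also be wrapped in one base-R function, so a single call does everything without any packages. This is only a sketch: `read_15E_exports()` is a hypothetical name, read.csv() stands in for read_csv(), and it assumes the files contain at least the Layer, Date_M, and Time_M columns shown in the example.

```r
# Hypothetical helper: list, read, stack, filter, and add Date_time in one call
read_15E_exports <- function(csv_dir,
                             pattern = "^15E.*FullWaterColumn_100m\\.sv\\.csv$") {
  files <- list.files(csv_dir, pattern = pattern, full.names = TRUE)
  # Read every matching file and stack the rows into one data frame
  combined <- do.call(rbind, lapply(files, read.csv))
  # Drop the placeholder rows where Layer is 0
  combined <- combined[combined$Layer != 0, ]
  # Date_M (e.g. 20201201) and Time_M (e.g. "18:22:22") parsed with one format string
  combined$Date_time <- as.POSIXct(paste(combined$Date_M, combined$Time_M),
                                   format = "%Y%m%d %H:%M:%S", tz = "Asia/Bangkok")
  combined
}

# Usage with the folder from the question:
# exports_15E <- read_15E_exports("Z:/fishproj/Cambodia Dai project/Analytic/TS2020_Data_Exports")
```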