I am working on a project where I need to graph 10 days' worth of data from remote sites.
Every 30 minutes I download new data from a remote computer via FTP (data is also written every half hour). The remote file path changes every month, so the download URL depends on the current date.
eg.
/data/sitename/2020/July/data.csv
/data/sitename/2020/August/data.csv
My problem is that at the start of each new month the CSV I am downloading sits in a new folder, and the new file only contains data from the new month, not the previous months.
I need to graph the last 10 days of data. So what I'm hoping to do is download the new data every half hour and append only the newest records to the master data set. Or is there a better way altogether?
What I (think I) need to do is read the CSV into R, append only the new rows to a master file, and remove the oldest records so the master only ever holds 10 days' worth of data. I have searched everywhere but cannot seem to crack it.
This seems like it should be so easy, maybe I am using the wrong search terms.
I would like the following, pretty please (I've shown 10 lines of data; I'll need 480 for 10 days).
INITIAL DATA
DateTime Data1 Data2 Data3 Data4 Data5
641 2020-08-26T02:31:59.999+10:00 10.00 53.4 3.101 42 20.70
642 2020-08-26T03:01:59.999+10:00 11.11 52.0 2.778 44 20.70
643 2020-08-26T03:31:59.999+10:00 2.63 105.5 2.899 45 20.70
644 2020-08-26T04:01:59.999+10:00 11.11 60.5 2.920 45 20.70
645 2020-08-26T04:31:59.999+10:00 3.03 101.3 2.899 48 20.70
646 2020-08-26T05:01:59.999+10:00 2.86 125.2 2.899 49 20.65
647 2020-08-26T05:31:59.999+10:00 2.86 132.2 2.899 56 20.65
648 2020-08-26T06:01:59.999+10:00 3.23 113.9 2.963 61 20.65
649 2020-08-26T06:31:59.999+10:00 3.45 113.9 3.008 64 20.65
650 2020-08-26T07:01:59.999+10:00 3.57 108.3 3.053 66 20.65
NEW DATA
DateTime Data1 Data2 Data3 Data4 Data5
641 2020-08-26T02:31:59.999+10:00 10.00 53.4 3.101 42 20.70
642 2020-08-26T03:01:59.999+10:00 11.11 52.0 2.778 44 20.70
643 2020-08-26T03:31:59.999+10:00 2.63 105.5 2.899 45 20.70
644 2020-08-26T04:01:59.999+10:00 11.11 60.5 2.920 45 20.70
645 2020-08-26T04:31:59.999+10:00 3.03 101.3 2.899 48 20.70
646 2020-08-26T05:01:59.999+10:00 2.86 125.2 2.899 49 20.65
647 2020-08-26T05:31:59.999+10:00 2.86 132.2 2.899 56 20.65
648 2020-08-26T06:01:59.999+10:00 3.23 113.9 2.963 61 20.65
649 2020-08-26T06:31:59.999+10:00 3.45 113.9 3.008 64 20.65
650 2020-08-26T07:01:59.999+10:00 3.57 108.3 3.053 66 20.65
651 2020-08-26T07:31:59.999+10:00 3.85 109.7 3.125 70 20.65
REQUIRED DATA
DateTime Data1 Data2 Data3 Data4 Data5
642 2020-08-26T03:01:59.999+10:00 11.11 52.0 2.778 44 20.70
643 2020-08-26T03:31:59.999+10:00 2.63 105.5 2.899 45 20.70
644 2020-08-26T04:01:59.999+10:00 11.11 60.5 2.920 45 20.70
645 2020-08-26T04:31:59.999+10:00 3.03 101.3 2.899 48 20.70
646 2020-08-26T05:01:59.999+10:00 2.86 125.2 2.899 49 20.65
647 2020-08-26T05:31:59.999+10:00 2.86 132.2 2.899 56 20.65
648 2020-08-26T06:01:59.999+10:00 3.23 113.9 2.963 61 20.65
649 2020-08-26T06:31:59.999+10:00 3.45 113.9 3.008 64 20.65
650 2020-08-26T07:01:59.999+10:00 3.57 108.3 3.053 66 20.65
651 2020-08-26T07:31:59.999+10:00 3.85 109.7 3.125 70 20.65
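Working from the INITIAL / NEW / REQUIRED example above, the core operation is: stack the old and new data, drop duplicate timestamps, and keep only the newest N rows (480 for 10 days of half-hourly records). A minimal `data.table` sketch, using made-up half-hourly timestamps rather than the real files:

```r
library(data.table)

# Toy stand-ins: 'initial' is the master data already on disk,
# 'new' is the freshly downloaded file (the two overlap by 10 rows).
start <- as.POSIXct("2020-08-26 02:31:59", tz = "Australia/Brisbane")
initial <- data.table(DateTime = start + 1800 * (0:9),  Data1 = 1:10)
new     <- data.table(DateTime = start + 1800 * (0:10), Data1 = 1:11)

# Stack both, keep one copy of each timestamp, sort chronologically,
# then keep only the most recent n_keep rows.
combined <- rbind(initial, new)
combined <- combined[!duplicated(DateTime)]
n_keep   <- 10                        # use 480 in production
required <- tail(combined[order(DateTime)], n_keep)
```

Here `required` matches the REQUIRED table: the overlapping rows appear once, the one genuinely new row is appended, and the oldest row is dropped.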
This is where I am at...
library(RCurl)
library(readr)
library(ggplot2)
library(data.table)
# Get the date parts needed to build this month's folder name
Year <- format(Sys.Date(), format = "%Y")
Month <- format(Sys.Date(), format = "%B")
MM <- format(Sys.Date(), format = "%m")
# Build the FTP URL and read the current month's file
# ("site" stands in for the real site name; a stray "}" after it is removed)
url <- glue::glue("ftp://user:passwd@99.99.99.99/path/{Year}/{Month}/site{Year}-{MM}.csv")
site <- read.csv(url, header = FALSE)
# Cache a local copy, then re-read only the columns of interest
# (write.table() returns NULL, so don't assign its result)
write.table(site, "EP.csv", col.names = FALSE, row.names = FALSE)
EP <- fread("EP.csv", header = FALSE, select = c(1, 2, 3, 5, 6, 18))
write.table(EP, file = "output.csv",
            col.names = c("DateTime", "Data1", "Data2", "Data3", "Data4", "Data5"),
            sep = ",", row.names = FALSE)
# working up to here
# Append to master csv file (the original had a misplaced quote in the path)
master <- read.csv("C:\\path\\master.csv")
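Picking up where the attempt stops, the missing piece is the append-and-trim step. Below is a self-contained sketch of one half-hourly update cycle. In production, `new` would be the data frame read from the FTP URL built with `glue()` above; here a toy table stands in, and `update_master()` and `master_path` are illustrative names, not an existing API:

```r
library(data.table)

# Toy stand-in for one freshly downloaded batch; timestamps are
# written as ISO-8601 strings, as in the real CSV.
master_path <- file.path(tempdir(), "master.csv")
new <- data.table(
  DateTime = format(as.POSIXct("2020-08-26 02:31:59", tz = "UTC") + 1800 * (0:20),
                    "%Y-%m-%dT%H:%M:%S"),
  Data1 = rnorm(21)
)

update_master <- function(new, master_path, days = 10) {
  # Append the new batch to the master file if one exists
  master <- if (file.exists(master_path)) rbind(fread(master_path), new) else new
  # One row per timestamp, newest download wins
  master <- master[!duplicated(DateTime, fromLast = TRUE)]
  # Sort chronologically and keep only the last 'days' days
  ts <- as.POSIXct(master$DateTime, format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")
  master <- master[order(ts)]
  ts <- sort(ts)
  master <- master[ts >= max(ts) - as.difftime(days, units = "days")]
  fwrite(master, master_path)
  master
}

master <- update_master(new, master_path)
```

Run from a scheduled task every half hour, this keeps the master file to a rolling 10-day window even though each monthly CSV starts from scratch; the overlap between downloads is handled by the timestamp dedupe rather than by tracking row numbers.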