0

I am trying to figure out how to extract data from a SQL Server table and save the contents of one table, into separate CSV files, based on a field called ASOFDATE. My ASOFDATE is a month end date.

Here is what I have so far.

library(RODBC)

# connect to database 
dbhandle = odbcDriverConnect('driver={SQL Server};server=SERVER_NAME;database=DATABASE_NAME;trusted_connection=true')

sampledata <- sqlQuery(dbhandle, "select * from [SERVER].[dbo].[TABLE]")

# Close the connection
close(dbhandle) 

# Create a Date vector containing the 1st of all the subsequent months and subtract 1 day 
# date.end.month <- seq(as.Date("2012-02-01"),length=4,by="months")-1

ASOFDATE <- seq(as.Date("2016-02-01"),length=12,by="months")-1

Somehow in here I think I need to loop through the DataFrame, but I’m not sure how to do it.

for (i in sampledata) {
  do some kind of filter by ASAOFDATE

write.csv(sampledata, paste0("C:/Users/RSMITH/Desktop/test/", ASOFDATE, ".csv"), row.names = FALSE) 

}

So, for instance, I use this to fetch data from the table.

dataset <- sqlFetch(dbhandle, "DATA")

The data set has these fields: N_ID,SEQ_NO,REPORT_PERIOD_END,CONTACT_ID,CONSOL_KEY,CUSTOMER_NAME,COMPANY_CODE,SECTOR,OFFICER,ASOFDATE,COMPANY,REPORT_LINE

I create a list of month end dates that I want to 'filter' for: ASOFDATE <- seq(as.Date("2016-02-01"),length=12,by="months")-1

Then, 'filter' records in the table, based on ASOFDATE, and save each chunk as a CSV, with the date in the file name.

I'm 100% sure this is possible, but I just don't know how to cut up the data into chunks and save each chunk.

ASH
  • 20,759
  • 19
  • 87
  • 200
  • So you want all records before each ASOFDATE in separate csv files? Or do you want them to contain all records between two ASOFDATEs? – Ian Wesley May 25 '17 at 21:50
  • In any case, you should supply some part of your data to illustrate the problem. See this post on how to do it: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Thomas K May 25 '17 at 21:56

1 Answers1

1

You can make use of the tidyverse here. First you create a vector of paths where you will save your files to. Then you split your data into a list, according to ASOFDATE, and walk along this list, saving the elements of the list along the way.

library(tidyverse)

paths <- paste0("Desktop/", sampledata$ASOFDATE, ".csv")

sampledata %>% 
  split(.$ASOFDATE) %>% 
  walk2(., paths, ~write_csv(.x, .y))

Data

sampledata <- data_frame(
  ASOFDATE = seq(from = as.Date("2017-05-01"), as.Date("2017-05-05"), "days"),
  var1 = 1:5,
  var2 = 1:5
)
Thomas K
  • 3,242
  • 15
  • 29
  • Thanks, your specific example works, but how can I adapt that to my scenario? I'm fetching the data from the table: dataset <- sqlFetch(dbhandle, "TABLE_NAME"). Now, I need to use that in the DataFrame: sampledata <- data_frame(...) – ASH May 26 '17 at 13:39
  • @ryguy72 Can you update your post with a sample of your result from `sqlFetch(dbhandle, "TABLE_NAME")`? On how to do that see [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Thomas K May 26 '17 at 13:42
  • I just added some more info to my original post. – ASH May 26 '17 at 14:21
  • @ryguy72 it is still a bit sparse. How does your column `ASOFDATE` within `dataset` look like? Why can't you use it like I suggested? `dataset %>% split(.$ASOFDATE) %>% ...` – Thomas K May 26 '17 at 14:32
  • @ryguy72 If you have a column inside your data, which determines the chunks for saving, than you don't need to create anything else outside of your `dataset`. – Thomas K May 26 '17 at 14:33
  • I guess I don't know R well enough to figure this out. When I run this: dataset %>% split(.$ASOFDATE) %>% walk2(., paths, ~write_csv(.x, .y)) It gets into some kind of perpetual loop and I get 2 CSV files; one with yesterday's date and one that is named 'NA'. Both have the ENTIRE contents of the table. That's the problem now. – ASH May 26 '17 at 14:51
  • Could you please do 'head(dataset$ASOFDATE)' or 'unique(dataset$ASOFDATE)' and post the result? Until you don't make your example reproducible (I posted the link on how to do that twice) it is difficult to help you. – Thomas K May 26 '17 at 14:57
  • > head(dataset$ASOFDATE) [1] "2017-05-25 EDT" "2017-05-25 EDT" "2017-05-25 EDT" NA "2017-05-25 EDT" "2017-05-25 EDT" > unique(dataset$ASOFDATE) [1] "2017-05-25 EDT" NA – ASH May 26 '17 at 15:02
  • So you just have data for one day (and some missing data for the dates)? What else, then, would your desired output be, than two files? If you want any further help, you really should create a reproducible example, which clearly shows how the input data looks like, and what your desires output/result is. – Thomas K May 26 '17 at 15:16
  • AHHHH! Now I see what the problem was. I thought I was hitting a different table! All set now. Thanks for all the help on this. I appreciate it. – ASH May 26 '17 at 15:45