
I have a for loop that iterates over a list of URLs:

url_list <- c('http://www.irs.gov/pub/irs-soi/04in21id.xls',
          'http://www.irs.gov/pub/irs-soi/05in21id.xls',
          'http://www.irs.gov/pub/irs-soi/06in21id.xls', 
          'http://www.irs.gov/pub/irs-soi/07in21id.xls',
          'http://www.irs.gov/pub/irs-soi/08in21id.xls', 
          'http://www.irs.gov/pub/irs-soi/09in21id.xls',
          'http://www.irs.gov/pub/irs-soi/10in21id.xls',
          'http://www.irs.gov/pub/irs-soi/11in21id.xls',
          'http://www.irs.gov/pub/irs-soi/12in21id.xls',
          'http://www.irs.gov/pub/irs-soi/13in21id.xls',
          'http://www.irs.gov/pub/irs-soi/14in21id.xls',
          'http://www.irs.gov/pub/irs-soi/15in21id.xls')

The loop downloads an Excel file from each URL, assigns it to a data frame, and performs a set of data-cleaning operations on it.

library(gdata)
for (url in url_list){
  test <- read.xls(url)
  cols <- c(1,4:5,97:98)
  test <- test[-(1:8),cols]
  test <- test[1:22,]
  test <- test[-4,]
  test$Income <-test$Table.2.1...Returns.with.Itemized.Deductions..Sources.of.Income..Adjustments..Itemized.Deductions.by.Type..Exemptions..and.Tax..Items..by.Size.of.Adjusted.Gross.Income..Tax.Year.2015..Filing.Year.2016.
  test$Total_returns <- test$X.2
  test$return_dollars <- test$X.3
  test$charitable_deductions <- test$X.95
  test$charitable_deduction_dollars <- test$X.96
  test[1:5] <- NULL
}

My problem is that the loop simply overwrites the same data frame on every iteration. How can I have each iteration assign its result to a data frame with a different name?

Noah Olsen
  • You can use the [save function](https://www.rdocumentation.org/packages/base/versions/3.4.1/topics/save) just before the closing curly bracket to write the `test` object to a file – Imran Ali Nov 06 '17 at 23:58
  • A hint on naming the `test` object `for(i in 1:5){ print(paste0("test", i))}` – Imran Ali Nov 07 '17 at 00:06

3 Answers


Use assign. This question is a duplicate of this post: Change variable name in for loop using R

For your particular case, you can do something like the following:

library(gdata)  # provides read.xls

for (i in seq_along(url_list)){
  url <- url_list[i]
  test <- read.xls(url)
  cols <- c(1,4:5,97:98)
  test <- test[-(1:8),cols]
  test <- test[1:22,]
  test <- test[-4,]
  # The first column's name embeds the tax year (e.g. "Tax.Year.2015"), so it
  # presumably differs from file to file; select it by position instead.
  test$Income <- test[[1]]
  test$Total_returns <- test$X.2
  test$return_dollars <- test$X.3
  test$charitable_deductions <- test$X.95
  test$charitable_deduction_dollars <- test$X.96
  test[1:5] <- NULL
  # Store this iteration's result as test1, test2, ...
  assign(paste0("test", i), test)
}
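
Once the objects have been created with assign, they have to be looked up by name again. A minimal sketch of how that might look, using toy data frames in place of the downloaded files (the columns `id` and `value` are made up for illustration):

```r
# Toy stand-ins for the cleaned data frames; the real loop would call read.xls().
for (i in 1:3) {
  test <- data.frame(id = i, value = i * 10)
  assign(paste0("test", i), test)
}

# Fetch a single data frame by its constructed name.
second <- get("test2")

# Collect all of them back into a named list.
all_tests <- mget(paste0("test", 1:3))
```

Because the names are built from strings, every later access needs get or mget, which is one reason the list-based answers tend to be easier to work with.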
Kelli-Jean

You could write to a list:

result_list <- vector("list", length(url_list))  # preallocate one slot per URL
for (i_url in seq_along(url_list)){
    url <- url_list[i_url]
    ...
    result_list[[i_url]] <- test
}

You can also name the list elements:

names(result_list) <- c("df1","df2","df3",...)
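
Instead of typing the names by hand, they could be derived from the URLs themselves. This is only a sketch: it uses two of the URLs from the question and placeholder data frames (with a made-up column `x`) in place of the cleaned results.

```r
url_list <- c('http://www.irs.gov/pub/irs-soi/04in21id.xls',
              'http://www.irs.gov/pub/irs-soi/05in21id.xls')

# Placeholder data frames standing in for the cleaned results.
result_list <- list(data.frame(x = 1), data.frame(x = 2))

# Use each file name, without its extension, as the element name.
names(result_list) <- tools::file_path_sans_ext(basename(url_list))

# Elements can then be accessed by name.
df_05 <- result_list[["05in21id"]]
```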
tobiasegli_te

Here's another approach that uses lapply instead of a for loop. It returns each resulting data frame as a separate list element, which can then be renamed if needed.

url_list <- c('http://www.irs.gov/pub/irs-soi/04in21id.xls',
              ...
              'http://www.irs.gov/pub/irs-soi/15in21id.xls')

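readURLFunc <- function(z){
  # readxl reads only local files, so download the workbook to a temp file first
  tmp <- tempfile(fileext = ".xls")
  download.file(z, tmp, mode = "wb")
  test <- readxl::read_xls(tmp)
  ...
  test[1:5] <- NULL
  return(test)}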

data_list <- lapply(url_list, readURLFunc)
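
If the goal is eventually a single table, the list returned by lapply can be stacked into one data frame. The sketch below assumes every element has the same columns. The list names ("2004", "2005") and the toy values are invented for illustration.

```r
# Toy list standing in for data_list; names and values are hypothetical.
data_list <- list(
  "2004" = data.frame(Income = c("a", "b"), Total_returns = c(1, 2)),
  "2005" = data.frame(Income = c("a", "b"), Total_returns = c(3, 4))
)

# Tag each data frame with its list name, then stack them row-wise.
tagged <- Map(function(df, yr) cbind(year = yr, df), data_list, names(data_list))
combined <- do.call(rbind, tagged)
rownames(combined) <- NULL
```

Keeping the source label in a `year` column means the files can still be told apart after they are combined.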
Gautam