
I'm writing a function to analyse .csv files in a directory on my hard drive, using a series of for and while loops (I know for loops are unpopular in R, but they're good for what I need).

The function creates a number of data frames and performs actions on each one in turn, then overwrites them and moves on to the next file in the directory to repeat the process.

The part of the code that does not work so far is the creation of a matrix from vectors taken from the data files being analysed. A simplified version of the code is shown below:

data1 <- seq(1, 10, 1)
data2 <- seq(1, 7, 1)
data3 <- seq(1, 5, 1)

n <- max(length(data1), length(data2), length(data3))

k <- c(1, 2, 3)

for(a in k){
  
  if(a == 1){
    
    length(get(paste("data", a, sep = ""))) <- n
    data_matrix <- get(paste("data", a, sep = ""))
    
  }else{
    
    while(exists(paste("data", a, sep = ""))){
      
      length(get(paste("data", a, sep = ""))) <- n
      data_matrix <- cbind(data_matrix, get(paste("data", a, sep = "")))
      
    }
    
  }
  
} 

The column lengths in my datasets vary with each data collection, so I've adapted a technique found in this post for using cbind to bind objects of different lengths without recycling the values in the shorter objects.

When I run this code, I get the error message:

Error in length(get(paste("data", a, sep = ""))) <- n : target of assignment expands to non-language object

I'm guessing the issue is that the function get() cannot be used to select items in the Global Environment and to modify them in this way.

James.H

2 Answers


You could use:

get("x")[1:n]

to get a vector called "x" padded with NA to length n.

That is:

> x=1:3
> n=10
> get("x")[1:n]
 [1]  1  2  3 NA NA NA NA NA NA NA
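
As for why the original line errors: as far as I understand it, R treats `length(x) <- n` as shorthand for `` x <- `length<-`(x, n) ``, so the thing being assigned to has to end up as a variable name, and `get("x")` is a function call instead. A minimal sketch of two ways around that (`padded` is just an illustrative name):

```r
x <- 1:3
n <- 10

# length(get("x")) <- n   # fails: the assignment target is a call, not a name

# Option 1: build a padded copy and leave x itself untouched.
padded <- `length<-`(get("x"), n)

# Option 2: if the variable itself has to change, write the result back by name.
assign("x", padded)
```

Either way the padding positions are filled with `NA`, the same as with `get("x")[1:n]`.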

Having said that, this is a neater way to get the matrix you want (hopefully you can adapt it to your scenario):

> datalist <- list(data1, data2, data3)
> maxlength <- max(lengths(datalist))
> sapply(datalist, function(x) x[1:maxlength]  )
      [,1] [,2] [,3]
 [1,]    1    1    1
 [2,]    2    2    2
 [3,]    3    3    3
 [4,]    4    4    4
 [5,]    5    5    5
 [6,]    6    6   NA
 [7,]    7    7   NA
 [8,]    8   NA   NA
 [9,]    9   NA   NA
[10,]   10   NA   NA
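
If the number of `dataN` objects is not known in advance, the list could probably be built by name rather than typed out. A minimal sketch, assuming the objects follow the `data1`, `data2`, ... naming from the question and live in the environment where the code runs; `data_names` and `data_matrix` are illustrative names:

```r
data1 <- seq(1, 10, 1)
data2 <- seq(1, 7, 1)
data3 <- seq(1, 5, 1)

# Find every object whose name is "data" followed by digits.
data_names <- ls(pattern = "^data[0-9]+$")

# ls() sorts alphabetically (data10 before data2), so reorder by the number.
data_names <- data_names[order(as.integer(sub("^data", "", data_names)))]

# Collect the objects into a named list, then pad each one with NA.
datalist <- mget(data_names)
maxlength <- max(lengths(datalist))
data_matrix <- sapply(datalist, function(x) x[1:maxlength])
```

Because the list is named, the columns of `data_matrix` carry the original object names.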
George Savva
  • Hi @George Savva. Thank you for your suggestions; `get("x")[1:n]` works for what I need. The list approach would be neater if my data did not vary, but the number of datasets extracted from the files depends on which directory is active. This is why I am relying on loops to check what is present in the global environment in each scenario. I would need the datalist to be `list(data1, data2, data3, ... , datak)`, so I thought it would be easier to add columns incrementally to a new dataset. – James.H Dec 06 '21 at 13:22
  • No problem. You could use `lapply(dir(pattern = "\\.csv$"), read.csv)` (or something like that) to get a list of all the datasets read from `csv` files in the active directory. – George Savva Dec 06 '21 at 13:25
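
To illustrate the suggestion in the comment above, here is a rough sketch of reading every CSV file in a directory and padding one column from each to a common length. The temporary directory, the file names and the choice of the first column are all assumptions made so the example runs on its own; a real directory and column would replace them.

```r
# Hypothetical setup: write three small CSV files with different row counts.
csv_dir <- file.path(tempdir(), "csv_demo")
dir.create(csv_dir, showWarnings = FALSE)
for (len in c(10, 7, 5)) {
  write.csv(data.frame(value = seq_len(len)),
            file.path(csv_dir, paste0("file_", len, ".csv")),
            row.names = FALSE)
}

# Read every .csv file in the directory into a list of data frames.
csv_files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
datasets <- lapply(csv_files, read.csv)
names(datasets) <- basename(csv_files)

# Take the first column of each file (an assumption) and pad with NA.
columns <- lapply(datasets, function(df) df[[1]])
maxlength <- max(lengths(columns))
data_matrix <- sapply(columns, function(x) x[1:maxlength])
```

This keeps everything in a list, so no `get()` or numbered global variables are needed however many files the directory holds.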

For those who want to see how the solution proposed by @GeorgeSavva looks using the loop method that I am employing (my loop contained additional errors):

data1 <- seq(1, 10, 1)
data2 <- seq(1, 7, 1)
data3 <- seq(1, 5, 1)

n <- max(length(data1), length(data2), length(data3))

k <- c(1, 2, 3)

for(a in k){
  
  if(a == 1){

    data_matrix <- get(paste("data", a, sep = ""))[1:n]
    
  }else{
    
    data_matrix <- cbind(data_matrix, get(paste("data", a, sep = ""))[1:n])
    
  }
  
} 

The while loop was unnecessary. I have written my code this way to make it as versatile as possible, because each day I obtain a varying number of datasets, each of a varying size.

I can use common operations on each dataset, so I can write a function that will tidy the data, construct charts and compare the datasets automatically without having to write new commands for each analysis.

Martin Gal
James.H