
I have almost finished a messy script that applies several statistical methods/tests to 11 data frames from different watersheds, with physico-chemical parameters as variables. I reached the goal, but I need to make it functional. To start, I wrote a function that computes correlations and saves the results as .txt tables and .pdf images. It works great when I run the function on one data frame at a time (for that you have to import each data frame separately using read.table, which is not shown in the code below). Since I want it functional, I made a list of the 11 data frames and used lapply to run the function on each one. It works in the sense that it gives me one list (corr) containing the correlation results for each data frame.

Here come the issues:

  1. The list corr, which holds the correlation results for each data frame, appears to contain values instead of data frames, so I don't know how to access or save them (see the corr list in the Environment/Data window). Still, up to this point it at least looks like the correlation results exist somewhere.
  2. The second problem is with the file output. The function has lines to save the outputs as tables (.txt) and images (.pdf), using part of the name of the data frame being processed. For example, the first element of PQ_data is "AgIX_E_PQ", so the table and plot from cor_PQ(PQ_data[["AgIX_E_PQ"]]) should be named "mCorAgIX_E_PQ.txt" and "CorAgIX_E_PQ.pdf" respectively. But when I run corr<-lapply(PQ_data, cor_PQ), I get just one pair of outputs (mCorX[[i]].txt and CorX[[i]].pdf) holding the result for the last data frame. That is, the table and image for each data frame are overwritten into these generic mCorX[[i]].txt and CorX[[i]].pdf files.

Now I guess I have to define 'i' or something to avoid this. Should I define the cor_PQ function for PQ_data instead of X?

If anyone can see where I'm going wrong, I would appreciate any help solving this.

My data: PQ_data (save it in your workspace and point setwd to it). My code:

rm(list=ls(all=TRUE))
cat("\014")

setwd("C:/Users/Sol/Documents/ProyectoTítulo/CalidadAgua/Matrices/Regs") #my workspace

PQ_files<-list.files(path="C:/Users/Sol/Documents/ProyectoTítulo/CalidadAgua/Matrices/Regs",
                     pattern="\\_PQ.txt") #my list of 14 dataframes in my workspace.
PQ_data<-lapply(PQ_files, read.table) #read tables of the 14 dataframes in the list.
names(PQ_data)<-gsub("\\_PQ.txt","", PQ_files) #name the 14 dataframes with their original names.

#FUNCTION TO COMPUTE CORRELATIONS, SAVE TABLES AND PLOTS.
cor_PQ<-function(X) {
  corPQ<-cor(X, use="pairwise.complete.obs")
  outputname.txt<-paste0("mCor",deparse(substitute(X)),".txt")
  write.table(corPQ, file=outputname.txt)
  outputname.pdf<-paste0("Cor",deparse(substitute(X)),".pdf")
  pdf(outputname.pdf)
  plot(X)
  dev.off()
  return(corPQ)
}

corr<-lapply(PQ_data, cor_PQ)

After this, as I said, I get a list called "corr" with 11 elements containing the correlation results for each data frame in my list (PQ_data), but I can't access them as tables when I open the "corr" list in my Environment/Data window (they don't show the blue R arrow to expand the element). And I get only 2 output files, called mCorX[[i]].txt and CorX[[i]].pdf, showing only the last data frame's correlation result, because write.table and pdf overwrite the results of the 10 previous calculations. Again, I would appreciate any help. I really need a push to grasp the idea. Thanks!!!

  • You could `lapply(names(PQ_data), cor_PQ)`, then replace `X` with `PQ_data[[X]]` and `deparse(substitute(X))` with `X` inside the function `cor_PQ`. Two additional notes: (1) I would use a variable name other than `cor_PQ` inside the function `cor_PQ`, and (2) did you mean to plot `X` (i.e. the data frame) or the correlation matrix? – sboysel Dec 17 '19 at 21:39
  • Hi, thanks for your support. When I put `PQ_data[[X]]` in my `cor_PQ` function, it gives an unexpected error because of the brackets. Where should I replace it? About your notes: (1) do you mean that `cor_PQ` in `cor_PQ<-cor(X, use="pairwise.complete.obs")` should be defined as another variable? I guess I'm not aware of why that is. And (2) with `plot(X)` I expect the function to plot the correlation result from the data frame being processed (14 in total), so I can save it following the generic method for saving plots: `pdf(name_of_plot.pdf)`, `plot(name_of_plot)`, `dev.off()`. – Cristóbal Jaraba Nilo Dec 17 '19 at 22:02
  • [See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making a reproducible question—that includes a sample of data to work with. Right now we can't run your code, and we can't see any output. Also reread the *minimal* part of [mcve]—cutting the question down to its essentials makes it easier for folks to help with, and is a good first step for yourself in debugging – camille Dec 17 '19 at 23:10
  • @camille I just edited my post and added my data so you can check and run it. Thanks for your time and support. – Cristóbal Jaraba Nilo Dec 18 '19 at 00:40

1 Answer


lapply doesn't pass the names of the list to the function; it passes only the elements. So although the function works when you call it on an individual data frame, it doesn't work on a list of them: inside lapply, deparse(substitute(X)) evaluates to the same string, "X[[i]]", on every call. Every generated file therefore gets the same name, each new file overwrites the previous one, and in the end you are left with a single output, which is the one for the last element in your list. You can use the function below, where we pass the names as a separate parameter and use them to name the files.

cor_PQ<-function(X, Y) {
   corPQ<-cor(X, use="pairwise.complete.obs")
   outputname.txt<-paste0("mCor",Y,".txt")
   write.table(corPQ, file= outputname.txt)
   outputname.pdf<-paste0("Cor",Y,".pdf")
   pdf(outputname.pdf)
   plot(X)
   dev.off()
   return(corPQ)
}

Now use Map to apply the same function.

Map(cor_PQ, PQ_data, names(PQ_data))

We can also use imap from purrr to apply this function.

purrr::imap(PQ_data, cor_PQ)
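To try the Map approach without the original files, here is a minimal sketch under stated assumptions: `PQ_demo`, `cor_PQ_demo`, `out_dir`, the site names, and the random values are all hypothetical stand-ins for the real watershed tables, and the output goes to a temporary directory rather than the working directory. It also touches on the first issue in the question: `cor()` returns a numeric matrix, not a data frame, and each result can be pulled out of the list by name or by index.

```r
# Toy stand-ins for the real watershed tables (hypothetical names and values).
set.seed(1)
PQ_demo <- list(
  siteA = data.frame(pH = rnorm(10), temp = rnorm(10), cond = rnorm(10)),
  siteB = data.frame(pH = rnorm(10), temp = rnorm(10), cond = rnorm(10))
)

out_dir <- tempdir()  # scratch directory, so nothing lands in the working directory

# Same logic as the two-argument function above: X is the data frame, Y its name.
cor_PQ_demo <- function(X, Y) {
  corPQ <- cor(X, use = "pairwise.complete.obs")
  write.table(corPQ, file = file.path(out_dir, paste0("mCor", Y, ".txt")))
  pdf(file.path(out_dir, paste0("Cor", Y, ".pdf")))
  plot(X)  # scatterplot matrix of the data frame, as in the original function
  dev.off()
  corPQ
}

# Map walks over the data frames and their names in parallel.
corr_demo <- Map(cor_PQ_demo, PQ_demo, names(PQ_demo))

names(corr_demo)                  # "siteA" "siteB"
is.matrix(corr_demo[["siteA"]])   # TRUE: cor() returns a matrix, not a data frame
corr_demo[["siteA"]]              # access one result by name
corr_demo[[2]]                    # or by position
list.files(out_dir, pattern = "^m?Cor")  # one .txt and one .pdf per data frame
```

Because `Map` keeps the names of its first list argument, the returned list is named the same way as the input, so each correlation matrix can be matched back to its data frame.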
Ronak Shah
  • Wow! That did exactly what I was looking for. Awesome! So does this mean that when we apply a function over a list of data frames, we have to add a second variable ("Y") for each data frame in the list? Thanks again, Ronak – Cristóbal Jaraba Nilo Dec 18 '19 at 02:05
  • @CristóbalJarabaNilo You need to consider the second variable only if you want to use names somewhere in the function. – Ronak Shah Dec 18 '19 at 02:20
  • I'm not really sure why that is. I mean, was this also the reason the old `lapply(PQ_data, cor_PQ)` gave one single list with all the correlation results instead of producing separate results for the different data frames? Or was it necessary to define a second variable to do that? If I don't use names, only an index, and want to produce separate outputs, should I define a second variable `[i]`? Thanks again for your support and, as a special request, could you recommend something to read/watch/practice to learn all these kinds of things about R? I've decided to learn more. Thanks – Cristóbal Jaraba Nilo Dec 18 '19 at 02:38
  • OK, so to simplify a bit: `names(PQ_data)` returns the list's names, but a single element such as `PQ_data[[1]]` does not carry the name shown in `names(PQ_data)`, hence you got files with names such as `mCorX[[i]].txt`. – Ronak Shah Dec 18 '19 at 03:18
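
The point in the last comment can be checked directly. The following is a small illustration, not part of the original thread, and `show_label` and `demo` are hypothetical names introduced for it. It shows what `deparse(substitute(X))` produces when a function is called directly and when it is called through `lapply`.

```r
# Returns, as text, the expression the caller supplied for X.
show_label <- function(X) deparse(substitute(X))

demo <- list(a = 1:3, b = 4:6)

# Called directly, the label is whatever expression was typed in the call.
show_label(demo$a)                # "demo$a"

# Called through lapply, the expression is always the internal X[[i]],
# so every element gets the same label and the list names are never seen.
labels <- unlist(lapply(demo, show_label))
all(labels == "X[[i]]")           # TRUE
```

This is why a file name built from `deparse(substitute(X))` is identical for every element of the list, while passing the name in explicitly (as with `Map` or `purrr::imap`) gives each output its own name.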