Sorry for my bad English. I want to calculate the correlation coefficients for many data frames stored in a list. I have a list of 28 data frames, but I want to calculate the coefficients for the first 7. Each data frame has two columns: one is the date and the other holds its values:

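# Match only file names that end in ".csv"
my.files <- list.files(pattern = "\\.csv$")

my.data <- lapply(my.files,
                  read.csv,
                  header = TRUE, sep = ";")

# Convert the second column to numeric; going through as.character()
# avoids returning factor level codes if the column was read as a factor
ChangeType <- function(DF) {
  DF[, 2] <- as.numeric(as.character(DF[, 2]))
  DF
}

my.data <- lapply(my.data, ChangeType)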

Now my list holds four different kinds of values: for data frames 1 to 7 the second column is "PRECIPITACION", for 8 to 14 it is "RADIACION", for 15 to 21 it is "TEMPERATURA", and for 22 to 28 it is "VELOCIDAD". I want to calculate the correlation coefficients within each group of data frames. Any ideas?

Thanks for your answers

  • Do the rows line up across all data frames (i.e. does the 1st entry in data frame 1 correspond to the 1st entry in data frame 2 - 21?) – Melissa Key May 23 '18 at 21:01
  • Perhaps merge all data frames by date, and then follow this post (https://stackoverflow.com/questions/50458635/correlation-matrix-with-dplyr-tidyverse-and-broom-p-value-matrix/50458976#50458976) to calculate the correlation matrix. Without a reproducible example, it is difficult to help. – www May 23 '18 at 21:03
  • @MelissaKey Hi, no. Each data frame is independent, and each data frame has 36 rows. – Antonio Bonilla May 23 '18 at 21:14
  • If that is the case, how exactly are you wanting to capture the correlation? – Melissa Key May 23 '18 at 21:19
  • Calculate the coefficients for data frames 1 to 7, then for data frames 8 to 14, and so on; that is, calculate them in groups of 7 data frames each. – Antonio Bonilla May 23 '18 at 21:22
  • If a column was not numeric upon data entry, then it's probably a factor. – IRTFM May 23 '18 at 22:24
  • I wonder if this may be part of your confusion: https://stackoverflow.com/questions/3418128/how-to-convert-a-factor-to-integer-numeric-without-loss-of-information – IRTFM May 23 '18 at 22:29
  • Perhaps `my.data1_7 <- lapply(seq_along(my,data)[1:7], – Chris May 24 '18 at 04:07
  • @Chris this code didn't work – Antonio Bonilla May 24 '18 at 14:50

1 Answer

Your data frames are in the list my.data. Elements 1:7 are Precipitacion, so rbind them together:

# do.call() passes the 7 data frames to rbind() as separate arguments
Precip <- do.call(rbind, my.data[1:7])

Do the same for Radiacion, Temperatura and Velocidad:

Radia  <- do.call(rbind, my.data[8:14])
Tempur <- do.call(rbind, my.data[15:21])
Veloc  <- do.call(rbind, my.data[22:28])
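
The four groups can also be built in one pass rather than one index range at a time. This is only a minimal sketch, assuming the 28 data frames sit in the list in file order; the toy `my.data` and the names `group` and `grouped` are made up for illustration:

```r
# Toy stand-in for my.data: 28 data frames with 36 monthly rows each
# (hypothetical values, not the asker's data)
set.seed(1)
dates <- seq(as.Date("2014-01-01"), by = "month", length.out = 36)
my.data <- lapply(1:28, function(i) data.frame(Date = dates, Value = rnorm(36)))

# One label per list element: the first 7 are "Precip", the next 7 "Radia", ...
group <- rep(c("Precip", "Radia", "Tempur", "Veloc"), each = 7)

# split() cuts the list into 4 sub-lists of 7 data frames;
# do.call(rbind, ...) stacks each sub-list into one data frame
grouped <- lapply(split(my.data, group), function(dfs) do.call(rbind, dfs))

sapply(grouped, nrow)
```

Each element of `grouped` (for example `grouped$Precip`) then plays the role of the separately named objects above.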

The data frames are ordered Date, Precip or Date, Tempur, etc. Assuming the sampling dates are the same or similar across groups, make a list from the row-bound data frames using just the columns you need:

clima_objs <- list(Precip[, 1], Precip[, 2], Radia[, 2], Tempur[, 2],
                   Veloc[, 2])

then cbind() these together into a data.frame:

clima <- as.data.frame(do.call(cbind, clima_objs))

Rename the columns from the default V1 to V5:

names(clima) <- c("Date", "Precipitacion", "Radiacion",
                  "Temperatura", "Velocidad")

inspect:

> head(clima)
   Date Precipitacion  Radiacion Temperatura  Velocidad
1 14610     84.284294  84.284294   84.284294  84.284294
2 14641     29.583552  29.583552   29.583552  29.583552
3 14669    105.209802 105.209802  105.209802 105.209802
4 14700     96.281924  96.281924   96.281924  96.281924
5 14730      5.033855   5.033855    5.033855   5.033855
6 14761     94.065157  94.065157   94.065157  94.065157

Ok, cbind changed our date to numeric, so we change it back:

> clima$Date <- as.Date(clima$Date, origin = "1970-01-01")
> head(clima)
        Date Precipitacion  Radiacion Temperatura  Velocidad
1 2010-01-01     84.284294  84.284294   84.284294  84.284294
2 2010-02-01     29.583552  29.583552   29.583552  29.583552
3 2010-03-01    105.209802 105.209802  105.209802 105.209802
4 2010-04-01     96.281924  96.281924   96.281924  96.281924
5 2010-05-01      5.033855   5.033855    5.033855   5.033855
6 2010-06-01     94.065157  94.065157   94.065157  94.065157
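
The date round trip can probably be avoided altogether: cbind() builds a matrix, which cannot hold a Date column, whereas data.frame() keeps each column's class. A small sketch, assuming the first column is already of class Date; the toy `Precip` and `Radia` frames below are invented for illustration:

```r
# Toy stand-ins for two of the row-bound data frames (hypothetical values)
set.seed(1)
dates <- seq(as.Date("2010-01-01"), by = "month", length.out = 6)
Precip <- data.frame(Date = dates, Value = runif(6, 0, 110))
Radia  <- data.frame(Date = dates, Value = runif(6, 0, 110))

# data.frame() keeps the Date class, so no as.Date() step is needed afterwards
clima <- data.frame(Date          = Precip[, 1],
                    Precipitacion = Precip[, 2],
                    Radiacion     = Radia[, 2])

class(clima$Date)
```

This also names the columns in the same step.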

Now we can ask what is correlated with what, using `cor`:

> cor(clima$Precipitacion, clima$Temperatura)
[1] 1

The result is 1 because I used the same data in every column after Date. Shuffling Temperatura with sample() breaks that perfect correlation:

> cor(clima$Precipitacion, sample(clima$Temperatura))
[1] 0.04786067
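
If the goal is instead the correlations among the 7 data frames inside each group, one possible approach is to put the 7 value columns side by side and call cor() on the resulting matrix. This is a sketch under the assumption that row i refers to the same month in every data frame (the asker mentions 36 monthly rows each); the toy data and the helper name `group_cor` are hypothetical:

```r
# Toy stand-in for my.data: 28 data frames with 36 monthly rows each
# (hypothetical values, not the asker's data)
set.seed(42)
dates <- seq(as.Date("2014-01-01"), by = "month", length.out = 36)
my.data <- lapply(1:28, function(i) data.frame(Date = dates, Value = rnorm(36)))

# Hypothetical helper: correlation matrix among the value columns of a group
group_cor <- function(dfs) {
  values <- sapply(dfs, function(df) df[, 2])        # 36 x 7 numeric matrix
  colnames(values) <- paste0("DF", seq_along(dfs))   # label columns DF1..DF7
  cor(values, use = "pairwise.complete.obs")         # skip NA pairs, if any
}

# Cut the list into the four groups of 7 and apply the helper to each
labels <- rep(c("PRECIPITACION", "RADIACION", "TEMPERATURA", "VELOCIDAD"), each = 7)
cor_by_group <- lapply(split(my.data, labels), group_cor)

dim(cor_by_group$PRECIPITACION)
```

Each element of `cor_by_group` is a 7 x 7 matrix of pairwise Pearson coefficients for one variable.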
Chris
  • Thanks for your answer, but I don't understand it. When creating the list `my.Hydro_cor` for the pairwise correlations, is it possible to build it as `my.data.Hydro <- list(c(my.data[[1]],my.data[[2]], my.data[[3]],my.data[[4]], my.data[[5]],my.data[[6]], my.data[[7]]))`? – Antonio Bonilla May 25 '18 at 14:38
  • It depends on what you want to calculate; year on year is what I proposed. What does the result you want look like? And what language are you more comfortable writing in? – Chris May 25 '18 at 14:52
  • What does your data look like? `dput` would help. – Chris May 25 '18 at 15:05
  • Chris, Spanish is more comfortable for me. My data frames cover a period of 3 years (2014, 2015, 2016), from 01/01/2014 to 31/12/2016, as monthly averages. So the sublist `my.data.Hydro` corresponds to the first 7 data frames of the list `my.data`, right? I want to calculate the correlation coefficients of those 7 data frames. I hope you can understand me. – Antonio Bonilla May 25 '18 at 15:13
  • How do I use dput? – Antonio Bonilla May 25 '18 at 15:14
  • Run `dput(my.data.Hydro)`, then copy all the output and paste it into your question above. What I want to determine is whether you want your report yearly or monthly. I proposed year by year, in the list form c(mydata1, mydata2) and so on. – Chris May 25 '18 at 16:02
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/171794/discussion-between-chris-and-antonio-bonilla). – Chris May 25 '18 at 16:04