
I am having trouble understanding `lapply` with the `read.csv` function. My question is: does `lapply` create an array of data frames, where I can access each data frame using `data[i]`?

What I did:

I have downloaded the five cities data set (found here: https://archive.ics.uci.edu/ml/machine-learning-databases/00394/FiveCitiePMData.rar) and wrote R code to extract the 5 CSV files and save them to a data frame as follows:

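```r
# full.names = TRUE keeps the folder in each path, so read.csv can find the files
cities <- list.files('FiveCities', pattern = '\\.csv$', full.names = TRUE)
cities_df <- lapply(cities, read.csv)
```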
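To see what `lapply` actually returns here, a minimal self-contained sketch. The folder `FiveCities_demo`, the files `CityA.csv`/`CityB.csv` and their columns are made-up stand-ins for the real data, written to a temporary directory so the example runs anywhere:

```r
# Hypothetical stand-in for the FiveCities folder, created in a temp directory
dir <- file.path(tempdir(), "FiveCities_demo")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(hour = 1:3, pm = c(10, 20, 30)),
          file.path(dir, "CityA.csv"), row.names = FALSE)
write.csv(data.frame(hour = 1:2, pm = c(5, 15)),
          file.path(dir, "CityB.csv"), row.names = FALSE)

# full.names = TRUE returns paths read.csv can open from any working directory
files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# lapply calls read.csv once per path and collects the results in a list
dfs <- lapply(files, read.csv)

class(dfs)          # "list"
length(dfs)         # 2, one element per file
sapply(dfs, class)  # "data.frame" "data.frame"
sapply(dfs, nrow)   # 3 2
```

The result is an ordinary list whose elements are data frames, not a single data frame.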

My goal was to create a workbook and save all the CSV files into one xlsx file, with each CSV as its own sheet in the workbook:

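```r
library(openxlsx)  # createWorkbook(), addWorksheet(), writeData()

wb <- createWorkbook()
for (i in seq_along(cities)) {
    sheet <- addWorksheet(wb, i)
    # cities_df[i] (single brackets) is the indexing this question asks about
    writeData(wb, sheet, cities_df[i])
}
```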

What confuses me is accessing each CSV as `cities_df[i]`. I thought `cities_df[i]` accesses the ith row of the data frame, not a separate data frame as a whole. Does `lapply` create an array of data frames called `cities_df[i]`, or what happens? If it does create an array, how come I can simply call `cities_df` and get a result without specifying which data frame in the array to call?
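For comparison, this is how `[` behaves on a single data frame: because a data frame is a list of columns, `df[i]` selects the i-th column (as a one-column data frame), while a row needs `df[i, ]`. A small sketch with hypothetical toy data:

```r
# Hypothetical toy data frame, not one of the real city files
df <- data.frame(city = c("Beijing", "Shanghai"), pm = c(80, 60))

col2 <- df[2]     # 2nd COLUMN, still a data.frame with one column
row2 <- df[2, ]   # 2nd ROW, a data.frame with one row
vec2 <- df[[2]]   # 2nd column extracted as a plain numeric vector

ncol(col2)  # 1
nrow(row2)  # 1
vec2        # 80 60
```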

Vince
  • Try `cities_df[[i]]` (double square brackets). – Allan Cameron Oct 18 '20 at 17:05
  • It seems to return the same as if I write `cities_df[1]` (when compared to `cities_df[[1]]`). – Vince Oct 18 '20 at 17:14
  • But it doesn't. [Difference between `[` and `[[`](https://stackoverflow.com/questions/1169456/the-difference-between-bracket-and-double-bracket-for-accessing-the-el) – Rui Barradas Oct 18 '20 at 17:16
  • The following screenshot shows the output of my comment (https://imgur.com/a/UqpbVtf). So the top part retrieves the first data frame as a whole and the second is accessing that data frame? – Vince Oct 18 '20 at 17:23
  • `lapply` returns a list. In your case `cities_df` is a list with 5 elements, where each element is a data frame. Calling `cities_df[[i]]` returns the i-th data frame, while `cities_df[i]` returns a list of length 1 whose one element contains the i-th data frame. You can verify this by calling `str(cities_df[i])` and `cities_df[[i]]`. – stefan Oct 18 '20 at 17:36
  • `[` extracts a sub-list, `[[` extracts a list member. If that list member is a data.frame, then yes, the result is accessing the data frame. The former never is. – Rui Barradas Oct 18 '20 at 17:36
  • @stefan thank you for the answer, that makes sense now! – Vince Oct 18 '20 at 17:39
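A short sketch of the distinction stefan and Rui describe, using two hypothetical toy data frames in place of the CSV contents:

```r
# A list of two data frames, shaped like the result of lapply(cities, read.csv)
cities_df <- list(data.frame(pm = 1:3), data.frame(pm = 4:5))

a <- cities_df[1]    # single brackets: a sub-list of length 1
b <- cities_df[[1]]  # double brackets: the data frame stored in element 1

class(a)              # "list"
class(b)              # "data.frame"
identical(a[[1]], b)  # TRUE: the sub-list just wraps the same data frame

# Typing cities_df on its own prints the whole list, element by element,
# which is why no index is needed to see its contents
cities_df
```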

2 Answers


Here is the complete code to create the Excel workbook and save it to the file `FiveCities/cities.xlsx`.

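```r
library(openxlsx)  # createWorkbook(), addWorksheet(), writeData(), saveWorkbook()

# Only pick up .csv files, so a previously saved cities.xlsx is not read back in
cities <- list.files('FiveCities', pattern = "\\.csv$", full.names = TRUE)
cities_df <- lapply(cities, read.csv)
# Name each list element after its file name without the .csv extension
names(cities_df) <- sub("\\.csv$", "", basename(cities))

wb <- createWorkbook()
for (i in names(cities_df)) {
  addWorksheet(wb, i)                # one sheet per file, named after it
  writeData(wb, i, cities_df[[i]])   # [[i]] extracts the data frame itself
}
saveWorkbook(wb, file = "FiveCities/cities.xlsx", overwrite = TRUE)
```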
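As an alternative not used in the answer above, `openxlsx::write.xlsx()` accepts a named list of data frames and writes each element to its own sheet, named after the list element, which replaces the loop. A minimal sketch with hypothetical toy data and a temporary output file:

```r
library(openxlsx)  # write.xlsx(), getSheetNames()

# Hypothetical named list, shaped like cities_df after names(cities_df) <- ...
cities_df <- list(CityA = data.frame(pm = 1:3),
                  CityB = data.frame(pm = 4:5))

out <- file.path(tempdir(), "cities_demo.xlsx")

# Each list element becomes one sheet; the list names become the sheet names
write.xlsx(cities_df, file = out, overwrite = TRUE)

getSheetNames(out)  # "CityA" "CityB"
```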
Rui Barradas

This code may help!

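```r
library(readr)     # read_csv()
library(openxlsx)  # createWorkbook(), addWorksheet(), writeData(), saveWorkbook(), read.xlsx()

mydir <- "C:/Users/mouad/Desktop/assasins creed/new"
myfiles <- list.files(path = mydir, pattern = "\\.csv$", full.names = TRUE)

# Read every CSV into a list of data frames (one element per file)
mylist <- lapply(myfiles, read_csv)

wb <- createWorkbook()
for (i in seq_along(mylist)) {
  # Use the file name without folder or extension as the sheet name
  sheet_name <- tools::file_path_sans_ext(basename(myfiles[i]))
  addWorksheet(wb, sheetName = sheet_name)
  # [[i]] extracts the i-th data frame; write all of its columns
  writeData(wb, sheet = sheet_name, mylist[[i]])
}
saveWorkbook(wb, file.path(mydir, "test.xlsx"), overwrite = TRUE)

# Read the first sheet back to check the result
read.xlsx(file.path(mydir, "test.xlsx"), sheet = 1)
```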
Tou Mou