I have 3 CSV files, each with three columns (Maths, Physics and Chemistry) holding the marks of all the students. I wrote a loop to read each file into a data frame, as follows. In every file, lines 1, 2, 4 and 5 need to be skipped.

files <- list.files(pattern = ".csv") 

for(i in 1:length(files)){
  data <- read.csv(files[i], header=F, skip=2) # by writing skip=2 I could only skip first two lines. 
  View(data)
  mathavg[i] <- sum(as.numeric(data$math), na.rm=T)/nrow(data)
}

result <- cbind(files,mathavg)
write.csv(result,"result_mathavg.csv")

I was not able to calculate the average of the math column across the three files.

I need to calculate this for all three subjects across the three files. Any help?

2 Answers

This should work:

files  <- c("testa.csv","testb.csv","testc.csv")
list_files  <- lapply(files,read.csv,header=F,stringsAsFactors=F)

list_files  <- lapply(list_files, function(x) x[-c(1,2,4,5),])

mathav  <- sapply(list_files,function(x) mean(as.numeric(x[,2]),na.rm=T))
result  <- cbind(files,mathav)
write.csv(result,"result_mathavg.csv",row.names=F)

I didn't have access to your files, so I made up three and stored their names in 'files'. I used lapply to load the files, and again to remove the lines that you didn't want. I got the average with sapply, then went back to your code to build and write the result.
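The same pattern can be extended to all three subjects. The following is only a sketch under stated assumptions: the temporary files, the helper names make_file and subject_means, and the three-line header layout are made up to resemble the sample posted in the comments, and columns 2 to 4 are assumed to hold Math, Physics and Chemistry. It also passes na.strings to read.csv so that markers such as "NT" are read as NA.

```r
# Hypothetical files resembling the sample from the comments:
# three header lines, then one student per line. "NT" marks a missing score.
make_file <- function(scores) {
  path <- tempfile(fileext = ".csv")
  writeLines(c("Student Name,Math,Physic,Chemistry",
               ",Term 1,Term 1,Term 1",
               ",Score (125),Score (125),Score (125)",
               scores), path)
  path
}
files <- c(make_file(c("A,100,110,90", "B,120,90,NT")),
           make_file(c("C,80,100,70", "D,100,120,110")),
           make_file(c("E,125,125,125", "F,75,75,75")))

# skip = 3 drops the assumed three header lines; adjust to the real layout.
# na.strings is an argument of read.csv, so it is supplied here.
list_files <- lapply(files, read.csv, header = FALSE, skip = 3,
                     stringsAsFactors = FALSE, na.strings = c("NT", "AB"))

# Columns 2 to 4 are assumed to be Math, Physics and Chemistry.
subject_means <- function(x) colMeans(x[, 2:4], na.rm = TRUE)
averages <- t(sapply(list_files, subject_means))
colnames(averages) <- c("mathav", "physav", "chemav")

result <- data.frame(file = basename(files), averages)
print(result)
```

If the real files have a different number of header lines or a different column order, the skip value and the 2:4 column range would need to change accordingly.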

DarrenRhodes
  • For math thats okay. If I want to calculate the physics column average and chemistry column avg phyav <- sapply(list_files,function(x) mean(as.numeric(x[,3]),rm.na=T)).... only change in the column number should work right? but that's not working – Kalyan Ramanuja Jan 04 '16 at 20:22
  • I'd need to see your csv file to comment. – DarrenRhodes Jan 04 '16 at 20:28
  • Student Name         Math         Physic       Chemistry
                         Term 1       Term 1       Term 1
                         Score (125)  Score (125)  Score (125)
    Abhinav.S            107.75       117.25       95.5
    Abhishek.C           112.5        88.75        91
    Aditya Gowrishankar  117          116.5        106
    Akshara Ashok        121          111.25       111
    Arsheyaa Prasanna    110.75       91.25        78.25
    Arya.B               117          123.75       125
    Ayaan.S.Ahammad      121          123.75       121.5
    Daksha Swaminathan   109          111.75       106
    Debotri Banerjee     120          118.75       NT
    Diya.G               106.75       83.5         87.5
    – Kalyan Ramanuja Jan 04 '16 at 20:32
  • You can see the csv sample above – Kalyan Ramanuja Jan 04 '16 at 20:33
  • Post me your email id, Will connect there. If you're fine. – Kalyan Ramanuja Jan 04 '16 at 20:35
  • Try the code again, it looks like I made a typographical error (I wrote rm.na rather than na.rm). – DarrenRhodes Jan 04 '16 at 22:23
  • Changed the rm.na to na.rm , that's not the problem. cause would be the column number . Its working only for column2 not for any other column – Kalyan Ramanuja Jan 05 '16 at 06:19
  • Hi, where you have mathav <- sapply(list_files,function(x) mean(as.numeric(x[,2]),na.rm=T)), if you change the number 2 to 3 and change mathav to physav it will extract the averages for the next column. You can then change this line result <- cbind(files,mathav) to result <- cbind(files,mathav,physav) and re-run the code it will put the extra column in the result table. By a similar process you will be able to get your remaining column. If this works mark the question as answered. – DarrenRhodes Jan 05 '16 at 08:41
  • did the same. all the values are displaying as NA. I mentioned na.rm=T even then the same problem – Kalyan Ramanuja Jan 05 '16 at 08:52
  • what does list_files[,3] return? What does str(list_files) return? – DarrenRhodes Jan 05 '16 at 08:55
  • list_files <- lapply(list_files, na.strings = c("AB","NT"),function(x) x[-c(1,2,4,5),])
    Error in FUN(X[[i]], ...) : unused argument (na.strings = c("AB", "NT"))
    > phyav <- sapply(list_files,function(x) mean(as.numeric(x[,4]),na.rm=T))
    Error in `[.data.frame`(x, , 4) : undefined columns selected
    Called from: (function () { .rs.breakOnError(TRUE) })()
    Browse[1]> mathav <- sapply(list_files,function(x) mean(as.numeric(x[,5]),na.rm=T))
    Error during wrapup: undefined columns selected
    Browse[1]>
    > result <- cbind(files,phyav,mathav)
    write.csv(result,"result_mathavg.csv",row.names=F)
    – Kalyan Ramanuja Jan 05 '16 at 09:31
  • Error has been posted above – Kalyan Ramanuja Jan 05 '16 at 09:32
  • what do you get if you type str(list_files)? – DarrenRhodes Jan 05 '16 at 09:47
  • I could see the list of contents in all the three files – Kalyan Ramanuja Jan 05 '16 at 09:53
  • When I add the remaining columns its showing object not found and even for the math its showing same value for all the three files in the output. – Kalyan Ramanuja Jan 05 '16 at 13:17

mathavg needs to be initialized before it can be assigned to with []. To remove lines 4 and 5, subset the data frame after reading it: once the first 2 lines are skipped, lines 4 and 5 become rows 2 and 3.

files <- list.files(pattern = "\\.csv$")
mathavg <- numeric(length(files))  # initialize before assigning with [i]
for (i in seq_along(files)) {
  # skip = 2 drops file lines 1 and 2
  data <- read.csv(files[i], header = FALSE, skip = 2, stringsAsFactors = FALSE)
  data <- data[-c(2, 3), ]  # file lines 4 and 5 are now rows 2 and 3
  # With header = FALSE the columns are named V1, V2, ...; column 2 is assumed to hold the Math marks
  mathavg[i] <- mean(as.numeric(data[, 2]), na.rm = TRUE)  # best to use R's built-in mean()
}

result <- cbind(files, mathavg)
write.csv(result, "result_mathavg.csv")
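
One pitfall worth checking when reading with header=F is how the columns end up being named. The sketch below uses a made-up two-column temporary file (not the asker's data) to show that a lookup such as data$math returns NULL in that case, and that the mean of the resulting empty vector is NaN.

```r
# Hypothetical two-column file written to a temporary location for illustration.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Student Name,Math", "A,100", "B,110"), tmp)

# Without a header, columns get default names and the header line becomes a data row.
no_header <- read.csv(tmp, header = FALSE, stringsAsFactors = FALSE)
print(names(no_header))          # V1 and V2
print(is.null(no_header$math))   # TRUE: there is no column called math
nan_result <- mean(as.numeric(no_header$math), na.rm = TRUE)
print(nan_result)                # NaN: mean of an empty vector

# With a header, the column is available by name (names are case-sensitive).
with_header <- read.csv(tmp, header = TRUE, stringsAsFactors = FALSE)
print(names(with_header))        # Student.Name and Math
math_mean <- mean(with_header$Math)
print(math_mean)                 # 105
```

Selecting by position, as in data[, 2], avoids depending on the column names at all.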
emilliman5
  • Bear in mind that R users encourage each other to use apply functions over for loops, http://stackoverflow.com/questions/2275896/is-rs-apply-family-more-than-syntactic-sugar – DarrenRhodes Jan 04 '16 at 15:48
  • @emilliman5: The above code after execution showing all the values as NAN – Kalyan Ramanuja Jan 04 '16 at 20:40
  • According to the snippet of data you posted above, `data$math` needs to be `data$Math`. But without seeing the actual data I cannot troubleshoot any further. – emilliman5 Jan 04 '16 at 20:51