
Setting the scene:

So I have a directory with 50 .csv files in it.

All files have unique names, e.g. 1.csv, 2.csv, ...

Each file may have a different number of rows, but always has 4 columns.

The column headers are:

  • Date
  • Result 1
  • Result 2
  • ID

I want them all merged into one data frame (mydf), ignoring any rows that contain an NA value.

That way I can count how many complete instances of each "ID" there are, for example by calling:

  • myfunc("my_files", 1)
  • myfunc("my_files", c(2,4,6))

My code so far:

myfunc <- function(directory, id = 1:50) {
        files_list <- list.files(directory, full.names=TRUE)
        mydf <- data.frame()
        for (i in seq_along(files_list)) {
                mydf <- rbind(mydf, read.csv(files_list[i]))
        }
        mydf_subset <- mydf[which(mydf[, "ID"] %in% id),]
        mydf_subna <- na.omit(mydf_subset)
        table(mydf_subna$ID)
}
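One possible way to read the files without hard-coding 1:50 or growing mydf inside a loop is to read every file with lapply() and bind the pieces once with do.call(rbind, ...). This is a sketch that assumes every file has the same four columns; the helper name read_all_csv and the demo files are made up for illustration. Note that read.csv() turns headers such as "Result 1" into syntactic names such as "Result.1".

```r
# Hypothetical helper: read every .csv file in `directory` into one data frame.
# Assumes all files share the same columns (Date, Result 1, Result 2, ID).
read_all_csv <- function(directory) {
  # full.names = TRUE returns paths that read.csv() can open directly
  files_list <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  # read each file once, then combine them with a single rbind() call
  do.call(rbind, lapply(files_list, read.csv))
}

# Demo: two small made-up files written to a temporary directory
demo_dir <- file.path(tempdir(), "my_files_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Date = c("2016-01-01", "2016-01-02"),
                     Result.1 = c(1, NA), Result.2 = c(3, 4), ID = 1),
          file.path(demo_dir, "1.csv"), row.names = FALSE)
write.csv(data.frame(Date = "2016-01-03",
                     Result.1 = 5, Result.2 = 6, ID = 2),
          file.path(demo_dir, "2.csv"), row.names = FALSE)

mydf <- read_all_csv(demo_dir)
# keep only rows with no NA in any column, as na.omit() does
mydf_complete <- mydf[complete.cases(mydf), ]
```

Because the file list comes from list.files(), the same code works whether the directory holds 50 files or some other number.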

My issues and where I need help:

My results come out like this:

2   4    6   
200 400  600

and I'd like to transpose them so they look like this. I'm not sure whether calling table() is right, or whether I should use as.matrix() instead.

2 200
4 400
6 600

I'd also like the columns to have headers, either taken from the original files or newly assigned:

ID Count
2  200
4  400
6  600

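For the ID/Count layout, one option (a sketch, not the only way) is to keep table() for the counting and convert its result with as.data.frame(), which returns one row per ID; its responseName argument names the count column. The toy mydf_subna below is made-up data standing in for the merged, NA-free rows.

```r
# Made-up stand-in for mydf_subna: only the ID column matters for counting
mydf_subna <- data.frame(ID = c(2, 2, 4, 4, 4, 6))

counts <- table(mydf_subna$ID)   # number of complete rows per ID

# One row per ID; responseName names the count column, and
# stringsAsFactors = FALSE keeps the ID column as character, not factor
result <- as.data.frame(counts, responseName = "Count",
                        stringsAsFactors = FALSE)
names(result)[1] <- "ID"              # first column is the table's IDs
result$ID <- as.integer(result$ID)    # table names are character; make numeric
result
```

Ending myfunc with a result built this way (instead of the bare table() call) would make the function return a two-column data frame directly.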
Any and all advice is welcome.

Matt

Additional update

I tried amending the function to incorporate some of the helpful comments below, so I now also have a version of the code that looks like this:

myfunc <- function(directory, id = 1:50) {
        files_list <- list.files(directory, full.names=TRUE)
        mydf <- data.frame()
        for (i in seq_along(files_list)) {
                mydf <- rbind(mydf, read.csv(files_list[i]))
        }
        mydf_subset <- mydf[which(mydf[, "ID"] %in% id),]
        mydf_subna <- na.omit(mydf_subset)
        result <- data.frame(mydf_subna$ID)
        transposed_result <- t(result)
        colnames(transposed_result) <- c("ID","Count")
}

which I try to call with this:

myfunc("myfiles", 1)
myfunc("myfiles", c(2, 4, 6))

but I get this error:

> myfunc("myfiles", c(2, 4, 6))
Error in `colnames<-`(`*tmp*`, value = c("ID", "Count")) : 
  length of 'dimnames' [2] not equal to array extent

I wonder if I'm not creating this data.frame correctly. Should I be using cbind, or am I failing to sum the rows by ID?
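The error seems to come from the shape of transposed_result. data.frame(mydf_subna$ID) has a single column with one row per complete observation, so t() turns it into a matrix with one row and as many columns as there are observations; assigning two column names to it fails unless there happen to be exactly two. A small reproduction, using made-up IDs in place of the real files:

```r
# Made-up stand-in for mydf_subna with five complete rows
mydf_subna <- data.frame(ID = c(2, 2, 4, 4, 6))

result <- data.frame(mydf_subna$ID)   # one column, five rows
transposed_result <- t(result)        # a matrix with one row and five columns
dim(transposed_result)

# Two names for five columns is a length mismatch, which raises
# "length of 'dimnames' [2] not equal to array extent"
err <- tryCatch({
  colnames(transposed_result) <- c("ID", "Count")
  NULL
}, error = function(e) conditionMessage(e))
err
```

In other words, the counting per ID still has to happen somewhere (for example with table()) before the result can have exactly two columns, ID and Count.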

Mchapple

2 Answers


Welcome to Stack Overflow.

I am assuming that your function returns the table, and that you store its result in a variable ans.

You could try this code:

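ans <- myfunc("my_files", c(2,4,6))   # the table returned by your function
ans2 <- data.frame(ans)                # one row per ID: an ID column and a count column
colnames(ans2) <- c('ID' ,'Count')     # rename the two columns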
Kunal Puri
  • Hi @kanal-puri, thanks for the prompt answer. I've created the function **Myfunc**, which has an empty data.frame **mydf** inside it that gets every file's contents appended to it. I then subset it to only the ID values I specify when I call the function, omit any NA values, and save this as **mydf_subna**. It's this variable that I put into a table. Should my equivalent of your **ans2** be **mydf_subna**? – Mchapple May 26 '16 at 15:27
  • It must be equivalent to the value that the function returns, e.g. `2 4 6 200 400 600`. Please see the edit. – Kunal Puri May 26 '16 at 15:29

You need to change your function to create a data frame rather than a table, and then transpose that data frame. Change the line

table(mydf_subna$ID)

to be instead

result <- data.frame(mydf_subna$ID) 

then use the t() function, which transposes your data frame:

transposed_result <- t(result) 

colnames(transposed_result) <- c("ID","Count") 
clairekelley
  • Thanks @clairekelly, everything you said made sense; however, when I ran it I got an error. I've included just the business end of the function for reference: `mydf_subna <- na.omit(mydf_subset)` `result <- data.frame(mydf_subna$ID)` `transposed_result <- t(result)` `colnames(transposed_result) <- c("ID","Count") ` `}` `> complete("myfunc", c(2, 4, 6))` `Error in colnames<-(*tmp*, value = c("ID", "Count")) : ` `length of dimnames [2] not equal to array extent` – Mchapple May 26 '16 at 16:09
  • My guess is that this means your data frame is the wrong shape. Can you do dim(transposed_result) and find out what shape it is? – clairekelley May 26 '16 at 18:45
  • part 1/2 - Hi @calirekelly. As this is within a function, I don't seem to be able to run dim(transposed_result) on it. I assume that is because the data.frame isn't created until the function is called. *correction* I think I had a typo in the previous block. With the code below I am now getting `> myfunc("specdata", 1) NULL` and when I call `> dim(transposed_result) Error: object 'transposed_result' not found` – Mchapple May 27 '16 at 13:23
  • part 2/2 - `myfunc <- function(directory, id = 1:50) { files_list <- list.files(directory, full.names=T) mydf <- data.frame() for (i in 1:50) { mydf <- rbind(mydf, read.csv(files_list[i])) } mydf_subset <- mydf[which(mydf[, "ID"] %in% id),] mydf_subna <- na.omit(mydf_subset) result <- data.frame(mydf_subna$ID) transposed_result <- t(result) colnames(transposed_result) <- c("ID","Count") }` – Mchapple May 27 '16 at 13:23
  • When you're having this kind of issue, it will be much easier to debug if you run your code outside your function. Run through your code line by line to identify which line doesn't work. – clairekelley May 27 '16 at 14:36
  • That's a good point. I'll do that and come back to this. Appreciate the comments @clairekelley – Mchapple May 27 '16 at 16:00