
Good morning. I'm very new to programming and R. I'm working on a research project in which my partner (who is also new to programming) gathers data with PHP and writes it out as tables. I need to take each table, reformat it, and save the result to a new file whose name contains the date of the observation.
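A minimal sketch of building the output file name from the observation date. It assumes input names start with `month_day_` (as in `6_30_botg.csv`) and that the year, which is not in the name, is supplied separately (2014 here, matching the `2014-06-30.csv` output name in the code). `obs_date_filename()` is a hypothetical helper, not an existing function.

```r
# Hypothetical helper: "6_30_botg.csv" -> "2014-06-30.csv".
# Assumption: names start with month_day_, and the year is passed in
# because it is not part of the file name.
obs_date_filename <- function(in_name, year = 2014L) {
  parts <- strsplit(basename(in_name), "_", fixed = TRUE)[[1]]
  month <- as.integer(parts[1])
  day   <- as.integer(parts[2])
  # as.Date() rejects impossible dates such as month 13
  obs_date <- as.Date(sprintf("%04d-%02d-%02d", year, month, day))
  paste0(format(obs_date, "%Y-%m-%d"), ".csv")
}

obs_date_filename("6_30_botg.csv")
# [1] "2014-06-30.csv"
```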

The tables have a fixed number of columns but a variable number of rows. I processed the first directory of tables manually, which wasn't too bad, but I'd like to learn how to do it in R. Below is the code I'm using to format the tables.

    library(stringr)   # str_replace_all()
    library(reshape2)  # colsplit(string, pattern, names) (assumed to be the package in use)

    setwd("/dir1")

    botg <- read.csv(file = "6_30_botg.csv", header = TRUE, sep = ",", colClasses = "character")

    ## read directory of files in
    ## assign each table to new variable

    ## create a botg <- function() to break down each table and report new table

    botg1 <- botg[, c(3, 4, 5, 6)]

    ## strip "$" from price, then split the comma-separated values into Price1-Price5
    botg1$price <- str_replace_all(string = botg1$price, pattern = "\\$", replacement = "")
    botg1 <- cbind(botg1, colsplit(botg1$price, pattern = ",",
                                   names = c("Price1", "Price2", "Price3", "Price4", "Price5")))

    ## strip "$" and the "Shipping:" label, then split into Shipping1-Shipping5
    botg1$shipping <- str_replace_all(string = botg1$shipping, pattern = "\\$", replacement = "")
    botg1$shipping <- str_replace_all(string = botg1$shipping, pattern = "Shipping:", replacement = "")
    botg1 <- cbind(botg1, colsplit(botg1$shipping, pattern = ",",
                                   names = c("Shipping1", "Shipping2", "Shipping3", "Shipping4", "Shipping5")))

    ## split quantity into Quantity1-Quantity5 and make those columns numeric
    ## (cbind() keeps the names passed to colsplit(), so botg1$Quantity1 etc. exist)
    quantity_cols <- c("Quantity1", "Quantity2", "Quantity3", "Quantity4", "Quantity5")
    botg1 <- cbind(botg1, colsplit(botg1$quantity, pattern = ",", names = quantity_cols))
    botg1[quantity_cols] <- lapply(botg1[quantity_cols], as.numeric)

    botg1$price <- NULL
    botg1$shipping <- NULL  ## removes original columns from table
    botg1$quantity <- NULL

    setwd("/dir2")
    write.csv(botg1, file = "2014-06-30.csv")  ## need to automate write.csv
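One way to act on the `## create a botg <- function()` note is to wrap the steps above in a function that takes one raw table and returns the formatted one. This is a sketch, not a drop-in replacement: it swaps in base R `gsub()`/`strsplit()` for `str_replace_all()`/`colsplit()` so it needs no packages, `format_botg()` and `split_to_cols()` are hypothetical names, and the sample table is made up to match the column layout the code assumes (`price`, `shipping` and `quantity` among columns 3 to 6).

```r
# Split a comma-separated character column into exactly n columns named
# prefix1..prefixn. Missing pieces become NA; pieces after the n-th are dropped.
split_to_cols <- function(x, prefix, n = 5) {
  pieces <- strsplit(x, ",", fixed = TRUE)
  mat <- t(vapply(pieces, function(p) trimws(p)[seq_len(n)], character(n)))
  out <- as.data.frame(mat, stringsAsFactors = FALSE)
  names(out) <- paste0(prefix, seq_len(n))
  out
}

# Hypothetical per-table formatter mirroring the steps in the question.
format_botg <- function(raw) {
  tbl <- raw[, 3:6]  # same column selection as botg[, c(3, 4, 5, 6)]

  price    <- gsub("$", "", tbl$price, fixed = TRUE)
  shipping <- gsub("$", "", tbl$shipping, fixed = TRUE)
  shipping <- gsub("Shipping:", "", shipping, fixed = TRUE)

  price_cols    <- split_to_cols(price, "Price")
  shipping_cols <- split_to_cols(shipping, "Shipping")
  quantity_cols <- split_to_cols(tbl$quantity, "Quantity")
  quantity_cols[] <- lapply(quantity_cols, as.numeric)  # only quantities made numeric, as above

  # drop the original combined columns, then attach the split ones
  tbl$price <- NULL
  tbl$shipping <- NULL
  tbl$quantity <- NULL
  cbind(tbl, price_cols, shipping_cols, quantity_cols)
}

# Made-up example table with the assumed layout
raw <- data.frame(
  id       = c("1", "2"),
  seller   = c("a", "b"),
  item     = c("widget", "gadget"),
  price    = c("$10.00,$12.50", "$8.00"),
  shipping = c("Shipping: $3.99,$4.50", "Shipping: $0.00"),
  quantity = c("1,2", "5"),
  stringsAsFactors = FALSE
)
out <- format_botg(raw)
str(out)
```

Because the function only depends on its argument, the same call can be applied to every table once they are read in.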

My immediate roadblock is reading in the directory of CSV files and assigning each table to its own variable. Any suggestions or hints would be awesome!
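A minimal sketch of reading a whole directory at once, keeping every table in one named list instead of separate variables. The folder and the two small files are hypothetical stand-ins created in a temporary directory so the example runs on its own; in practice `in_dir` would point at the real data folder.

```r
# Hypothetical input folder with two small CSV files
in_dir <- file.path(tempdir(), "dir1_example")
dir.create(in_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(in_dir, "6_30_botg.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5), file.path(in_dir, "7_01_botg.csv"), row.names = FALSE)

# Only pick up .csv files, with full paths so the working directory doesn't matter
files <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)

# One data frame per file, all read as character (as in the question)
tables <- lapply(files, read.csv, header = TRUE, colClasses = "character")
names(tables) <- basename(files)

# Each table is reachable by its file name
tables[["6_30_botg.csv"]]
```

If separate variables are really needed, `assign()` can create them from these names, but keeping the tables in one list makes it easy to process them all with `lapply()` or a `for` loop.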

efridge
  • Can you add some example data to work with? Also, it is worth noting that you might be able to take care of some of the column classes upon reading in the file via the `colClasses` argument to `read.csv`. That way you could avoid all those `as.numeric` statements. – Jota Jul 07 '14 at 15:48
  • Sure, but I don't know the generally accepted way to post data or what format it should be in. I have a directory with about 180 tables. – efridge Jul 07 '14 at 15:54
  • Also, I tried `colClasses`, but I found it easier to read everything in as character so I can strip the '$' and words from the fields that should contain only numbers. – efridge Jul 07 '14 at 15:55
  • See [How to make a great r reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for more info on posting examples. One way for you would be to find a small table, read it into R, `dput` it (i.e. `dput(smalltable)`), and copy and paste that output here. Anyway, your issue with reading in all the files from a directory can likely be solved using some form of `listofiles <- lapply(list.files(), read.csv, header = TRUE, sep = ",", colClasses="character")` – Jota Jul 07 '14 at 16:03
  • [Here is a relevant post](http://stackoverflow.com/questions/19314215/read-a-list-of-files-apply-function-and-rewrite-with-same-name-in-r) on reading in a list of files, applying a function, and rewriting to a file, but in that case they rewrite using the original file names. – Jota Jul 07 '14 at 16:07
  • I like the for loop idea, but I'm having difficulty applying my code to each table that's being read in. I need to assign each table as it's being read in to a variable that I can modify. I'm currently researching the foreach package for a possible solution. – efridge Jul 08 '14 at 16:18
  • I haven't used `foreach` and I can almost guarantee you can do without it, though it'd be neat to see you solve the problem using it and it could be faster than other approaches. Like I said previously, providing some sample data goes a long way toward getting help. It makes it easier for people to help you if you provide some data that others can play with... Additionally, you can certainly read in each data frame to a new object _if you want_, but it is likely a better idea to create a list that stores each data frame. – Jota Jul 08 '14 at 18:38
  • I will revisit `foreach` later. I got the `for` loop you recommended to work and am now struggling to do yet another simple task. ;) Indeed, I need to create data sets with my examples to make it easier for folks to interact with the problem. – efridge Jul 08 '14 at 19:39
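A sketch of the `for` loop approach discussed in the comments: read each file, reformat it, and write it to a second folder under a date-based name. The assumptions are labelled: folder names and sample files are hypothetical (created in a temporary directory), file names are assumed to start with `month_day_`, the year is assumed to be 2014, and `format_table()` is a hypothetical placeholder for the full formatting code in the question.

```r
# Hypothetical input/output folders with two small sample files
in_dir  <- file.path(tempdir(), "loop_in")
out_dir <- file.path(tempdir(), "loop_out")
dir.create(in_dir, showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)
write.csv(data.frame(price = c("$1.50", "$2.00")),
          file.path(in_dir, "6_30_botg.csv"), row.names = FALSE)
write.csv(data.frame(price = "$3.25"),
          file.path(in_dir, "7_1_botg.csv"), row.names = FALSE)

# Placeholder for the real formatting steps: strip "$" and convert to numeric
format_table <- function(tbl) {
  tbl$price <- as.numeric(gsub("$", "", tbl$price, fixed = TRUE))
  tbl
}

year <- 2014L  # assumed: the year is not part of the input file names
for (f in list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)) {
  tbl <- read.csv(f, header = TRUE, colClasses = "character")
  # "6_30_botg.csv" -> month 6, day 30
  md <- as.integer(strsplit(basename(f), "_", fixed = TRUE)[[1]][1:2])
  out_name <- sprintf("%04d-%02d-%02d.csv", year, md[1], md[2])
  write.csv(format_table(tbl), file.path(out_dir, out_name), row.names = FALSE)
}

list.files(out_dir)
# [1] "2014-06-30.csv" "2014-07-01.csv"
```

Using `file.path()` with explicit input and output folders avoids switching the working directory back and forth with `setwd()` inside the loop.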

0 Answers