
I have a data frame called MetricsInput which looks like this:

ID  ExtractName     Dimensions  Metrics     First_Ind
124 extract1.txt    ga:date     gs:sessions 1
128 extract1.txt    ga:date     gs:sessions 0
134 extract1.txt    ga:date     gs:sessions 0
124 extract2.txt    ga:browser  ga:users    1
128 extract2.txt    ga:browser  ga:users    0
134 extract2.txt    ga:browser  ga:users    0

I'm trying to use the above data frame in a loop to run a series of queries, which will ultimately create two text files, extract1.txt and extract2.txt. I have the First_Ind field because I only want to write the column headings on the first run through each unique file.

Here's my loop -- the issue I'm having is that the data for each ID is not appending -- I seem to be overwriting my results, not appending. Where did I go wrong?

for(i in seq(from=1, to=nrow(MetricsInput), by=1)){
  id <- MetricsInput[i,1]
  myresults <- ga$getData(id,batch = TRUE, start.date="2013-12-01", end.date="2014-01-01", metrics = MetricsInput[i,4], dimensions = MetricsInput[i,3])

  appendcolheads <- ifelse(MetricsInput[i,5]==1, TRUE, FALSE)

  write.table(myresults, file=MetricsInput$ExtractName[i], append=TRUE, row.names = FALSE, col.names = appendcolheads, sep="\t")
}
davids12
  • What's with `file=file=`? You should need only one `file=`. But since most of these variables/functions are not defined within your sample code, your problem is not [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), so it's very difficult to help you. – MrFlick Oct 09 '14 at 15:14
  • Sorry, typo on my part when pasting. – davids12 Oct 09 '14 at 15:18
  • One of the errors I'm getting is this: Error in if (file == "") file <- stdout() else if (is.character(file)) { : missing value where TRUE/FALSE needed – davids12 Oct 09 '14 at 15:20
  • Where is `Extracts$ExtractName` defined? What is it? Is it different from `MetricsInput$ExtractName`? – MrFlick Oct 09 '14 at 15:20
  • Sorry I'm a mess today -- they're the same thing. Extracts$ExtractName should be MetricsInput$ExtractName – davids12 Oct 09 '14 at 15:24
  • If you provide a reproducible example and show us the desired output, all these problems will go away. – Roman Luštrik Oct 10 '14 at 11:26

2 Answers


Although you can get this code to work, it doesn't look like the right approach. As @MrFlick said in the comments, it's very hard to help without being able to reproduce your problem, but I would do something along the following lines:

GetData <- function(id, metric, dim) {
    d <- ga$getData(id, batch = TRUE, start.date="2013-12-01",
             end.date="2014-01-01", metrics = metric, dimensions = dim)
    d$id <- id
    d
}

myresults <- Map(GetData, 
                   id = MetricsInput$ID,
                   metric = MetricsInput$Metrics,
                   dim = MetricsInput$Dimensions)

This will give you a list whose ith component is the output of the ith iteration of your for loop. Next, split that list by file name and bind each group into a single data frame, so that each file can be written in one go:

myresultslist <- split(myresults, MetricsInput$ExtractName)
myresultslist <- lapply(myresultslist, do.call, what = rbind)

Map(write.table, x = myresultslist, file = names(myresultslist), 
    row.names = FALSE, sep = "\t")
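
The whole pattern can be tried without a Google Analytics connection. The following is a minimal sketch, assuming a stand-in function `fake_get_data()` (hypothetical, not part of any package) in place of `ga$getData`, a cut-down `MetricsInput`, and output written to a temporary directory:

```r
# Cut-down version of the input table from the question
MetricsInput <- data.frame(
  ID          = c(124, 128, 124, 128),
  ExtractName = c("extract1.txt", "extract1.txt", "extract2.txt", "extract2.txt"),
  Dimensions  = c("ga:date", "ga:date", "ga:browser", "ga:browser"),
  Metrics     = c("ga:sessions", "ga:sessions", "ga:users", "ga:users"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for ga$getData: returns a two-row data frame whose
# column names depend on the requested dimension and metric
fake_get_data <- function(id, metric, dim) {
  d <- data.frame(c("a", "b"), c(1, 2), stringsAsFactors = FALSE)
  names(d) <- c(dim, metric)
  d$id <- id
  d
}

pieces <- Map(fake_get_data,
              id = MetricsInput$ID,
              metric = MetricsInput$Metrics,
              dim = MetricsInput$Dimensions)

# Group the pieces by target file, then stack each group into one data frame
by_file <- split(pieces, MetricsInput$ExtractName)
by_file <- lapply(by_file, do.call, what = rbind)

# Write one tab-separated file per group; the header is written once per file
out_dir <- tempdir()
invisible(Map(write.table, x = by_file,
              file = file.path(out_dir, names(by_file)),
              row.names = FALSE, sep = "\t"))
```

Because each file is written in a single `write.table` call, no `append` logic or first-row indicator is needed.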
konvas
  • I like the approach but I'm getting an error when running: myresultsdf <- do.call(rbind, myresults), the error I'm getting is Error in match.names(clabs, names(xi)) : names do not match previous names – davids12 Oct 09 '14 at 19:11
  • This means that the data frames returned by ga$getData do not always have the same column names. I have changed the code a bit, so the splitting takes place before rbinding. So if ga$getData returns same column names for inputs corresponding to the same filename it should work now, otherwise I can't help much without being able to reproduce... – konvas Oct 10 '14 at 11:13
  • Thanks so much konvas! Works great -- the only thing I would ask you is how can I get the id that I'm passing to be included on the myresultslist? – davids12 Oct 10 '14 at 13:41
  • You mean you want the output of `GetData()` to include an extra column which equals the id you supply in the call? That can be done by adding a column to the result within `GetData()` - see edit – konvas Oct 10 '14 at 13:52
  • Hm, the change you made now just includes only the "id" column in the output .txt files, not any of the other metrics and dimensions. – davids12 Oct 10 '14 at 14:10
  • Ah sorry I was careless and forgot to tell the function to return `d`. If you don't specify that it just returns the result of the last call, which was the id in this case... fixed now – konvas Oct 10 '14 at 14:12
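
The behaviour described in the last comment is easy to reproduce. In this small sketch (the function names `add_id_wrong` and `add_id_right` are made up for illustration), the first function ends on an assignment, so it returns the assigned value rather than the data frame:

```r
add_id_wrong <- function(d, id) {
  d$id <- id          # last expression is an assignment; its value is `id`
}

add_id_right <- function(d, id) {
  d$id <- id
  d                   # return the modified data frame explicitly
}

df <- data.frame(x = 1:2)
wrong <- add_id_wrong(df, 124)   # just the id, returned invisibly
right <- add_id_right(df, 124)   # data frame with columns x and id
```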

Why don't you create a data frame in the loop and then write it to the text file?

myresults <- data.frame()
for (i in seq_len(nrow(MetricsInput))) {
  id <- MetricsInput[i,1]
  temp <- ga$getData(id, batch = TRUE, start.date="2013-12-01", end.date="2014-01-01", metrics = MetricsInput[i,4], dimensions = MetricsInput[i,3])

  # append this query's rows to the accumulated results
  myresults <- rbind(myresults, temp)
}

write.csv(myresults, ...)
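
A minimal sketch of this approach, assuming a hypothetical stand-in `fake_get_data()` for `ga$getData` and a temporary output directory. Because the two extracts in the question have different columns, the sketch keeps one accumulated data frame per file rather than a single one (that per-file grouping is an addition, not part of the code above):

```r
MetricsInput <- data.frame(
  ID          = c(124, 128, 124, 128),
  ExtractName = c("extract1.txt", "extract1.txt", "extract2.txt", "extract2.txt"),
  Dimensions  = c("ga:date", "ga:date", "ga:browser", "ga:browser"),
  Metrics     = c("ga:sessions", "ga:sessions", "ga:users", "ga:users"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for ga$getData
fake_get_data <- function(id, metric, dim) {
  d <- data.frame(c("a", "b"), c(1, 2), stringsAsFactors = FALSE)
  names(d) <- c(dim, metric)
  d
}

results <- list()   # one accumulated data frame per output file
for (i in seq_len(nrow(MetricsInput))) {
  f <- MetricsInput$ExtractName[i]
  temp <- fake_get_data(MetricsInput$ID[i], MetricsInput$Metrics[i],
                        MetricsInput$Dimensions[i])
  # On the first pass results[[f]] is NULL, which rbind ignores
  results[[f]] <- rbind(results[[f]], temp)
}

# Write each accumulated data frame once, after the loop
out_dir <- tempdir()
for (f in names(results)) {
  write.csv(results[[f]], file = file.path(out_dir, f), row.names = FALSE)
}
```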
americo