Background

I'm querying a MongoDB database to find a document:

library(rmongodb)
...
res <- mongo.find.one(m, n, q, f)  # <~~ returns BSON 
res <- mongo.bson.to.list(res)     # <~~ converts BSON to list

I'm then using this answer to try to convert it to a data frame:

df <- as.data.frame(t(sapply(res[[1]], '[', seq(max(sapply(res[[1]],length))))))

However, this gives me a data frame of lists (subsetted here for convenience):

Data

> dput(df)
structure(list(horse_id = list(17643L, 4997L, 20047L, 9914L, 
17086L, 12462L, 18490L, 17642L, 26545L, 27603L, 14635L, 13811L, 
27719L, 31585L, 9644L), start_1400m = list(14.76, 14.3, 14.48, 
15.11, 14.65, 14.63, 14.85, 14.54, 14.93, 14.5, 14.78, NULL, 
NULL, NULL, NULL), `1400m_1200m` = list(12.96, 12.47, 12.47, 
13.02, 12.65, 12.92, 13.11, 12.37, 13, 12.84, 12.79, NULL, 
NULL, NULL, NULL)), .Names = c("horse_id", "start_1400m", 
"1400m_1200m"), row.names = c(NA, 15L), class = "data.frame")

> head(df)
    horse_id start_1400m 1400m_1200m
1    17643       14.76       12.96
2     4997        14.3       12.47
3    20047       14.48       12.47
4     9914       15.11       13.02
5    17086       14.65       12.65
6    12462       14.63       12.92

Issue

I would like to melt this data (using reshape2) and then plot it with ggplot2 but, as expected, I can't melt data frames with non-atomic columns.

> melt(df, id.vars=c("horse_id"))
Error: Can't melt data.frames with non-atomic 'measure' columns
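
The error follows from the column types, which can be checked directly. This is a minimal sketch on a hypothetical three-row cut-down of the data above (the toy `df` and the name `list_cols` are assumptions made for illustration, not part of the original query result):

```r
# Hypothetical three-row version of the data shown in the dput() output above.
df <- structure(
  list(horse_id      = list(17643L, 4997L, 13811L),
       start_1400m   = list(14.76, 14.3, NULL),
       `1400m_1200m` = list(12.96, 12.47, NULL)),
  row.names = c(NA, -3L), class = "data.frame")

# Each column is a list rather than an atomic vector,
# which is what melt() is complaining about.
list_cols <- sapply(df, is.list)
print(list_cols)

# str() shows the same thing in more detail.
str(df)
```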

How can I convert this data to a 'standard' data frame (i.e. not a data frame of lists), or melt it as is?

Update

I hadn't properly considered the NULLs in the data. Using a combination of a comment in this question (replacing NULL with NA) and this answer (Convert List to DF with NULLs), I came up with:

d <- as.data.frame(do.call("rbind", df))

library(plyr)
d <- rbind.fill(lapply(d, function(f) {
  data.frame(Filter(Negate(is.null), f))
}))

names(d) <- sub("X","",names(d))      #<~~ clean the names
d <- melt(d, id.vars=c("horse_id"))   #<~~ melt for use in ggplot2

This replaces the NULLs with NAs and allows me to melt the data. However, I'm still not fully au fait with what each step is doing yet, or whether this is the right approach.
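
A possibly simpler route to the same result is to replace each NULL with NA and then unlist column by column. The following is only a sketch on a hypothetical three-row cut-down of the data; the helper name `null_to_na` is made up for illustration, and it assumes every cell holds at most one value:

```r
# Hypothetical three-row version of the data shown in the dput() output above.
df <- structure(
  list(horse_id      = list(17643L, 4997L, 13811L),
       start_1400m   = list(14.76, 14.3, NULL),
       `1400m_1200m` = list(12.96, 12.47, NULL)),
  row.names = c(NA, -3L), class = "data.frame")

# Replace NULL entries with NA so that unlist() keeps one value per row.
# Assumes each non-NULL cell has length 1.
null_to_na <- function(col) {
  col[vapply(col, is.null, logical(1))] <- NA
  unlist(col)
}

# Assigning into flat[] keeps the data frame class and the column names.
flat <- df
flat[] <- lapply(df, null_to_na)

str(flat)
```

With atomic columns in place, `melt(flat, id.vars = "horse_id")` from reshape2 should no longer raise the non-atomic error.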

tospig
  • What is the desired output? What about `melt(lapply(df, unlist))`? – A5C1D2H2I1M1N2O1R2T1 Mar 22 '15 at 11:06
  • Definitely do not use sapply, since it can fall back to returning a list when it cannot simplify its output. According to a comment in your reference, it would be best to use vapply, but I would settle for apply wrapped in unlist to make sure the output is not a list. It may be necessary to move unlist a level deeper if you still get a data frame of lists. –  Mar 22 '15 at 11:11
  • @AnandaMahto - thanks for the suggestion, but that method loses the relationship between the `horse_id` and the `start_1400m`, `1400m_1200m` values (even though some are `NULL`), does it not? – tospig Mar 23 '15 at 01:40
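
The concern in the last comment can be checked on a toy column: `unlist()` drops NULL entries, so a column containing NULLs comes back shorter and no longer lines up row by row with `horse_id`. A minimal sketch (the `col` values are taken from the first rows of the data, with one NULL added for illustration):

```r
# A list column with one missing (NULL) entry.
col <- list(14.76, NULL, 12.96)

length(col)          # 3 entries in the list
length(unlist(col))  # 2 values left, because the NULL is dropped
```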

1 Answer

It is normal for data.frames created from vectors or lists to have those objects represented as lists in dput() output and that is not usually a problem because it still works as a data.frame.

For example:

> a = list(1, 2, 3)
> b = list(4, 5, 6)
> df = data.frame(a)
> df = rbind(b, df)
> df
   X1 X2 X3
1   4  5  6
2   1  2  3
> s = sum(df[,2])
> s
[1] 7
> str(df)
'data.frame':   2 obs. of  3 variables:
 $ X1: num  4 1
 $ X2: num  5 2
 $ X3: num  6 3
> dput(df)
structure(list(X1 = c(4, 1), X2 = c(5, 2), X3 = c(6, 3)), .Names = c("X1", 
"X2", "X3"), row.names = 1:2, class = "data.frame")
> 
  • Did you actually try `str(df)` on the data provided by the OP. It is completely different from yours. – David Arenburg Mar 22 '15 at 11:49
  • The OP - I guess you mean tospig - did not provide data to test. However, tospig showed 'mongo.bson.to.list(res)', which indicates that whatever it is can be converted into a list. Provided that is done, there is no reason it should not work as I demonstrated. If not, it would need to be converted into the right kind of list, and that should be a simple matter of splitting it correctly. –  Mar 22 '15 at 12:00
  • The OP *did* provide the data set. Try reading the question. @Ananda's solution will solve this issue taking the data *as is*. – David Arenburg Mar 22 '15 at 12:01