unnest() in R does not work on a large data sample

Question

I am unnesting data from a JSON file. When I make a small sample, the unnest() function works, but when I try to run it on the large, original dataframe I get the error below.

`Error in bind_rows_(x, .id) : 
  Column lines can't be converted from integer to list`

My code is below. The JSON data comes from GitHub's API.

`repo_data <- fromJSON("data/data/repos.json")`

Small data frame, only the first 100 rows:

`repo_small <- head(repo_data, 100)`

Tidy the repo data, unnesting languages and lines of code:

`df_repo <- repo_small %>% select(ownerName, name, languages, ownerType) %>% unnest()`

When I filtered the data I found no NA rows or anything else unusual. The only column I need to unnest is languages.

Languages is a list that contains 2 lists. The first list is name and has values like "Java", "Python", and "Ruby". These are character values. The second list is lines and has values like 104, 109432, and 10. These are integer values.

As requested, here is some sample code to replicate the data. testdf is the data frame and language is the column in question.

`owner <- c("github", "palentir", "apple")
gitcode <- data.frame(name = c("java"), lines=c(81))
palentircode <- data.frame(name= c("java", "python", "R"), lines=c(200, 45,903))
applecode <- data.frame(name=c("java", "ruby"), lines=c(12, 120))
langauge <- list(gitcode, palentircode, applecode)
testdf <- data.frame(owner)
testdf$language <- langauge`


Please provide a sample of that database (for a [minimal, complete, verifiable example](https://stackoverflow.com/help/mcve)). Some methods can be found https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — r2evans, Mar 07 '18 at 22:41
`unnest(testdf)` runs with warnings. Can you provide an example that reproduces your error? — De Novo, Mar 08 '18 at 03:47
Please read the post we're all linking to :) https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — De Novo, Mar 08 '18 at 04:11
I tried, but the data frame isn't in the right shape. The warnings are the same error message. What does the message mean when I am not binding anything unless unnest is? — Blurbz, Mar 08 '18 at 05:42
I appreciate the help, but have decided to see if I can just handcode a function, since it seems that unnest() is working on the small sample, but seems to bring this error when using the large, original data set. — Blurbz, Mar 08 '18 at 05:48
But would like to see and understand why this error is happening, or what it means. I provided a sample of the code to show the data structure, and that's the best I can do. And yes, I have read all the available documentation I found, thank you very much. — Blurbz, Mar 08 '18 at 05:52
I'm glad you found a solution! I agree, it's very frustrating when you run into an error you can't reproduce! I hope you understand why we were asking for a dataset that would reproduce it. — De Novo, Mar 08 '18 at 06:36

score 2 · Answer 1 · answered Mar 07 '18 at 23:24

From the documentation of unnest()

unnest() can handle list-columns that contain atomic vectors, lists, or data frames (but not a mixture of the different types).

You have two different atomic types in your list. Without the reproducible example requested in the comments, I can't tell whether this is the actual structure of your data, but the following illustrates the requirement of unnest():

library(tidyr)

DF <- data.frame(a = 1:2)
# list-column mixing a character vector and an integer vector
DF$name <- list(c("Java", "Python", "Ruby"), c(104L, 109432L, 10L))
unnest(DF, name)
# will fail because of the requirements of unnest

If this is the problem, you'll have to convert the second element of the list to character first.

DF$name[[2]] <- as.character(DF$name[[2]])
unnest(DF, name)
#   a   name
# 1 1   Java
# 2 1 Python
# 3 1   Ruby
# 4 2    104
# 5 2 109432
# 6 2     10
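
If the list-column holds data frames rather than bare vectors, as the question describes, the same rule would apply to each column inside them. The error text ("can't be converted from integer to list") suggests that `lines` is an integer column in most elements but a list in at least one. Below is a minimal base R sketch for locating such elements. The toy `languages` list and the helper name `column_classes` are assumptions made for illustration, not the asker's data or a tidyr function.

```r
# Toy list-column shaped like the question's `languages` column (assumed structure).
languages <- list(
  data.frame(name = "Java", lines = 104L, stringsAsFactors = FALSE),
  data.frame(name = c("Python", "Ruby"), lines = c(109432L, 10L),
             stringsAsFactors = FALSE),
  data.frame(name = "R", stringsAsFactors = FALSE)
)
# Give the third element a list-typed `lines` column, the kind of odd row
# that could hide in a large data set (an assumption, not the asker's data).
languages[[3]]$lines <- list(7L)

# Hypothetical helper: class of one column in every element of a list-column.
column_classes <- function(list_col, column) {
  vapply(list_col, function(el) class(el[[column]])[1], character(1))
}

line_classes <- column_classes(languages, "lines")
print(table(line_classes))

# Positions of the elements whose `lines` column is not a plain integer vector.
odd_rows <- which(line_classes != "integer")

# Coerce the odd elements so every `lines` column has the same type.
for (i in odd_rows) {
  languages[[i]]$lines <- as.integer(unlist(languages[[i]]$lines))
}
```

On the real data, the same helper could be run as `column_classes(repo_data$languages, "lines")`, assuming every element of `languages` is a data frame or a list. Any element reported as something other than "integer" is a candidate for the row that breaks unnest() on the full data but happens to be absent from the first 100 rows.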

Yes, I read the documentation. But unnest still works, although with warnings. From running the sample data frame, I got this warning, but it was able to unnest successfully. `In bind_rows_(x, .id) : binding character and factor vector, coercing into character vector` — Blurbz, Mar 08 '18 at 00:16
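
The warning quoted in the comment above appears to be about factors rather than the integer/list mismatch in the original error: before R 4.0.0, data.frame() converts strings to factors by default, and factors with different levels are coerced to character when bound. A sketch of the question's sample data rebuilt with character columns, assuming that is what triggers the warning:

```r
# Build the question's sample with character columns instead of factors.
# stringsAsFactors = FALSE is already the default from R 4.0.0 onward.
gitcode <- data.frame(name = "java", lines = 81, stringsAsFactors = FALSE)
palentircode <- data.frame(name = c("java", "python", "R"),
                           lines = c(200, 45, 903),
                           stringsAsFactors = FALSE)
applecode <- data.frame(name = c("java", "ruby"), lines = c(12, 120),
                        stringsAsFactors = FALSE)

testdf <- data.frame(owner = c("github", "palentir", "apple"),
                     stringsAsFactors = FALSE)
testdf$language <- list(gitcode, palentircode, applecode)

# Check that every `name` column is character, so binding needs no coercion.
name_classes <- vapply(testdf$language, function(el) class(el$name), character(1))
print(name_classes)
```

Note that this sample still would not reproduce the original error, because every `lines` column here has the same type.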
