
I am trying to convert the nested list url_expansions into a data frame, so that it can be matched to other corresponding attributes as a flattened table.

Each entry in url_expansions has up to 4 named fields:

.Names = c("url", "topsy_expanded_url", "expanded_url", "display_url")

Ideally, each should become a column heading with NA/null applied where appropriate. So far, this worked with other data using simply:

score <- sapply(tweets, function(x) x$score)

However, as url_expansions is missing data for some rows, the following code:

url_expansions <- sapply(tweets, function(x) x$url_expansions)
display_url <- sapply(url_expansions, function(x) x$display_url)
data=data.frame(display_url)

returns the error: arguments imply differing number of rows: 4, 0, 3
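
The error suggests that the per-tweet results do not all have the same length. A minimal sketch for checking this, assuming the structure shown in the sample data further down: the toy list, the `rec()` helper and the `n_urls` name are made up for illustration and are not part of the real data.

```r
# Toy data mimicking the structure of the sample (hypothetical values)
nm <- c("url", "topsy_expanded_url", "expanded_url", "display_url")
rec <- function(tag) setNames(paste0(nm, "_", tag), nm)
url_expansions <- list(
  list(rec("a")),            # tweet with one URL
  list(c("", "", "", "")),   # tweet with no URL: blank, unnamed vector
  list(rec("b"), rec("c")),  # tweet with two URLs
  list()                     # completely empty element
)

# Count the non-blank URL records in each tweet
n_urls <- vapply(url_expansions, function(x) {
  sum(vapply(x, function(r) any(nzchar(r)), logical(1)))
}, integer(1))

n_urls              # number of URL records per tweet
which(n_urls != 1)  # tweets that will not map onto exactly one row
```

Any tweet whose count is not 1 needs padding or collapsing before the columns can line up in a data frame.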

I have tried many different approaches, including this, this, this, and even this, as well as plyr, all to no avail.

reshape2 almost does it with the following (based on this, and on @akrun's comment below):

library(reshape2)
nm1 <- names(url_expansions[[1]][[1]]) 
url_expansions1 <- lapply(url_expansions, function(x) if(length(x)<1) setNames(rep(NA, 4), nm1) else x) 
data2 <- dcast(cbind(
  coln = sequence(rapply(url_expansions, length)), 
  melt(url_expansions)), L1 + L2 ~ coln, 
  value.var = "value")
data3 <- data2[-(1:2)] 
colnames(data3) <- nm1

However, sub-lists that contain more than one URL record are given a new row for each record, so the new data frame (data3) has more rows than the original url_expansions. :'(

Ultimately, I need to load each display_url row from the above into one data frame, alongside its associated Twitter data, so the dimensions must match:

data=data.frame(trackback_author_name,content,highlight,display_url)
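
For the dimensions to match, each tweet has to end up as exactly one row. One possible sketch, assuming it is acceptable to paste multiple URLs from the same tweet into a single cell: the toy data, the `collapse_field()` helper and the `"; "` separator are assumptions made for illustration.

```r
# Toy data mimicking the structure of the sample (hypothetical values)
nm <- c("url", "topsy_expanded_url", "expanded_url", "display_url")
rec <- function(tag) setNames(paste0(nm, "_", tag), nm)
url_expansions <- list(
  list(rec("a")),            # tweet with one URL
  list(c("", "", "", "")),   # tweet with no URL: blank, unnamed vector
  list(rec("b"), rec("c")),  # tweet with two URLs
  list()                     # completely empty element
)

# Hypothetical helper: pull one field from every record of a tweet and
# collapse the values into a single string, or NA if there are none
collapse_field <- function(x, field, sep = "; ") {
  vals <- unlist(lapply(x, function(r) r[field]), use.names = FALSE)
  vals <- vals[!is.na(vals) & nzchar(vals)]  # drop blank/unnamed records
  if (length(vals) == 0) NA_character_ else paste(vals, collapse = sep)
}

# One column per field, one row per tweet
one_row_per_tweet <- as.data.frame(
  lapply(setNames(nm, nm), function(f)
    vapply(url_expansions, collapse_field, character(1), field = f)),
  stringsAsFactors = FALSE
)

nrow(one_row_per_tweet) == length(url_expansions)
```

Because `vapply()` insists on exactly one string per tweet, the result always has as many rows as the input list, so it can sit next to the other per-tweet vectors in `data.frame()`.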

I appreciate any and all help with this. Sample data has been included below:

list(list(structure(c("http://t.co/anl8pGqwsy", "http://twinavi.jp/topics/news/52e9a184-e618-4979-ad98-045b5546ec81?ref=tweet", 
"http://twme.jp/tnav/04h7", "twme.jp/tnav/04h7"), .Names = c("url", 
"topsy_expanded_url", "expanded_url", "display_url"))), list(
    structure(c("http://t.co/vx32EOwyRI", "http://wirelesswire.jp/london_wave/201401310211.html", 
    "http://wirelesswire.jp/london_wave/201401310211.html", "wirelesswire.jp/london_wave/20…"
    ), .Names = c("url", "topsy_expanded_url", "expanded_url", 
    "display_url"))), list(structure(c("http://t.co/4trgO3HVmv", 
"http://www.asahi.com/articles/ASG102VZWG10UTIL003.html", "http://t.asahi.com/dudj", 
"t.asahi.com/dudj"), .Names = c("url", "topsy_expanded_url", 
"expanded_url", "display_url"))), list(structure(c("http://t.co/5hnEwO5V1h", 
"http://twinavi.jp/topics/news/52e9e820-9034-4edb-9b2c-195b5546ec81?ref=tweet", 
"http://twme.jp/tnav/04hL", "twme.jp/tnav/04hL"), .Names = c("url", 
"topsy_expanded_url", "expanded_url", "display_url"))), list(
    structure(c("http://t.co/GdMMXKsbY0", "http://www.riken.jp/pr/press/2014/20140130_1/", 
    "http://www.riken.jp/pr/press/2014/20140130_1/", "riken.jp/pr/press/2014/…"
    ), .Names = c("url", "topsy_expanded_url", "expanded_url", 
    "display_url"))), list(structure(c("http://t.co/7x21RTkgke", 
"http://www.asahi.com/articles/ASG1Z0PGCG1YPLBJ00W.html", "http://t.asahi.com/dtxd", 
"t.asahi.com/dtxd"), .Names = c("url", "topsy_expanded_url", 
"expanded_url", "display_url"))), list(structure(c("http://t.co/Rcdl4L2zP1", 
"http://www.cdb.riken.jp/crp/news2014.1.31_2.html", "http://bit.ly/1nv8CdM", 
"bit.ly/1nv8CdM"), .Names = c("url", "topsy_expanded_url", "expanded_url", 
"display_url"))), list(structure(c("http://t.co/3E2HD1wylC", 
"http://www.nikkansports.com/general/news/p-gn-tp0-20140131-1251192.html", 
"http://www.nikkansports.com/general/news/p-gn-tp0-20140131-1251192.html", 
"nikkansports.com/general/news/p…"), .Names = c("url", "topsy_expanded_url", 
"expanded_url", "display_url"))), list(c("", "", "", "")), list(
    c("", "", "", "")), list(structure(c("http://t.co/bIciCF7fJb", 
"http://dailynews.yahoo.co.jp/photograph/pickup/?1391051363=", 
"http://dailynews.yahoo.co.jp/photograph/pickup/?1391051363", 
"dailynews.yahoo.co.jp/photograph/pic…"), .Names = c("url", 
"topsy_expanded_url", "expanded_url", "display_url"))), list(
    c("", "", "", "")), list(structure(c("http://t.co/dwQVkHlT3R", 
"http://www.cdb.riken.jp/crp/news2014.1.31_2.html", "http://www.cdb.riken.jp/crp/news2014.1.31_2.html", 
"cdb.riken.jp/crp/news2014.1…"), .Names = c("url", "topsy_expanded_url", 
"expanded_url", "display_url"))), list(structure(c("http://t.co/HgtgZJID2w", 
"http://www3.nhk.or.jp/news/html/20140130/k10014894611000.html", 
"http://nhk.jp/N4Bg6FTZ", "nhk.jp/N4Bg6FTZ"), .Names = c("url", 
"topsy_expanded_url", "expanded_url", "display_url"))), list(
    structure(c("http://t.co/R4dz0XI9ci", "http://pbs.twimg.com/media/BczUK5dIgAA4mDl.jpg", 
    "http://twitter.com/kokossu07/status/417942149267984384/photo/1", 
    "pic.twitter.com/R4dz0XI9ci"), .Names = c("url", "topsy_expanded_url", 
    "expanded_url", "display_url"))), list(structure(c("http://t.co/gP0bI68UEq", 
"http://www.cdb.riken.jp/crp/news2014.1.31_2.html", "http://bit.ly/1iTvtiy", 
"bit.ly/1iTvtiy"), .Names = c("url", "topsy_expanded_url", "expanded_url", 
"display_url"))), list(c("", "", "", "")), list(structure(c("http://t.co/2X4PnkCWxo", 
"http://dailynews.yahoo.co.jp/fc/science/stap_cells/?id=6105570", 
"http://yahoo.jp/JDsgEr", "yahoo.jp/JDsgEr"), .Names = c("url", 
"topsy_expanded_url", "expanded_url", "display_url"))), list(
    structure(c("http://t.co/20SqWMJFDG", "http://mainichi.jp/feature/news/20140130mog00m040009000c.html", 
    "http://goo.gl/xRRcCl", "goo.gl/xRRcCl"), .Names = c("url", 
    "topsy_expanded_url", "expanded_url", "display_url"))), list(
    c("", "", "", "")), list(c("", "", "", "")), list(c("", "", 
"", "")), list(structure(c("http://t.co/ey2KK8wKoC", "http://www.cdb.riken.jp/crp/index.html", 
"http://www.cdb.riken.jp/crp/index.html", "cdb.riken.jp/crp/index.html"
), .Names = c("url", "topsy_expanded_url", "expanded_url", "display_url"
)), structure(c("http://t.co/7Dg7O4coDM", "http://azukichi.net/frame2/b-frame526.html", 
"http://azukichi.net/frame2/b-frame526.html", "azukichi.net/frame2/b-frame…"
), .Names = c("url", "topsy_expanded_url", "expanded_url", "display_url"
))), list(structure(c("http://t.co/6Yl1UG459s", "http://sp.mainichi.jp/select/news/20140130k0000m040096000c.html", 
"http://sp.mainichi.jp/select/news/20140130k0000m040096000c.html", 
"sp.mainichi.jp/select/news/20…"), .Names = c("url", "topsy_expanded_url", 
"expanded_url", "display_url"))), list(structure(c("http://t.co/MPbamQCCpq", 
"http://www.cdb.riken.jp/crp/index.html", "http://www.cdb.riken.jp/crp/index.html", 
"cdb.riken.jp/crp/index.html"), .Names = c("url", "topsy_expanded_url", 
"expanded_url", "display_url"))), list(structure(c("http://t.co/JkdfeQFi5C", 
"http://sankei.jp.msn.com/science/news/140129/scn14012921250003-n1.htm", 
"http://sankei.jp.msn.com/science/news/140129/scn14012921250003-n1.htm", 
"sankei.jp.msn.com/science/news/1…"), .Names = c("url", "topsy_expanded_url", 
"expanded_url", "display_url"))), list(c("", "", "", "")), list(
    c("", "", "", "")), list(structure(c("http://t.co/Gf16StDW4d", 
"http://www.yomiuri.co.jp/science/news/20140130-OYT1T00630.htm", 
"http://bit.ly/1n11fHM", "bit.ly/1n11fHM"), .Names = c("url", 
"topsy_expanded_url", "expanded_url", "display_url"))), list(
    structure(c("http://t.co/gRKf2GkPpK", "http://nosumi.exblog.jp/20296694/", 
    "http://htn.to/4M3wsg", "htn.to/4M3wsg"), .Names = c("url", 
    "topsy_expanded_url", "expanded_url", "display_url"))), list(
    c("", "", "", "")), list(structure(c("http://t.co/tgelOtTBg3", 
"http://pbs.twimg.com/media/BfLvREpCQAANS8r.jpg", "http://twitter.com/ysmkwa/status/428667991308259329/photo/1", 
"pic.twitter.com/tgelOtTBg3"), .Names = c("url", "topsy_expanded_url", 
"expanded_url", "display_url"))), list(structure(c("http://t.co/7pXgNSmGx5", 
"http://nosumi.exblog.jp/20296694/", "http://nosumi.exblog.jp/20296694/", 
"nosumi.exblog.jp/20296694/"), .Names = c("url", "topsy_expanded_url", 
"expanded_url", "display_url"))), list(c("", "", "", "")), list(
    c("", "", "", "")), list(c("", "", "", "")), list(structure(c("http://t.co/X7I8DPjhi2", 
"http://horikawad.hatenadiary.com/entry/2014/01/30/071830", "http://horikawad.hatenadiary.com/entry/2014/01/30/071830", 
"horikawad.hatenadiary.com/entry/2014/01/…"), .Names = c("url", 
"topsy_expanded_url", "expanded_url", "display_url"))))
  • Please double check your dput. Something is missing – talat Jan 17 '15 at 08:00
  • Thanks for pointing that out. The original list is 1000 rows. I hope the edited data works for you now. – Mach5RacerGoGo Jan 17 '15 at 08:06
  • Regarding the column names using `dcast`, from the `dput`, you have only a single column with name `url` and there are some empty lists. The `dcast` gives you column names as `1,2,3,4`. Not sure about the expected result. – akrun Jan 17 '15 at 08:22
  • You have too many `)` at the end. `do.call(rbind,unlist(mylist[vapply(mylist,length,1L)>0],recursive=FALSE))` seems to work for the object you provided. – nicola Jan 17 '15 at 09:00
  • @akrun Thanks for replies. I expect to have 4 columns, according to: `.Names=c("url", "topsy_expanded_url", "expanded_url", "display_url")`. In addition, I would like the empty lists to be in the output, as a blank row. – Mach5RacerGoGo Jan 17 '15 at 09:02
  • @Mach5RacerGoGo You could still use the dcast code with some modifications. `nm1 <- names(url_expansions[[1]][[1]]); url_expansions1 <- lapply(url_expansions, function(x) if(length(x)<1) setNames(rep(NA, 4), nm1) else x); data2 <- dcast(...); data3 <- data2[-(1:2)]; colnames(data3) <- nm1` – akrun Jan 17 '15 at 09:36
  • @akrun Thanks for the suggestions. That also works, but suffers from the same issues: the output has more rows than the input. On closer inspection, I noticed some sub-lists are actually lists of 2 (i.e., multiple URLs were included in a Tweet, which is not unheard of). Perhaps this is the culprit? If so, maybe we might have to collapse each list into one column? This was perhaps the problem all along... :( I have updated the data above, to include some of the offending lists. Appreciate all the kind help. – Mach5RacerGoGo Jan 18 '15 at 11:38

1 Answer


Expanding my comment to include blank rows, I suggest the following, assuming mylist is the object:

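# Give empty elements a single blank record so they still produce a row
mylist[vapply(mylist, length, 1L) == 0] <- list(list(rep("", 4)))
# Flatten one level and bind every record as a row of a character matrix
x <- do.call(rbind, unlist(mylist, recursive = FALSE))
colnames(x) <- names(mylist[[c(1, 1)]])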
nicola
  • Thanks for this. This is 98% there, except that it produces extra rows. If it helps, the following warning was given: `x<-do.call(rbind,unlist(url_expansions,recursive=FALSE)) Warning message: In (function (..., deparse.level = 1) : number of columns of result is not a multiple of vector length (arg 991)` – Mach5RacerGoGo Jan 17 '15 at 10:55
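
The extra rows mentioned in the comment above appear because every URL record becomes its own row. The `rbind` warning also suggests that at least one record (argument 991) does not have four fields, which may be worth inspecting separately. A possible way to keep the row-per-record layout while still being able to match rows to tweets is to carry a tweet index along. This is only a sketch: the toy data and the `tweet` column name are assumptions made for illustration.

```r
# Toy data mimicking the structure of the sample (hypothetical values)
nm <- c("url", "topsy_expanded_url", "expanded_url", "display_url")
rec <- function(tag) setNames(paste0(nm, "_", tag), nm)
url_expansions <- list(
  list(rec("a")),            # tweet with one URL
  list(c("", "", "", "")),   # tweet with no URL: blank, unnamed vector
  list(rec("b"), rec("c")),  # tweet with two URLs
  list()                     # completely empty element
)

# Flatten one level: one list element per URL record
records <- unlist(url_expansions, recursive = FALSE)

# Remember which tweet each record came from
tweet_id <- rep(seq_along(url_expansions),
                times = vapply(url_expansions, length, 1L))

# Bind the records row-wise and attach the tweet index
mat <- do.call(rbind, records)
colnames(mat) <- nm
long <- data.frame(tweet = tweet_id, mat, stringsAsFactors = FALSE)

long$tweet  # tweet 3 appears twice, tweet 4 (empty) not at all
```

With the index in place, the long table can be joined back to the per-tweet data, for example with `merge(..., all.x = TRUE)`, so tweets without URLs are kept as NA rows.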