I am trying to convert the nested list url_expansions into a data frame, so that it can be matched to other corresponding attributes as a flattened table. Each entry in url_expansions holds up to 4 named fields:
.Names = c("url", "topsy_expanded_url", "expanded_url", "display_url")
Ideally, each should become a column heading with NA/null applied where appropriate. So far, this worked with other data using simply:
score <- sapply(tweets, function(x) x$score)
However, as url_expansions is missing data for some rows, the following code:
url_expansions <- sapply(tweets, function(x) x$url_expansions)
display_url <- sapply(url_expansions, function(x) x$display_url)
data=data.frame(display_url)
returns the error: arguments imply differing number of rows: 4, 0, 3
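The error appears because data.frame() needs every column to have the same length, while sapply() returns a ragged list when some tweets have no URL or more than one. Below is a minimal sketch of one way to get exactly one value per tweet. It assumes each element of url_expansions is a list of character vectors, as in the sample data; the toy list `ue`, its example.com-style values, and the helper `get_field` are made up for illustration.

```r
# Toy stand-in for url_expansions (hypothetical values, same shape as the sample data)
ue <- list(
  list(c(url = "http://t.co/a", topsy_expanded_url = "http://a.example/1",
         expanded_url = "http://a.example/1", display_url = "a.example/1")),
  list(c("", "", "", "")),  # tweet with no URL (unnamed, empty strings)
  list(c(url = "http://t.co/b", topsy_expanded_url = "http://b.example/1",
         expanded_url = "http://b.example/1", display_url = "b.example/1"),
       c(url = "http://t.co/c", topsy_expanded_url = "http://c.example/1",
         expanded_url = "http://c.example/1", display_url = "c.example/1"))
)

# Hypothetical helper: return one string per tweet for a given field.
# Missing or empty values become NA; several URLs are pasted together.
get_field <- function(x, field) {
  vals <- unlist(lapply(x, function(v) {
    if (field %in% names(v)) v[[field]] else NA_character_
  }), use.names = FALSE)
  vals <- vals[!is.na(vals) & nzchar(vals)]
  if (length(vals) == 0) NA_character_ else paste(vals, collapse = "; ")
}

# vapply() guarantees a character vector with one entry per tweet
display_url <- vapply(ue, get_field, character(1), field = "display_url")
```

Because vapply() is told to expect character(1), it fails loudly if any tweet yields something other than a single string, rather than silently returning a list.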
I have tried many different approaches, including this, this, and this, and even this, as well as plyr, all to no avail.
reshape2 almost does it with the following (based on this, and on @akrun's suggestion below):
library(reshape2)
nm1 <- names(url_expansions[[1]][[1]])
url_expansions1 <- lapply(url_expansions, function(x) if(length(x)<1) setNames(rep(NA, 4), nm1) else x)
data2 <- dcast(
  cbind(coln = sequence(rapply(url_expansions, length)),
        melt(url_expansions)),
  L1 + L2 ~ coln,
  value.var = "value")
data3 <- data2[-(1:2)]
colnames(data3) <- nm1
However, sublists with more than one entry are given new rows, so the new data frame (data3) has more rows than the original url_expansions. :'(
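One possible way to keep a single row per tweet is to collapse each tweet's entries before binding them together. This is only a sketch under assumptions: the toy list `ue`, its values, and the helper `one_row` are hypothetical; it assumes every named vector carries all four fields, and joining multiple URLs with "; " is just one choice (taking the first entry would work as well).

```r
fields <- c("url", "topsy_expanded_url", "expanded_url", "display_url")

# Toy stand-in for url_expansions (hypothetical values, same shape as the sample data)
ue <- list(
  list(c(url = "http://t.co/a", topsy_expanded_url = "http://a.example/1",
         expanded_url = "http://a.example/1", display_url = "a.example/1")),
  list(c("", "", "", "")),  # tweet with no URL
  list(c(url = "http://t.co/b", topsy_expanded_url = "http://b.example/1",
         expanded_url = "http://b.example/1", display_url = "b.example/1"),
       c(url = "http://t.co/c", topsy_expanded_url = "http://c.example/1",
         expanded_url = "http://c.example/1", display_url = "c.example/1"))
)

# Hypothetical helper: turn one tweet's entry into a single named row
one_row <- function(x) {
  # Keep only vectors that carry field names; empty placeholders are unnamed
  x <- Filter(function(v) !is.null(names(v)), x)
  if (length(x) == 0) {
    return(setNames(rep(NA_character_, length(fields)), fields))
  }
  # Stack this tweet's URLs, then collapse each column into one string
  m <- do.call(rbind, lapply(x, function(v) v[fields]))
  apply(m, 2, paste, collapse = "; ")
}

# One row per tweet, one column per field
flat <- as.data.frame(do.call(rbind, lapply(ue, one_row)),
                      stringsAsFactors = FALSE)
```

Here nrow(flat) equals length(ue), so the result can sit next to other per-tweet columns.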
Ultimately, I need to load each display_url row from the above into one data frame, alongside its associated Twitter data, like so (hence the dimensions must match):
data=data.frame(trackback_author_name,content,highlight,display_url)
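If one row per URL turns out to be acceptable, an alternative to forcing equal lengths is to keep a tweet index and join on it. The sketch below uses base R's merge() with all.x = TRUE; the tables `tweets_df` and `urls_long`, the `tweet_id` column, and all values are hypothetical and only illustrate the shape of the join.

```r
# Toy per-tweet attributes (hypothetical values)
tweets_df <- data.frame(tweet_id = 1:3,
                        content = c("first", "second", "third"),
                        stringsAsFactors = FALSE)

# Toy long table: one row per URL, keyed by tweet
# (tweet 2 has no URL, tweet 3 has two)
urls_long <- data.frame(tweet_id = c(1L, 3L, 3L),
                        display_url = c("a.example/1", "b.example/1", "c.example/1"),
                        stringsAsFactors = FALSE)

# Left join: every tweet is kept; tweets without a URL get NA
merged <- merge(tweets_df, urls_long, by = "tweet_id", all.x = TRUE)
```

With this layout a tweet that has several URLs repeats its other attributes once per URL, so the row count follows the URLs rather than the tweets.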
I appreciate any and all help with this. Sample data is included below:
list(list(structure(c("http://t.co/anl8pGqwsy", "http://twinavi.jp/topics/news/52e9a184-e618-4979-ad98-045b5546ec81?ref=tweet",
"http://twme.jp/tnav/04h7", "twme.jp/tnav/04h7"), .Names = c("url",
"topsy_expanded_url", "expanded_url", "display_url"))), list(
structure(c("http://t.co/vx32EOwyRI", "http://wirelesswire.jp/london_wave/201401310211.html",
"http://wirelesswire.jp/london_wave/201401310211.html", "wirelesswire.jp/london_wave/20…"
), .Names = c("url", "topsy_expanded_url", "expanded_url",
"display_url"))), list(structure(c("http://t.co/4trgO3HVmv",
"http://www.asahi.com/articles/ASG102VZWG10UTIL003.html", "http://t.asahi.com/dudj",
"t.asahi.com/dudj"), .Names = c("url", "topsy_expanded_url",
"expanded_url", "display_url"))), list(structure(c("http://t.co/5hnEwO5V1h",
"http://twinavi.jp/topics/news/52e9e820-9034-4edb-9b2c-195b5546ec81?ref=tweet",
"http://twme.jp/tnav/04hL", "twme.jp/tnav/04hL"), .Names = c("url",
"topsy_expanded_url", "expanded_url", "display_url"))), list(
structure(c("http://t.co/GdMMXKsbY0", "http://www.riken.jp/pr/press/2014/20140130_1/",
"http://www.riken.jp/pr/press/2014/20140130_1/", "riken.jp/pr/press/2014/…"
), .Names = c("url", "topsy_expanded_url", "expanded_url",
"display_url"))), list(structure(c("http://t.co/7x21RTkgke",
"http://www.asahi.com/articles/ASG1Z0PGCG1YPLBJ00W.html", "http://t.asahi.com/dtxd",
"t.asahi.com/dtxd"), .Names = c("url", "topsy_expanded_url",
"expanded_url", "display_url"))), list(structure(c("http://t.co/Rcdl4L2zP1",
"http://www.cdb.riken.jp/crp/news2014.1.31_2.html", "http://bit.ly/1nv8CdM",
"bit.ly/1nv8CdM"), .Names = c("url", "topsy_expanded_url", "expanded_url",
"display_url"))), list(structure(c("http://t.co/3E2HD1wylC",
"http://www.nikkansports.com/general/news/p-gn-tp0-20140131-1251192.html",
"http://www.nikkansports.com/general/news/p-gn-tp0-20140131-1251192.html",
"nikkansports.com/general/news/p…"), .Names = c("url", "topsy_expanded_url",
"expanded_url", "display_url"))), list(c("", "", "", "")), list(
c("", "", "", "")), list(structure(c("http://t.co/bIciCF7fJb",
"http://dailynews.yahoo.co.jp/photograph/pickup/?1391051363=",
"http://dailynews.yahoo.co.jp/photograph/pickup/?1391051363",
"dailynews.yahoo.co.jp/photograph/pic…"), .Names = c("url",
"topsy_expanded_url", "expanded_url", "display_url"))), list(
c("", "", "", "")), list(structure(c("http://t.co/dwQVkHlT3R",
"http://www.cdb.riken.jp/crp/news2014.1.31_2.html", "http://www.cdb.riken.jp/crp/news2014.1.31_2.html",
"cdb.riken.jp/crp/news2014.1…"), .Names = c("url", "topsy_expanded_url",
"expanded_url", "display_url"))), list(structure(c("http://t.co/HgtgZJID2w",
"http://www3.nhk.or.jp/news/html/20140130/k10014894611000.html",
"http://nhk.jp/N4Bg6FTZ", "nhk.jp/N4Bg6FTZ"), .Names = c("url",
"topsy_expanded_url", "expanded_url", "display_url"))), list(
structure(c("http://t.co/R4dz0XI9ci", "http://pbs.twimg.com/media/BczUK5dIgAA4mDl.jpg",
"http://twitter.com/kokossu07/status/417942149267984384/photo/1",
"pic.twitter.com/R4dz0XI9ci"), .Names = c("url", "topsy_expanded_url",
"expanded_url", "display_url"))), list(structure(c("http://t.co/gP0bI68UEq",
"http://www.cdb.riken.jp/crp/news2014.1.31_2.html", "http://bit.ly/1iTvtiy",
"bit.ly/1iTvtiy"), .Names = c("url", "topsy_expanded_url", "expanded_url",
"display_url"))), list(c("", "", "", "")), list(structure(c("http://t.co/2X4PnkCWxo",
"http://dailynews.yahoo.co.jp/fc/science/stap_cells/?id=6105570",
"http://yahoo.jp/JDsgEr", "yahoo.jp/JDsgEr"), .Names = c("url",
"topsy_expanded_url", "expanded_url", "display_url"))), list(
structure(c("http://t.co/20SqWMJFDG", "http://mainichi.jp/feature/news/20140130mog00m040009000c.html",
"http://goo.gl/xRRcCl", "goo.gl/xRRcCl"), .Names = c("url",
"topsy_expanded_url", "expanded_url", "display_url"))), list(
c("", "", "", "")), list(c("", "", "", "")), list(c("", "",
"", "")), list(structure(c("http://t.co/ey2KK8wKoC", "http://www.cdb.riken.jp/crp/index.html",
"http://www.cdb.riken.jp/crp/index.html", "cdb.riken.jp/crp/index.html"
), .Names = c("url", "topsy_expanded_url", "expanded_url", "display_url"
)), structure(c("http://t.co/7Dg7O4coDM", "http://azukichi.net/frame2/b-frame526.html",
"http://azukichi.net/frame2/b-frame526.html", "azukichi.net/frame2/b-frame…"
), .Names = c("url", "topsy_expanded_url", "expanded_url", "display_url"
))), list(structure(c("http://t.co/6Yl1UG459s", "http://sp.mainichi.jp/select/news/20140130k0000m040096000c.html",
"http://sp.mainichi.jp/select/news/20140130k0000m040096000c.html",
"sp.mainichi.jp/select/news/20…"), .Names = c("url", "topsy_expanded_url",
"expanded_url", "display_url"))), list(structure(c("http://t.co/MPbamQCCpq",
"http://www.cdb.riken.jp/crp/index.html", "http://www.cdb.riken.jp/crp/index.html",
"cdb.riken.jp/crp/index.html"), .Names = c("url", "topsy_expanded_url",
"expanded_url", "display_url"))), list(structure(c("http://t.co/JkdfeQFi5C",
"http://sankei.jp.msn.com/science/news/140129/scn14012921250003-n1.htm",
"http://sankei.jp.msn.com/science/news/140129/scn14012921250003-n1.htm",
"sankei.jp.msn.com/science/news/1…"), .Names = c("url", "topsy_expanded_url",
"expanded_url", "display_url"))), list(c("", "", "", "")), list(
c("", "", "", "")), list(structure(c("http://t.co/Gf16StDW4d",
"http://www.yomiuri.co.jp/science/news/20140130-OYT1T00630.htm",
"http://bit.ly/1n11fHM", "bit.ly/1n11fHM"), .Names = c("url",
"topsy_expanded_url", "expanded_url", "display_url"))), list(
structure(c("http://t.co/gRKf2GkPpK", "http://nosumi.exblog.jp/20296694/",
"http://htn.to/4M3wsg", "htn.to/4M3wsg"), .Names = c("url",
"topsy_expanded_url", "expanded_url", "display_url"))), list(
c("", "", "", "")), list(structure(c("http://t.co/tgelOtTBg3",
"http://pbs.twimg.com/media/BfLvREpCQAANS8r.jpg", "http://twitter.com/ysmkwa/status/428667991308259329/photo/1",
"pic.twitter.com/tgelOtTBg3"), .Names = c("url", "topsy_expanded_url",
"expanded_url", "display_url"))), list(structure(c("http://t.co/7pXgNSmGx5",
"http://nosumi.exblog.jp/20296694/", "http://nosumi.exblog.jp/20296694/",
"nosumi.exblog.jp/20296694/"), .Names = c("url", "topsy_expanded_url",
"expanded_url", "display_url"))), list(c("", "", "", "")), list(
c("", "", "", "")), list(c("", "", "", "")), list(structure(c("http://t.co/X7I8DPjhi2",
"http://horikawad.hatenadiary.com/entry/2014/01/30/071830", "http://horikawad.hatenadiary.com/entry/2014/01/30/071830",
"horikawad.hatenadiary.com/entry/2014/01/…"), .Names = c("url",
"topsy_expanded_url", "expanded_url", "display_url")))