It's been some time since I've worked in R, so I'm a little rusty and need some help with lists. I have a list in which each entry is itself a list of 7 elements describing a patron's purchases while visiting a large chain store (a snippet of the list is provided below). Within a given index, elements 1, 2, 3, and 7 are always vectors of length 1, and elements 4, 5, and 6 are always vectors of the same length as each other, but that length varies from list index to list index (e.g. from [[73]] to [[74]]). This is because elements 1, 2, 3, and 7 are about a patron's visit to a store, whereas elements 4, 5, and 6 are about the individual purchases during that visit, so there is a one-to-many relationship between elements 1, 2, 3, and 7 (visit) and elements 4, 5, and 6 (purchases). I am trying to figure out how to efficiently convert my list to a single data frame. The catch is that, when I create the one large data frame, the visit information needs to be recycled across the purchases within a list index. For example, with the data provided below, I'd like the index, order number, date, and payment method (e.g. "Visa") to be repeated on the row for each purchase, so that the data frame looks like this in the end:
73 "Order #: 065-PO-4080219" "Sunday, September 17 2017" "PowerColor Red Dragon Radeon RX 580 Dual-Fan 8GB GDDR5 PCIe Video Card" $329.99 1 "Visa"
73 "Order #: 065-PO-4080219" "Sunday, September 17 2017" "PowerColor Red Dragon Radeon RX 580 Dual-Fan 8GB GDDR5 PCIe Video Card" $329.99 1 "Visa"
73 "Order #: 065-PO-4080219" "Sunday, September 17 2017" "ASUS PRIME Z270-AR LGA 1151 ATX Intel Motherboard" $159.99 1 "Visa"
74 "Order #: 065-PO-4079152" "Saturday, September 16 2017" "Olympia Tools Tool Set 53 Piece" $12.99 1 "Visa"
74 "Order #: 065-PO-4079152" "Saturday, September 16 2017" "The Best Connection Cable Ties" $1.99 1 "Visa"
I have been able to accomplish this by first converting each list index into a dataframe and then using:
do.call("rbind", MyListOfDataFrames)
But this seems really inefficient to me (as I understand it, it's generally inefficient to first convert to dataframes and then combine them into a larger data frame).
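For reference, a minimal sketch of that two-step approach follows. It uses two records copied from the dput() output at the end of this post; the column names are made up for illustration, and the exact conversion step (as.data.frame() on a named list) is an assumption about how the per-index data frames might be built.

```r
# Two records copied from the dput() output in this post
ProductList <- list(
  list(2, "Order #: 065-PO-4067551", "Saturday, September 2 2017",
       c("MSI Gaming X Radeon RX-580 Dual-Fan 8GB GDDR5 PCIe Video Card",
         "QVS HDMI Female to DVI-D Male Video Adapter - Black",
         "MSI Radeon RX 580 GAMING X 4GB GDDR5 Video Card"),
       c("$329.99", "$9.99", "$269.99"), c("1", "1", "1"), "Master Card"),
  list(3, "Order #: 041-PO-8823995", "Sunday, August 27 2017",
       "MSI Armor Radeon RX-470 Overclocked Dual-Fan 8GB GDDR5 PCIe Video Card",
       "$279.99", "1", "Master Card")
)

# Hypothetical column names, chosen only for this illustration
col_names <- c("id", "order", "date", "item", "price", "qty", "payment")

# Step 1: one data frame per visit. The length-1 visit fields are
# recycled across the longer purchase vectors.
MyListOfDataFrames <- lapply(ProductList, function(visit) {
  names(visit) <- col_names
  as.data.frame(visit, stringsAsFactors = FALSE)
})

# Step 2: stack the per-visit data frames into one
result <- do.call("rbind", MyListOfDataFrames)
```

This gives one row per purchase, but it builds and then binds a separate data frame for every list index, which is the part that seems wasteful.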
Is there a way to convert this list into one large data frame? I have thousands and thousands of these records to process, so I want to make this as efficient as possible. I have placed a small subset of the data on a publicly available site here to download if it helps (this can be loaded with the load() function, and the resulting list is called ProductList), I have placed an ASCII representation of the list here using dput() as suggested by a user, and I've also placed the raw dput() output at the end of this post. I tried searching the posts on Stack Overflow, but none really seemed to address this. Thanks for your help.
List snippet
[[73]][[1]]
[1] 73
[[73]][[2]]
[1] "Order #: 065-PO-4080219"
[[73]][[3]]
[1] "Sunday, September 17 2017"
[[73]][[4]]
[1] "PowerColor Red Dragon Radeon RX 580 Dual-Fan 8GB GDDR5 PCIe Video Card" "PowerColor Red Dragon Radeon RX 580 Dual-Fan 8GB GDDR5 PCIe Video Card"
[3] "ASUS PRIME Z270-AR LGA 1151 ATX Intel Motherboard"
[[73]][[5]]
[1] "$329.99" "$329.99" "$159.99"
[[73]][[6]]
[1] "1" "1" "1"
[[73]][[7]]
[1] "Visa"
[[74]]
[[74]][[1]]
[1] 74
[[74]][[2]]
[1] "Order #: 065-PO-4079152"
[[74]][[3]]
[1] "Saturday, September 16 2017"
[[74]][[4]]
[1] "Olympia Tools Tool Set 53 Piece" "The Best Connection Cable Ties"
[[74]][[5]]
[1] "$12.99" "$1.99"
[[74]][[6]]
[1] "1" "1"
[[74]][[7]]
[1] "Visa"
Here's a dput() version of the data directly in the post, as suggested:
list(list(1, "Order #: 065-PO-4166764", "Friday, December 22 2017",
c("Belkin 12 Outlet Home Theater Surge Protector 3996 Joules with Phone/Fax/Coax Protection & 8 ft. Cord - Black",
"Match Competitor"), c("$25.90", "$0.00"), c("5", "1"), "Visa"),
list(2, "Order #: 065-PO-4067551", "Saturday, September 2 2017",
c("MSI Gaming X Radeon RX-580 Dual-Fan 8GB GDDR5 PCIe Video Card",
"QVS HDMI Female to DVI-D Male Video Adapter - Black",
"MSI Radeon RX 580 GAMING X 4GB GDDR5 Video Card"), c("$329.99",
"$9.99", "$269.99"), c("1", "1", "1"), "Master Card"),
list(3, "Order #: 041-PO-8823995", "Sunday, August 27 2017",
"MSI Armor Radeon RX-470 Overclocked Dual-Fan 8GB GDDR5 PCIe Video Card",
"$279.99", "1", "Master Card"))
Update
This post has been marked as a duplicate of other posts by moderators (for example, this one and this one), but it is not. Other posts describe how to collapse a list of data frames into a single data frame, but that is not what I have. I have a list of lists that needs to be collapsed. Surely there are posts that describe collapsing lists of lists, but none that I have found describe a situation where there is a one-to-many relationship among the items in the sublists and recycling is desired. In the first example post, each sublist contains only a single element, whereas the list I'm working with has several elements of unequal lengths, and my shorter vectors need to be recycled within each sublist. The second example post doesn't address my situation either, as the OP in that case had a list of data frames. If I had a list of data frames, my problem would be simply addressed with an inefficient do.call statement. To reiterate, I have a list of lists whose elements are of uneven length, and each sublist (not a data frame) must be collapsed, with the shorter elements recycled across the longer elements, to form a single data frame in an efficient manner. I hope this clarifies.
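To make the recycling concrete, here is a rough sketch of what a column-wise version might look like: count the purchases per visit, repeat each visit-level field that many times with rep(), and flatten each purchase-level field with unlist(), so only one data frame is ever built. The helper names and column names are made up for illustration, the data are two records from the dput() output above, and whether this is actually faster on the full list is untested.

```r
# Two records copied from the dput() output in this post
ProductList <- list(
  list(1, "Order #: 065-PO-4166764", "Friday, December 22 2017",
       c("Belkin 12 Outlet Home Theater Surge Protector 3996 Joules with Phone/Fax/Coax Protection & 8 ft. Cord - Black",
         "Match Competitor"), c("$25.90", "$0.00"), c("5", "1"), "Visa"),
  list(3, "Order #: 041-PO-8823995", "Sunday, August 27 2017",
       "MSI Armor Radeon RX-470 Overclocked Dual-Fan 8GB GDDR5 PCIe Video Card",
       "$279.99", "1", "Master Card")
)

# Number of purchases in each visit (length of element 4, the item names)
n_items <- lengths(lapply(ProductList, `[[`, 4))

# Hypothetical helpers:
# visit_field() repeats a length-1 visit field once per purchase,
# purchase_field() flattens a purchase field across all visits.
visit_field    <- function(i) rep(sapply(ProductList, `[[`, i), times = n_items)
purchase_field <- function(i) unlist(lapply(ProductList, `[[`, i))

# Build the single data frame directly, one column at a time
result <- data.frame(
  id      = visit_field(1),
  order   = visit_field(2),
  date    = visit_field(3),
  item    = purchase_field(4),
  price   = purchase_field(5),
  qty     = purchase_field(6),
  payment = visit_field(7),
  stringsAsFactors = FALSE
)
```

With these two records, result has three rows: two for the first order and one for the second, with the visit fields repeated on each purchase row.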