How to transform the Titanic data set

Question

I want to transform the Titanic data set into a data set called Tita in which each row is one passenger, generated from the frequency of each row of Titanic. For example, if a row of Titanic has Age = Child, Sex = Male and Freq = 11, then Tita should contain 11 rows where Age is Child and Sex is Male. Tita should then include only four attributes (the Freq attribute is dropped). I should use a loop from 1 to 4 and the cbind function, which concatenates attributes to form a data set. At each iteration, I should build one attribute from Titanic by repeating each of its values Freq times using the rep function.

I don't understand what you're asking. Take a look again at [ask]; code and desired output would be very helpful. That includes your data as well—where are you getting the Titanic data from? I've seen it come in different formats from different sources — camille, Dec 22 '19 at 19:34
@G.Grothendieck I know, but I've seen it come from other places as well (e.g. CSV downloads from Kaggle tutorials). Since there's no data or code in the question, we don't know if they're using the 4-table array version that comes with R or something simpler — camille, Dec 22 '19 at 19:49

akrun · Answer 1 · 2019-12-22T19:36:57.427

One option is to melt the 4D array into a 2D data.frame and then use uncount to replicate the rows based on the 'value' column:

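library(dplyr)
library(tidyr)
data(Titanic)
# reshape2 must be installed; melt() gives one row per cell with a 'value' count
Tita <- reshape2::melt(Titanic) %>%
    uncount(value) %>%   # repeat each row 'value' times and drop the count column
    as_tibble()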
Tita
# A tibble: 2,201 x 4
#   Class Sex   Age   Survived
#   <fct> <fct> <fct> <fct>   
# 1 3rd   Male  Child No      
# 2 3rd   Male  Child No      
# 3 3rd   Male  Child No      
# 4 3rd   Male  Child No      
# 5 3rd   Male  Child No      
# 6 3rd   Male  Child No      
# 7 3rd   Male  Child No      
# 8 3rd   Male  Child No      
# 9 3rd   Male  Child No      
#10 3rd   Male  Child No      
# … with 2,191 more rows

Or using base R (no packages needed):

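d1 <- as.data.frame(Titanic)   # one row per cell, with a Freq column
# repeat each row index Freq times and keep only the four attribute columns
Tita <- d1[rep(seq_len(nrow(d1)), d1$Freq), 1:4]
row.names(Tita) <- NULL        # reset row names to 1..n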
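
The question specifically asks for a loop from 1 to 4 with cbind and rep. A minimal sketch of that column-by-column construction, assuming the built-in Titanic table (the names tdf, attr_i and Tita_loop are illustrative only), might look like this:

```r
# Sketch only: build Tita one attribute at a time, as the question describes.
tdf <- as.data.frame(Titanic)   # columns: Class, Sex, Age, Survived, Freq

Tita_loop <- NULL
for (i in 1:4) {
  # Repeat every value of attribute i as many times as its Freq
  attr_i <- data.frame(rep(tdf[[i]], times = tdf$Freq))
  names(attr_i) <- names(tdf)[i]
  # cbind glues the new attribute onto the columns built so far
  Tita_loop <- if (is.null(Tita_loop)) attr_i else cbind(Tita_loop, attr_i)
}

dim(Tita_loop)
```

Wrapping each repeated attribute in data.frame before calling cbind keeps the columns as factors; calling cbind on bare factors would return a matrix of integer codes instead.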

Thank you, akrun, this is what was intended — houssemeddin labidi, Dec 22 '19 at 19:36

G. Grothendieck · Accepted Answer · 2019-12-22T21:00:59.760

1) as.data.frame/rep. Convert the Titanic array to a data frame tdf, then use rep to repeat each row number as many times as its frequency, and subscript tdf by the result while dropping the fifth column (Freq). No packages are used.

tdf <- as.data.frame(Titanic)
Tita <- tdf[rep(seq_len(nrow(tdf)), tdf$Freq), -5]

We can check it by converting Tita back to an array whose elements should equal those of Titanic:

all.equal(Titanic, table(Tita))
## [1] TRUE
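
A spot check on a single cell can complement the full comparison. The sketch below rebuilds Tita so that it runs on its own, then counts the rows for one combination of attributes and compares that with the matching cell of Titanic. The particular combination is chosen only for illustration.

```r
tdf  <- as.data.frame(Titanic)
Tita <- tdf[rep(seq_len(nrow(tdf)), tdf$Freq), -5]

# Number of rows in Tita for one attribute combination
n_rows <- sum(Tita$Class == "2nd" & Tita$Sex == "Male" &
              Tita$Age == "Child" & Tita$Survived == "Yes")

# Should equal the count stored in the original 4D table
n_rows == Titanic["2nd", "Male", "Child", "Yes"]
```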

2) tableinv. The check shows that what the question asks for is essentially the inverse of the table function. Searching for that turns up tableinv in the question "Is there a general inverse of the table() function?"

Copying and pasting that function into R allows us to write:

Tita2 <- tableinv(Titanic)

Except for attributes, this gives the same value as Tita in (1):

all.equal(Tita, Tita2, check.attributes = FALSE)
## [1] TRUE
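
The linked tableinv function is not reproduced here. As a rough illustration of what an inverse of table can look like, the following stand-in wraps the approach from (1) in a function. The name tableinv_sketch is hypothetical, and this is not the code from the linked answer.

```r
# Hypothetical stand-in, NOT the tableinv function from the linked answer.
# Assumes x is a contingency table (class "table"), such as Titanic.
tableinv_sketch <- function(x) {
  df  <- as.data.frame(x)                     # one row per cell plus Freq
  idx <- rep(seq_len(nrow(df)), df$Freq)      # row i repeated Freq[i] times
  out <- df[idx, names(df) != "Freq", drop = FALSE]
  rownames(out) <- NULL
  out
}

Tita3 <- tableinv_sketch(Titanic)
# Tabulating the result should reproduce the original cell counts
all(as.vector(table(Tita3)) == as.vector(Titanic))
```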
