I have a dataset with two columns, X and Y. Y is an integer between 0 and 5. I need to change the level of detail of the dataset.

I want to repeat each row the number of times Y indicates. As an example:

X | Y
______
a | 1
b | 0
c | 2

Becomes

X
___
a
c
c

a appears once, b disappears, and c now appears twice. I do not need to keep the Y column; it only determines how many times each X row appears.

My first thought was to do

df4 <- df %>% filter(Y == 4)
df4 <- rbind(df4, df4, df4, df4) %>% select(-Y)

but that seems ugly, and it does not generalize to, say, Y = 20.

Thank you!

Neoleogeo

3 Answers

We could use uncount from tidyr, which repeats each row according to the value in Y and drops Y afterwards:

library(dplyr)
library(tidyr)
df %>%
   uncount(Y) %>%
   as_tibble

-output

# A tibble: 3 x 1
#  X    
#  <chr>
#1 a    
#2 c    
#3 c    
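
If the count or a per-copy index is still useful later, uncount can keep them. The following is a minimal sketch, assuming the same example data as in this answer; the column name copy is only an illustrative choice, not something from the question.

```r
library(dplyr)
library(tidyr)

# Same example data as in this answer
df <- data.frame(X = c('a', 'b', 'c'), Y = c(1, 0, 2))

# .remove = FALSE keeps the Y column;
# .id adds a column numbering the copies of each original row
# ("copy" is a hypothetical name chosen for this sketch)
expanded <- df %>%
  uncount(Y, .remove = FALSE, .id = "copy")

expanded
```

Rows with Y equal to 0 are still dropped, so b does not appear in the result.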

or in base R with rep

df[rep(seq_len(nrow(df)), df$Y),'X', drop = FALSE]
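
To see why the base R one-liner works, it may help to break it into steps. This is just a sketch of the same idea with intermediate variables; the names idx, expanded_idx and result are made up for illustration.

```r
# Same example data as in this answer
df <- data.frame(X = c('a', 'b', 'c'), Y = c(1, 0, 2))

# Row positions 1..n
idx <- seq_len(nrow(df))

# Repeat each position Y times; a count of 0 drops that position
expanded_idx <- rep(idx, df$Y)

# Index the data frame with the repeated positions, keeping only X;
# drop = FALSE keeps the result a data frame rather than a vector
result <- df[expanded_idx, 'X', drop = FALSE]

# Optional: reset the row names, which otherwise reflect the original rows
rownames(result) <- NULL
result
```

Because rep accepts any vector of non-negative counts, the same line works unchanged for larger values of Y.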

data

df <- data.frame(X = c('a', 'b', 'c'), Y = c(1, 0, 2))
akrun

Maybe this:

df <- data.frame( 'x' = c('a', 'b', 'c'), 'y'= c(1, 0, 2))
rep(df$x, df$y)
or 
## For a dataframe:
df[match(rep(df$x, df$y), df$x),'x', drop=FALSE]

Output:

R>rep(df$x, df$y)
[1] "a" "c" "c"
PKumar

What about this?

data.frame(
  X = with(
    df,
    rep(X, Y)
  )
)

which gives

  X
1 a
2 c
3 c
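
Regarding the concern about Y = 20: the rep approach does not depend on the size of the counts. A quick sanity check, using made-up data with larger counts than in the question, might look like this:

```r
# Hypothetical data with larger counts (not from the question)
df <- data.frame(X = c('a', 'b', 'c'), Y = c(20, 0, 3))

out <- data.frame(
  X = with(df, rep(X, Y))
)

# The expanded data should have one row per unit of Y
nrow(out) == sum(df$Y)

# Number of rows per value of X
table(out$X)
```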
ThomasIsCoding