
Say I have a large dataset in which each row records a type of entry and the number of times that type of entry was observed.

For example:

   Area        Animal                              Observations       
   US           Cat                                   4
   NE           Cat                                   9
   US           Dog                                   2

How would I create a dataset (for analysis in R) that lists each observation on its own row, like this:

   Area        Animal      
    US            Cat
    US            Cat
    US            Cat...
    US
    NE
    NE
    NE
    NE....
    US..          Dog..

I'm asking because I have a large data set and I want one row per observation, rather than rows grouped with a count. Does anyone know how to do this?

pnuts
Timothy

3 Answers


Try expandRows() from the splitstackshape package:

library(splitstackshape)
expandRows(df1, 'Observations')
#   Area Animal
#1     US    Cat
#1.1   US    Cat
#1.2   US    Cat
#1.3   US    Cat
#2     NE    Cat
#2.1   NE    Cat
#2.2   NE    Cat
#2.3   NE    Cat
#2.4   NE    Cat
#2.5   NE    Cat
#2.6   NE    Cat
#2.7   NE    Cat
#2.8   NE    Cat
#3     US    Dog
#3.1   US    Dog
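
The call above assumes that df1 already holds the question's data. A minimal self-contained sketch follows; the construction of df1 and the name expanded are assumptions for illustration, and the base R fallback is only there so the snippet still runs when splitstackshape is not installed.

```r
# Rebuild the question's example data (assumed to match the asker's layout).
df1 <- data.frame(
  Area = c("US", "NE", "US"),
  Animal = c("Cat", "Cat", "Dog"),
  Observations = c(4, 9, 2)
)

if (requireNamespace("splitstackshape", quietly = TRUE)) {
  # Repeat each row 'Observations' times; the count column is dropped by default.
  expanded <- splitstackshape::expandRows(df1, "Observations")
} else {
  # Base R fallback: repeat each row index by its count and keep the other columns.
  expanded <- df1[rep(seq_len(nrow(df1)), df1$Observations), c("Area", "Animal")]
}

# Sanity check: one output row per original observation.
nrow(expanded) == sum(df1$Observations)
```

Comparing the row count against the sum of the counts is a cheap way to confirm that nothing was lost on a large dataset.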
akrun

Index the dataframe by 'rownames' repeated as many times as 'Observations':

> rep(rownames(dat), dat$Observations)
 [1] "1" "1" "1" "1" "2" "2" "2" "2" "2" "2" "2" "2" "2" "3" "3"

> dat[ rep(rownames(dat), dat$Observations) , ]
    Area Animal Observations
1     US    Cat            4
1.1   US    Cat            4
1.2   US    Cat            4
1.3   US    Cat            4
2     NE    Cat            9
2.1   NE    Cat            9
2.2   NE    Cat            9
2.3   NE    Cat            9
2.4   NE    Cat            9
2.5   NE    Cat            9
2.6   NE    Cat            9
2.7   NE    Cat            9
2.8   NE    Cat            9
3     US    Dog            2
3.1   US    Dog            2
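
The result above still carries the Observations column and the 1.1, 1.2, ... row names. If only Area and Animal are wanted, as in the question, a small variation might look like the sketch below. The names idx and long are hypothetical, and dat is assumed to be the question's data.

```r
# Assumed reconstruction of the question's data.
dat <- data.frame(
  Area = c("US", "NE", "US"),
  Animal = c("Cat", "Cat", "Dog"),
  Observations = c(4, 9, 2)
)

# Integer row indices, each repeated 'Observations' times.
idx <- rep(seq_len(nrow(dat)), dat$Observations)

# Select the repeated rows and keep only the descriptive columns.
long <- dat[idx, c("Area", "Animal")]

# Replace the 1, 1.1, 1.2, ... row names with a plain 1..n sequence.
rownames(long) <- NULL

long
```

Indexing with seq_len(nrow(dat)) rather than rownames(dat) gives the same rows here and does not depend on what the row names happen to be.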
IRTFM

Here's an approach using lapply() and rep():

df <- data.frame(Area=c('US','NE','US'), Animal=c('Cat','Cat','Dog'), Observations=c(4,9,2))
# repeat every column except the third, using the third column as the repeat counts
as.data.frame(lapply(df[-3], rep, df[,3]))
##    Area Animal
## 1    US    Cat
## 2    US    Cat
## 3    US    Cat
## 4    US    Cat
## 5    NE    Cat
## 6    NE    Cat
## 7    NE    Cat
## 8    NE    Cat
## 9    NE    Cat
## 10   NE    Cat
## 11   NE    Cat
## 12   NE    Cat
## 13   NE    Cat
## 14   US    Dog
## 15   US    Dog
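
One way to check an expansion like this on a large dataset is to aggregate the long form back into counts and compare with the original. The sketch below is only an illustration under the same example data; the names long, counts, and check, and the helper column n, are hypothetical.

```r
df <- data.frame(
  Area = c("US", "NE", "US"),
  Animal = c("Cat", "Cat", "Dog"),
  Observations = c(4, 9, 2)
)

# Expand as in the answer: repeat each non-count column by the counts.
long <- as.data.frame(lapply(df[-3], rep, df[, 3]))

# Count rows per Area/Animal pair by summing a column of ones.
counts <- aggregate(n ~ Area + Animal, data = cbind(long, n = 1), FUN = sum)

# Join the recomputed counts to the original and compare them.
check <- merge(df, counts)
all(check$Observations == check$n)
```

This assumes each Area/Animal pair appears only once in the original data; if a pair is split across several rows, the recomputed count would be their total.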
bgoldst