0

I have a data set titled 'google_removal' with 5 columns and 14,468 rows. I am trying to expand the number of rows based on the values which specifies the frequency for observations in each row. However, I keep getting the 'undefined columns selected' error message. As far as I can see it is not a problem that stems from forgetting to add a comma and I have checked that the column in question has the particular heading I am using.

Frequency is specified in the the column 'total'. It is numerical. The other variables are character. Very much a beginner here so apologies if question is poorly structured.

Data looks like this. Happy to provide more if needed.

'period ending'  'country' 'code' 'product'  'reason' 'total' 'year'
2011-06-30       Argentina   AR    Blogger   Defamation 4      2011
2011-06-30       Argentina   AR    Blogger   Privacy    1      2011 
2011-06-30       Argentina   AR    GoogleAd  Defamation 1      2011  
2011-06-30       Argentina   AR    WebSearch Defamation 6      2011 

I was trying to achieve the following:

'period_ending'  'country' 'code' 'product'  'reason'        'year'
2011-06-30       Argentina   AR    Blogger   Defamation       2011
2011-06-30       Argentina   AR    Blogger   Defamation       2011
2011-06-30       Argentina   AR    Blogger   Defamation       2011
2011-06-30       Argentina   AR    Blogger   Defamation       2011
2011-06-30       Argentina   AR    Blogger   Privacy          2011 
2011-06-30       Argentina   AR    GoogleAd  Defamation       2011  
2011-06-30       Argentina   AR    WebSearch Defamation       2011 
2011-06-30       Argentina   AR    WebSearch Defamation       2011 
2011-06-30       Argentina   AR    WebSearch Defamation       2011 
2011-06-30       Argentina   AR    WebSearch Defamation       2011 
2011-06-30       Argentina   AR    WebSearch Defamation       2011 
2011-06-30       Argentina   AR    WebSearch Defamation       2011 

I have tried the following ('total' is the heading of the column I want to use for expanding the number of rows) adapting it from an answer to a question on expanding rows.

google_removal_expanded <- google_removal[seq_len(nrow(google_removal), google_removal$total), 1:13000]

google_removal_expanded <- google_removal[rep(row.names(google_removal), google_removal$total), 1:13000]

I was expecting either of these to return the expanded data frame but I instead received variations of

"Error in [.data.frame(google_removal, rep(row.names("total"), google_removal$total), : undefined columns selected" for both.

Grateful for any assistance. My attempt to solve the problem adapted the answers in Repeat each row of data.frame the number of times specified in a column

Here is some data to try it with:

dput(head(google_removal))
structure(list(period_ending = c("2011-06-30", "2011-06-30", 
"2011-06-30", "2011-06-30", "2011-06-30", "2011-06-30"), country = c("Argentina", 
"Argentina", "Argentina", "Argentina", "Argentina", "Argentina"
), code = c("AR", "AR", "AR", "AR", "AR", "AR"), product = c("Blogger", 
"Blogger", "Google Ads", "Web Search", "Web Search", "Web Search"
), reason = c("Defamation", "Privacy and Security", "Defamation", 
"Defamation", "Other", "Privacy and Security"), total = c(4L, 
1L, 1L, 6L, 1L, 8L), year = c("2011", "2011", "2011", "2011", 
"2011", "2011")), row.names = c(NA, 6L), class = "data.frame")
L_Norman
  • 3
  • 5
  • Could you show `dput(head(google_removal))` ? Also pay attention sometimes you have `google_removal` and `google-removal` – Basti Jun 29 '23 at 10:24
  • @Basti thanks for picking that up (that was just me being clumsy entering the code here) adding dput(head(google_removal)) – L_Norman Jun 29 '23 at 10:32
  • Hi! Welcome to Stack Overflow :) **Edit** I saw you already mention the other post. I think the mistake is in the `1:13000`, since the second part of the brackets is columns, not rows, so it would probably be something like `1:5` – cnluzon Jun 29 '23 at 10:51

2 Answers2

1

There is a tidyr verb for expanding frequency data such as yours:

library(tidyr)
google_removal %>% 
  uncount(total)
DavoOZ
  • 41
  • 3
  • many thanks for this! It now worked like a charm. A previous error message I received when using this was due to faults in the data file which produced observations with NA that were not allowed. Once corrected it expanded the data set as per expected. – L_Norman Jun 29 '23 at 14:31
0

The issue you are having is due to the 1:13000 part of your expression, because your data frame has 7 columns (6 removing total), so if your data frame is:

google_removal <- data.frame(
  period_ending = c("2011-06-30", "2011-06-30","2011-06-30", "2011-06-30", "2011-06-30", "2011-06-30"),
  country = c("Argentina", "Argentina", "Argentina", "Argentina", "Argentina", "Argentina"), 
  code = c("AR", "AR", "AR", "AR", "AR", "AR"),
  product = c("Blogger","Blogger", "Google Ads", "Web Search", "Web Search", "Web Search"), 
  reason = c("Defamation", "Privacy and Security", "Defamation", "Defamation", "Other", "Privacy and Security"), 
  total = c(4L, 1L, 1L, 6L, 1L, 8L),
  year = c("2011", "2011", "2011", "2011", "2011", "2011"))

Then this:

google_removal_expanded <- google_removal[rep(row.names(google_removal), google_removal$total), 1:7]

Will get you the correct result with as many rows as total field says.

cnluzon
  • 1,054
  • 1
  • 11
  • 22
  • Thanks! :) That worked perfectly for this excerpt of the data! However I am not sure how to expand it to apply to all 14 000 rows in the data set. Is there an easy way to do this? – L_Norman Jun 29 '23 at 13:21
  • As far as I know, it should work just the same! You are selecting columns on tthe `1:7` part of the expression. Did you get an error with the whole dataset? – cnluzon Jul 02 '23 at 08:03
  • Yes, it returned the 'undefined columns selected' error for reasons that remain unclear to me. – L_Norman Jul 04 '23 at 12:25