3

We have the following data frame a with something like this:


 > a
    google_prod      Value
1     categoria          ML
2        google         120
3       youtube          24
4     categoria          AO
5        google           2
6       youtube           0
7     categoria          ML
8        google          27
9       youtube           0
10    categoria          AO
11       google           5
12      youtube           0

We would like to get this:

    categoria google_prod    Value
1          ML   google        120
2          ML   youtube        24
3          AO   google          2
4          AO   youtube         0      
5          ML   google         27
6          ML   youtube         0    
7          AO   google          5
8          AO    youtube        0    

In other words, perform a type of application of the Spread or similar function, where only one value is taken from the google_prod column to apply it, in this case it would be the 'categoria' value.

jazzurro
  • 23,179
  • 35
  • 66
  • 76
  • Please add a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). That way you can help others to help you! – dario Feb 09 '20 at 15:19

3 Answers3

2
library(tidyverse)

# getting the data
category <- rep(c("categoria",  "google",   "youtube"), 4)
value <- c("ML", "120", "24",   "AO",   "2",    "0",    "ML",   "27",   "0",    "AO",   "5",    "0")

df <- tibble(category, value)

df %>%
  mutate(helper = rep(1:(nrow(df)/3), each = 3)) %>%
  pivot_wider(names_from = category, values_from = value) %>%
  select(-helper) %>%
  pivot_longer(names_to = "google_prod",  values_to = "values", -1)

# # A tibble: 8 x 3
# categoria google_prod values
# <chr>     <chr>       <chr> 
# 1 ML        google      120   
# 2 ML        youtube     24    
# 3 AO        google      2     
# 4 AO        youtube     0     
# 5 ML        google      27    
# 6 ML        youtube     0     
# 7 AO        google      5 
# 8 AO        youtube     0 
Jakub.Novotny
  • 2,912
  • 2
  • 6
  • 21
1

One idea would be the following. As far as I see the pattern, you are targeting values that contain two capital letters in Value. I searched where they are using grep() and obtained indice. Using this information, I created a group variable using findIntervals(). For each group, I aggregated data; I extracted and put the capital-letter value in categoria. In similar ways, I created two more columns. They are lists. So I used unnest() in the end to get the output.

library(tidyverse)

ind <- grep(x = mydf$Value, pattern = "[A-Z]+")

group_by(mydf, group = findInterval(x = 1:n(), vec = ind)) %>%
summarize(categoria = Value[google_prod == "categoria"],
          Google_prod = list(google_prod[google_prod != "categoria"]),
          Value = list(Value[google_prod != "categoria"])) %>% 
unnest(cols = Google_prod:Value)

  group categoria Google_prod Value
  <int> <chr>     <chr>       <chr>
1     1 ML        google      120  
2     1 ML        youtube     24   
3     2 AO        google      2    
4     2 AO        youtube     0    
5     3 ML        google      27   
6     3 ML        youtube     0    
7     4 AO        google      5    
8     4 AO        youtube     0    

DATA

mydf <- structure(list(google_prod = c("categoria", "google", "youtube", 
"categoria", "google", "youtube", "categoria", "google", "youtube", 
"categoria", "google", "youtube"), Value = c("ML", "120", "24", 
"AO", "2", "0", "ML", "27", "0", "AO", "5", "0")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"))
jazzurro
  • 23,179
  • 35
  • 66
  • 76
1

Here is another idea with creating a group with cumsum and extracting the first element

library(dplyr)
mydf %>% 
   group_by(grp = cumsum(google_prod == 'categoria')) %>% 
   mutate(categoria = first(Value)) %>% 
   slice(-1) %>% 
   ungroup %>%
   select(-grp) %>%
   type.convert(as.is = TRUE)
# A tibble: 8 x 3
#  google_prod Value categoria
#  <chr>       <int> <chr>    
#1 google        120 ML       
#2 youtube        24 ML       
#3 google          2 AO       
#4 youtube         0 AO       
#5 google         27 ML       
#6 youtube         0 ML       
#7 google          5 AO       
#8 youtube         0 AO       

data

mydf <- structure(list(google_prod = c("categoria", "google", "youtube", 
"categoria", "google", "youtube", "categoria", "google", "youtube", 
"categoria", "google", "youtube"), Value = c("ML", "120", "24", 
"AO", "2", "0", "ML", "27", "0", "AO", "5", "0")),
 class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"))
akrun
  • 874,273
  • 37
  • 540
  • 662