0

I need to create a new column for a existing data frame considering levels of factors. I have 2 data frames called dat_group and dat_prices. These data frames look like below.

dat_group

         Group
1      A
2      A
3      A
4      A
5      A
6      A
7      A
8      A
9      A
10     A
11     C
12     C
13     C
14     C
15     C
16     C
17     C
18     C
19     C
20     C
21     B
22     B
23     B
24     B
25     B
26     B
27     B
28     B
29     B
30     B

dat_price

   A  B  C
1  21 45 24
2  21 45 24
3  21 45 24
4  21 45 24
5  15 11 10
6  15 11 10
7  15 11 10
8  20 13 55
9  20 13 55
10 20 13 55

I need to paste the values of A,B and C columns considering the level in dat_group. The row sequence should be the same order. If I create new column to dat_group as "price"

dat_group$Price<-NA

Then the data frame should be like ;

   Group Price
1      A    21
2      A    21
3      A    21
4      A    21
5      A    15
6      A    15
7      A    15
8      A    20
9      A    20
10     A    20
11     C    24
12     C    24
13     C    24
14     C    24
15     C    10
16     C    10
17     C    10
18     C    55
19     C    55
20     C    55
21     B    45
22     B    45
23     B    45
24     B    45
25     B    11
26     B    11
27     B    11
28     B    13
29     B    13
30     B    13 

I tried to do this using some available examples e.g.1 e.g.2, but did not work.

Please could anybody help me. The two example data frames can be accessed in following codes. My actual data set has several 1000 rows.

      dat_group<- structure(list(Group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", "B", "C"), class = "factor")), .Names = "Group", class = "data.frame", row.names = c(NA, 
-30L))

    dat_price<-structure(list(A = c(21L, 21L, 21L, 21L, 15L, 15L, 15L, 20L, 
20L, 20L), B = c(45L, 45L, 45L, 45L, 11L, 11L, 11L, 13L, 13L, 
13L), C = c(24L, 24L, 24L, 24L, 10L, 10L, 10L, 55L, 55L, 55L)), .Names = c("A", 
"B", "C"), class = "data.frame", row.names = c(NA, -10L))
Community
  • 1
  • 1
sriya
  • 179
  • 1
  • 2
  • 7
  • 1
    What if there are only 2 `A`'s followed by 2 `B`s and so on. What should be output in that case? Or will the number of consequent factors be the same as row of `dat_price` ? – Ronak Shah Dec 06 '16 at 04:48

2 Answers2

0

A more defensive solution to your problem at hand. Hopefully this will work even if all of your factor's levels are not in identical multiples.

library(dplyr); library(purrr); library(magrittr)

dat_group$original_order <- seq(1:nrow(dat_group))

dat_group %<>%
    split(.$Group) %>%
    map(~ mutate(., Price =  rep(na.omit(dat_price[,unique(Group)]), n()/length(na.omit(dat_price[,unique(Group)]))))) %>%
    bind_rows() %>%
    arrange(original_order) %>% 
    select(-original_order)

dat_group

   Group Price
1      A    21
2      A    21
3      A    21
4      A    21
5      A    15
6      A    15
7      A    15
8      A    20
9      A    20
10     A    20
11     C    24
12     C    24
13     C    24
14     C    24
15     C    10
16     C    10
17     C    10
18     C    55
19     C    55
20     C    55
21     B    45
22     B    45
23     B    45
24     B    45
25     B    11
26     B    11
27     B    11
28     B    13
29     B    13
30     B    13

Original (lazy) solution:

dat_group$Price <- rep(unlist(dat_price), length.out = nrow(dat_group))
Nate
  • 10,361
  • 3
  • 33
  • 40
  • 2
    While this may solve the example, it isn't hard to see that you did not solve the underlying challenge. This solution luckily matches because A's are always followed by B's. – Pierre L Dec 06 '16 at 04:23
  • @Nathan Day: Thanks, I am wondering why the row sequence does not work OK with my data. – sriya Dec 06 '16 at 04:38
  • 1
    That does not identify the relevant level of the factor. – sriya Dec 06 '16 at 04:47
  • For some reason this answer also does not work with my actual data set. Actually it does not create column (Price) with values. – sriya Dec 06 '16 at 05:56
  • It creates the new column but again it does not identify the levels correctly. For instance level C, it gets related sequence of level B. – sriya Dec 06 '16 at 06:28
  • I mean the output that I want should be as in the example. But the codes give a different output. That does not identify the levels correctly. I do not get any error message. But by looking at the output I see it. – sriya Dec 06 '16 at 06:44
  • I tried that one too. But it gives the following error Error: This function should not be called directly – sriya Dec 06 '16 at 07:14
  • I changed the order of dat_group to C:A:B instead of the existing A:C:B. Still it does not create an output with the new variable (Price). – sriya Dec 06 '16 at 08:28
0
library(data.table)
dat_price <- as.data.table(dat_price)
dat_price_new <- cbind(dat_price[, c(1,3), with = FALSE], 
                   dat_price[, 2, with = FALSE])
melt(dat_price_new)
Gauss.Y
  • 100
  • 6
  • This code is not feasible with my actual data set since the levels in my actual data set occur more than 50 times. – sriya Dec 06 '16 at 06:51