I have a data frame and I want to convert one of its columns to the letters A, B, C, D and compute a summarized time:

ticket <- c('1-5444', '1-5444', '1-5444', '1-5444', '1-5444', '1-5444', '1-5445')
person <- c('John','John','Kai', 'John', 'Kai', 'Bob', 'John')
time <- c(NA, 1, 2, 1, 3, 4, NA)
df <- data.frame(ticket,person,time)

I want to create an abstract variable called z that assigns a letter to each value in the person column. For example, John-John-Kai-John-Kai-Bob contains three distinct persons, so it becomes A-A-B-A-B-C. So z takes the letter of the corresponding person, as shown below:

     ticket     person    time   z  ztime 
     1-5444      John     NA     A    2 
     1-5444      John     1      A    2
     1-5444      Kai      2      B    5
     1-5444      John     1      A    2
     1-5444      Kai      3      B    5
     1-5444      Bob      4      C    4
     1-5445      John     NA     A    0

Then I would like to calculate ztime, the total amount of time each person has taken. Any thoughts?

David C.
user3570187

3 Answers


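Creating the data frame with stringsAsFactors = TRUE (the default before R 4.0.0; since then it must be set explicitly) already makes person a factor with 3 levels. Note that the levels are sorted alphabetically (Bob, John, Kai), so the letters follow that order. All you need is to create the new variables: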

df <- transform(df,
  z = LETTERS[person],
  ztime = by(time, person, sum, na.rm = TRUE)[person]
)

or (as requested in comments) if grouping by person and ticket:

df <- transform(df,
  z = LETTERS[person],
  ztime = ave(time, ticket, person, FUN = function(x) sum(x, na.rm = TRUE))
)
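If the letters should restart for every ticket and follow the order in which persons first appear (as asked in the comments), one possible base R sketch is shown here. It is not part of the original answer; it assumes person may be a plain character column, so it does not depend on factor levels, and it assumes fewer than 26 distinct persons per ticket.

```r
# Sample data from the question; person is kept as character on purpose
ticket <- c("1-5444", "1-5444", "1-5444", "1-5444", "1-5444", "1-5444", "1-5445")
person <- c("John", "John", "Kai", "John", "Kai", "Bob", "John")
time   <- c(NA, 1, 2, 1, 3, 4, NA)
df <- data.frame(ticket, person, time, stringsAsFactors = FALSE)

# Within each ticket, number the persons by order of first appearance
idx <- ave(seq_along(df$person), df$ticket,
           FUN = function(i) match(df$person[i], unique(df$person[i])))

# Turn the per-ticket index into a letter (valid only for up to 26 persons per ticket)
df$z <- LETTERS[idx]

# Total time per ticket and person, treating NA as missing
df$ztime <- ave(df$time, df$ticket, df$person,
                FUN = function(x) sum(x, na.rm = TRUE))

print(df)
```

Because the index is computed per ticket, the same person can get a different letter in another ticket, which is the intended "reset" behaviour.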
bergant
  • I am getting NA values in the z column; please advise if I am missing anything – user3570187 Feb 06 '17 at 23:42
  • I guess your `person` field is not a factor. The data frame is created differently - It is also possible that your session is set with stringsAsFactors = FALSE option. Check with `getOption("stringsAsFactors")`. You can also change the variable into a factor by `df$person <- factor(df$person)` – bergant Feb 06 '17 at 23:56
  • I have changed that into a factor. I also have a ticket number, and the persons vary across tickets, so we might have to account for that when grouping these values. What do you think? The number of persons is unbounded, but within a ticket there are fewer than 26 characters – user3570187 Feb 07 '17 at 00:01
  • I'm sorry, but I don't know what exactly is the problem here. Number of characters is not an issue when using factors. – bergant Feb 07 '17 at 00:15
  • I see, thanks. I want the letters to be reset after every ticket – user3570187 Feb 07 '17 at 00:46
  • Grouping can be done also by two factors, so you don't need to create special letters. See one way to do it in answer update. – bergant Feb 07 '17 at 01:31
  • Thanks a lot ! I have 1497 names/levels for person string which is the reason for this issue, I probably need to group by for z string as well! Let me check the code if it works! – user3570187 Feb 07 '17 at 01:40

Can be done in two steps.

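library(plyr)                    # provides ddply() and join()
df$person <- factor(df$person)   # levels sort alphabetically: Bob, John, Kai
values <- c("C", "A", "B")       # letters in level order: Bob = C, John = A, Kai = B
df$z <- values[df$person]
aggr <- ddply(df, .(ticket, person), summarize, ztime = sum(time, na.rm = TRUE))
df <- join(df, aggr, by = c("ticket", "person"), type = "left")
df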

  ticket person time z ztime
1 1-5444   John   NA A     2
2 1-5444   John    1 A     2
3 1-5444    Kai    2 B     5
4 1-5444   John    1 A     2
5 1-5444    Kai    3 B     5
6 1-5444    Bob    4 C     4
7 1-5445   John   NA A     0
Alexey Ferapontov

To make it generic and automatically adaptable to the number of different persons:

  1. Get the number of persons using unique()
  2. Generate a list of letters (see this post)
  3. Create an association list using list()
  4. Join as in the previous answer
  5. Use dplyr to aggregate
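The steps above could be sketched roughly as follows. This is only an illustration under some assumptions: it uses a named character vector as the association list, and base R's aggregate() and match() stand in for the dplyr aggregation and the join, so no extra packages are needed. The names persons, lookup, totals and key are made up for this example.

```r
# Sample data from the question
ticket <- c("1-5444", "1-5444", "1-5444", "1-5444", "1-5444", "1-5444", "1-5445")
person <- c("John", "John", "Kai", "John", "Kai", "Bob", "John")
time   <- c(NA, 1, 2, 1, 3, 4, NA)
df <- data.frame(ticket, person, time, stringsAsFactors = FALSE)

# 1. Distinct persons, in order of first appearance
persons <- unique(df$person)
stopifnot(length(persons) <= length(LETTERS))  # LETTERS only has 26 entries

# 2-3. Association between person and letter, as a named vector
lookup <- setNames(LETTERS[seq_along(persons)], persons)

# 4. Look up the letter for every row (plays the role of the join)
df$z <- unname(lookup[df$person])

# 5. Aggregate time per ticket and person (base R stand-in for dplyr)
totals <- aggregate(df["time"], by = df[c("ticket", "person")],
                    FUN = sum, na.rm = TRUE)
names(totals)[names(totals) == "time"] <- "ztime"

# Attach the totals back to the rows without changing row order
key <- paste(df$ticket, df$person)
df$ztime <- totals$ztime[match(key, paste(totals$ticket, totals$person))]

print(df)
```

Here the letters are assigned across the whole data frame, so this only works while there are at most 26 distinct persons overall.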
Community
Malo Marrec
  • The question you link to won't help generate unique identifiers if there are more than 26. [Something like this would be more appropriate.](http://stackoverflow.com/q/25876755/903061) – Gregor Thomas Feb 07 '17 at 00:59
  • Exactly, that's the problem, but I have ticket numbers for which the number of persons is less than 26. Is there a workaround? – user3570187 Feb 07 '17 at 05:19