I have a data frame that records changes in company names. A simple example:

df <- data.frame(key  = c("A", "B","C", "E","F","G"), Change = c("B", "C","D" ,"F","G","H"))
print(df)

  key Change
1   A      B
2   B      C
3   C      D
4   E      F
5   F      G
6   G      H

I want to track every change a name goes through. The output I am after looks like this:

Key 1st 2nd  3rd  4th
1   A    B    C    D
2   E    F    G    H

How can I do this in R? I am new to R and to programming, so any help would be appreciated.

The question was marked duplicate of How to reshape data from long to wide format?

However, it is not an exact duplicate, for two reasons: 1. In my example the values move across columns: a name in Change reappears later in key, so the two columns depend on each other. That is not the case in the reshaping question. 2. Before reshaping, I think another step is needed, perhaps assigning an id to each chain of changes. I am not sure how to do that.

Could you help me?

Sharvari Gc
  • Hi Sharvari, it's helpful if you leave a reproducible example with runnable code (so we don't have to re-create your example by creating our own data frame code, say). Read more here: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. Your question is not *quite* a duplicate of the question akrun suggested, but it's close, and I think you will find reshape() to be very helpful indeed, as you essentially have time point data. – Joy Jun 02 '17 at 15:20
  • Shall edit the question now asap. Thank you. – Sharvari Gc Jun 02 '17 at 15:21

1 Answer

Can we assume that a name never reappears later in a chain (that is, nothing like A->B->C together with D->E->A)? If so, you can do the following.

df <- data.frame(key    = c("A","B","C", "E","F","G"),
                 Change = c("B","C","D" ,"F","G","H"))
print(df)

# mapping from old name to new name
next_name <- as.character(df$Change)
names(next_name) <- as.character(df$key)

all_names <- unique(c(as.character(df$key), as.character(df$Change)))
get_id <- function(x) {
  # replace each name by its successor, recursing until every name
  # has reached the final name of its chain
  ss <- x %in% names(next_name)
  if (any(ss)) {
    x[ss] <- get_id(next_name[x[ss]])
  }
  x
}
# the final name of a chain serves as the id of the firm
ids <- get_id(all_names)
lapply(unique(ids), function(i) all_names[ids == i])

# the outcome is a list of company names;
# each entry represents the history of one firm
##[[1]]
##[1] "A" "B" "C" "D"
##[[2]]
##[1] "E" "F" "G" "H"

The outcome is a list rather than a data frame because the name sequences may differ in length (firms may have different numbers of names).
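If a wide table like the one in the question is still wanted, one possible follow-up is to walk each chain in order and pad the shorter chains with NA. This is only a sketch, not part of the code above: it assumes every name occurs at most once in key and at most once in Change (so chains never merge or loop), and the helper follow() and the column names Key, Change1, ... are made up for illustration.

```r
df <- data.frame(key    = c("A", "B", "C", "E", "F", "G"),
                 Change = c("B", "C", "D", "F", "G", "H"),
                 stringsAsFactors = FALSE)

# Assumption guard: no name is renamed twice and no two names merge,
# which also rules out loops reachable from a starting name.
stopifnot(anyDuplicated(df$key) == 0, anyDuplicated(df$Change) == 0)

# lookup vector: old name -> new name
next_name <- setNames(df$Change, df$key)

# starting names are keys that never appear as a Change
starts <- setdiff(df$key, df$Change)

# follow() is a hypothetical helper: walk one chain from its
# starting name until a name with no successor is reached
follow <- function(name) {
  chain <- name
  while (name %in% names(next_name)) {
    name <- next_name[[name]]
    chain <- c(chain, name)
  }
  chain
}

chains <- lapply(starts, follow)

# indexing past the end of a vector yields NA, which pads short chains
width <- max(lengths(chains))
padded <- lapply(chains, function(ch) ch[seq_len(width)])

wide <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(wide) <- c("Key", paste0("Change", seq_len(width - 1)))
print(wide)
```

For the example data this gives one row per firm, A, B, C, D and E, F, G, H. Because each chain is built by following the renames from its starting name, the columns are in chronological order even when the rows of df are shuffled.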

Kota Mori