0

I'm working on a logistic data set, trying to visualise the steps taken by the subjects in the data set.

My input data has the following form: ID_object;proces

The undertaken_proces has the form: A-B|B-C|C-D...

I'm visualising in visNetwork.

For the nodes I have a predefined dataframe where all possible nodes (A-Z) are declared and given some initial properties (e.g. all nodes startout gray and have a certain label).

For the links between the nodes, I've written a loop that:

  • Looks at all rows in the dataset
  • Per row takes the undertaken_proces columns
  • Splits the undertaken_proces by "|" to get the individual movements ( A-B, B-C, ...)
  • Splits the result above by "-" to get a list of vectors where the first element is the "from-object" and the second element the "to-object"
  • Loop over the list of vectors and fill one vector containing all "from-objects" and one vector containg all "to-objects".
  • construct a dataframe with columns from, to based on the vector above and som additional columns for additional properties (e.d. color of the link direction of arrow)
  • Per row in the parent loop I check what the last object is where the ID was and change the color for that node to another color
  • After the loop grouping the links dataframe by from, to, color and arrow to obtain the total movements per grouping

This way works fine for smaller datasets, but is teribly slow/unsuble for larger ones. I'm guessing there is a way with mapping and/or apply functions, but I havent figured it out yet. Can someone point me in the right direction?

library(visNetwork)

visNetwordGridLayout <- function(x)
{
 x[is.na(x)] <- 0

 x <- apply(t(x), 1, rev)

 LmatX <- seq(-1,1,length=ncol(x))
 LmatY <- seq(1,-1,length=nrow(x))

   loc <- t(sapply(1:max(x),function(y)which(x==y,arr.ind=T)))
   layout <- cbind(LmatX[loc[,2]],LmatY[loc[,1]])
   return(layout)
   }

  ID_movements <- data.frame(
   ID = c(1, 2),
   PROCES = c("A-B|B-C|C-E", "A-B|B-C|C-D"),
    stringsAsFactors = FALSE
   )

  # nodes
   nodes <- data.frame(id = c("A", "B", "C", "D", "E"))
    nodes <- nodes %>% mutate(label = id)
    nodes$color.background <- rep('grey', nrow(nodes))

  # Links
 links <- data.frame()
  for (row in 1:nrow(ID_movements)) {
  ID_movement <- ID_movements[row, ]$PROCES

  procesSteps <- strsplit(ID_movement , "\\|")[[1]]
  procesSteps <- strsplit(procesSteps, '-')

  fromVec <- c()
  toVec <- c()
  for (step in procesSteps){
    fromVec <- c(fromVec , step[1])
    toVec <- c(toVec , step[2])
  }
   links <- rbind(links,
             data.frame(from = fromVec,
                        to = toVec,
                        color = 'blue',
                        arrows = 'to')
  )

 lastNode <- last(toVec)
 nodes$color.background[nodes$id == 
                       lastNode] <- 'green'
 }
links <- links %>%
  group_by(from, to, color, arrows) %>%
 summarise(label = n()) %>%
  ungroup()

# Grid waar de nodes komen
 grid <- matrix(
 match(
c(
  "A",0,0,0,0,
  0,"B",0,0,0,
  0,0,"C",0,"D",
  0,0,0,0,"E"
),
  nodes$id),
 nrow=4,byrow=TRUE)


visNetwork(nodes, links) %>%
 visIgraphLayout(layout = "layout.norm",
              layoutMatrix = visNetwordGridLayout(grid))
DCB
  • 107
  • 12
  • 1
    This will be much easier if you provide a [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) example of your input data, and an example of your desired output – Conor Neilson Nov 03 '18 at 11:33
  • Good point Conor, added an example – DCB Nov 03 '18 at 11:53
  • WHy are you doing ` procesSteps <- strsplit(ID_movement , "\\|")[[1]] procesSteps <- strsplit(procesSteps, '-') ` inside the loop? It looks like those are constants. Just do them once, that's the first thing to fix. – Elin Nov 03 '18 at 12:07
  • Also I think you want `procesSteps <- strsplit(ID_movements$PROCES , "|", fixed = TRUE)` – Elin Nov 03 '18 at 12:13
  • procesSteps <- strsplit(ID_movement , "\\|")[[1]] and procesSteps <- strsplit(procesSteps, '-') are not constant. ID_movement is defined in the loop by ID_movement <- ID_movements[row, ]$PROCES. Its one of the processes from the larger data frame – DCB Nov 03 '18 at 12:19
  • If it is from a data frame it is constant, not being calculated in the loop. – Elin Nov 03 '18 at 12:35
  • I don't see where you'rr going at. In procesSteps <- strsplit(ID_movement , "\\|")[[1]] and procesSteps <- strsplit(procesSteps, '-') I obtain the proces from the main data frame ID_movement <- ID_movements[row, ]$PROCES in my desired format splitted by | first and then by -. But if there is a way to do this outside the loop, i'd like to learn that – DCB Nov 03 '18 at 12:42

1 Answers1

2

So this is just a start I think, but in terms of speeding things up you want to stop recalculating the same thing over and over. Even if you do have to make another loop of some kind, make sure you are only doing calculations when they have new data.

So for example

ID_movements <- data.frame(
  ID = c(1, 2),
  PROCES = c("A-B|B-C|C-E", "A-B|B-C|C-D"),
  stringsAsFactors = FALSE
)

# nodes
nodes <- data.frame(id = c("A", "B", "C", "D", "E"))
nodes <- nodes %>% mutate(label = id)
nodes$color.background <- rep('grey', nrow(nodes))


procesSteps <- strsplit(ID_movements$PROCES , "|", fixed = TRUE)
procesSteps <- lapply(procesSteps, strsplit, split = '-')
names(procesSteps) <- ID_movements$ID
procesSteps <- as.data.frame(t(as.data.frame(procesSteps)))
names(procesSteps) <- c("from", "to")
procesSteps$color <- "blue"
procesSteps$arrows <- "to"
procesSteps$id <- sub("\\..*$", "",  row.names(procesSteps))

gives you a data frame with from and to, color, arrows and the id (prefixed by X -- sub again to get rid of the x if desired).

Elin
  • 6,507
  • 3
  • 25
  • 47