
I want to create a job similarity visualization that looks similar to the one found here: https://www.irecsolarcareermap.org/.

Initially, I tried using the force network, but I noticed that categorical X and Y axes are not available with this function.

This time, I'm attempting it with ggplot. The basic visualization looks okay, but I want to add links that point to similar jobs when I click on the circles representing the jobs.

The "data.csv" file contains columns such as Occ1, Occ2, Full Transferability (similarity level), and Occupation (for merging). It has over 10,000 rows, and I need to match each Occ1 to Occ2.

For example, the job Automotive Engineering Technician should have lines (links) to Electronic / Electrical Assembler and Electronics Assembler.

Additionally, the "experience.csv" file has columns for Occupation and Strata.Level, with 126 rows. The "jobType" file has columns for Occupation and Job_type, with 176 rows.

If a job does not appear in both the experience file and the job type file, I would like to remove it.

What I have tried is:

library(ggplot2)
library(plotly)

# Read the data
data <- read.csv("data.csv")
experience <- read.csv("experience.csv")
jobType <- read.csv("JobType.csv")

# Filter the data based on Full Transferability
filtered_data <- subset(data, Full.Transferability >= 0.9)

# Get all unique occupations from filtered data, jobType, and experience
all_occupations <- unique(c(filtered_data$Occ1, filtered_data$Occ2, jobType$Occupation, experience$Occupation))

# Create nodes dataframe with x and y coordinates
nodes <- data.frame(
  name = all_occupations,
  x = jobType$Job_type[match(all_occupations, jobType$Occupation)],
  y = experience$Strata.Level[match(all_occupations, experience$Occupation)]
)

# Remove rows with missing x or y values
nodes <- nodes[complete.cases(nodes$x, nodes$y), ]

# Create a scatterplot with jittering
gg <- ggplot(nodes, aes(x = x, y = y, text = name)) +
  geom_jitter(width = 0.2, height = 0.2, size = 1, color = "steelblue") +
  labs(x = "Job Type", y = "Experience Level") +
  theme_minimal() +
  theme(panel.grid = element_blank()) +
  geom_hline(aes(yintercept = y), color = "gray", linetype = "dashed") +
  geom_vline(aes(xintercept = x), color = "gray", linetype = "dashed") +
  coord_cartesian(clip = "off") +
  theme(plot.margin = margin(20, 20, 20, 20))

# Convert the ggplot to a plotly object
p <- ggplotly(gg)

# Register click event handler
event_register(p, "plotly_click")

# Define JavaScript function to handle the click event
js <- "
  function(eventData) {
    var selectedJob = eventData.points[0].text;
    alert('Selected job: ' + selectedJob);
  }
"

# Add the JavaScript function to the plot
p <- htmlwidgets::prependContent(p, htmltools::tags$script(js))

p

So far, I have gotten this initial result.


The part under "# Define JavaScript function to handle the click event" does not need to be there; the pop-up message was only my attempt to show job titles when I click on jobs.

Do you know how to make jobs clickable and link them to similar jobs?

desertnaut
H K
  • What do you mean by *links*? line segments? Literal words in your popup? I guess I just need a bit of clarity on what it is you are visualizing. By the way, it looks like you're new to SO; welcome to the community! If you want great answers quickly, it's best to make your question reproducible. This includes things like sample data like the output from `dput()` or `reprex::reprex()`. Check it out: [making R reproducible questions](https://stackoverflow.com/q/5963269). – Kat Jul 17 '23 at 16:11
  • @Kat Hi, I'm sorry for the not straightforward question. If you look at the website (google: solar career map if you don't trust the link) or If you look at my previous question I wrote in the beginning, there's a sample what I want to do. 'Links' I want to create is basically lines which connect to only similar jobs. Does it make sense? – H K Jul 18 '23 at 04:50
  • I spent a little time going through the website that you mentioned. I didn't readily find the data, either way, I'd rather spend my time answering your questions. It would be best if you included the output of `dput(head(nodes, 50))` in your question. Just make sure that this set of data works with your current code before you include it. (You may need more than 50 rows or different rows.) – Kat Jul 18 '23 at 21:34
  • @Kat Thank you so much. I just edited by including `dput` at the end of my question. Also, I'm asking a new question to solve it step by step. [link](https://stackoverflow.com/questions/76716149/how-to-add-multiple-lines-for-a-single-data-point-to-connect-jobs?noredirect=1#comment135252155_76716149) The big concept is the same, but I wanted to start by creating lines between jobs. I'm editing the new question to add sample dataset if you feel more comfortable. – H K Jul 18 '23 at 22:51

1 Answer


Another Update

This update adds functionality that meets your latest request.

  • When you hover over any point, you will see the tooltip only.
  • If you click on a point, you will see lines that connect to the associated jobs.
  • If you move the mouse (e.g., unhover, mousemove)
    • original tooltip (lines' origin point) will persist
    • if you hover over any point connected by a line, you'll see that tooltip, as well
    • if you hover over a different point connected by a line, the origin tooltip and the new tooltip will persist (other tooltips will go away); in other words, you will see at most two tooltips at one time
  • If you double click
    • all lines will change to visible: false
    • no persistent tooltips
  • If you click, without having 'double clicked' or clearing the board, it will behave as if you double-clicked, then clicked (removes any previous lines; creates a new origin for persistent tooltip)


Option 4

Max: 1 set of lines and 2 tooltips; click to activate; double-click to clear

p %>% htmlwidgets::onRender(
  "function(el, x) {
    nms = ['curveNumber', 'pointNumber'];
    coll = [];                                      /* for persistent tooltip */
    giveMe = [];                                 /* for connected data points */
    oArr = el.data[0];                 /* the x, y data for the scatter trace */
    redu = function(val, arr) {                 /* closest data point in array*/
      return arr.reduce((these, those) => {
        return Math.abs(those - val) < Math.abs(these - val) ? those : these;
      });
    }
    closest = function(xval, yval) { /* p.xvals/yvals from pt data; arr is x/y data obj */
      /* id nearest x and nearest y, make sure they match, if no match, take larger index */
      xpt = redu(xval, oArr.x);           /* get closest data point for x axis*/
      ypt = redu(yval, oArr.y);           /* get closest data point for y axis*/
      xi = oArr.x.indexOf(xpt);           /* get index value for x data point */
      yi = oArr.y.indexOf(ypt);           /* get index value for y data point */
      return xi > yi ? xi : yi;          /* if the indices != return larger # */
    }
    el.on('plotly_hover', function(p) {
      pt = p;                                   /* global: for use in unhover */
    })
    el.on('plotly_unhover', function(p) {       /* create persistent tooltips */
      if(coll.length > 0){           /* if click occurred else no persistence */
        if(giveMe.length < 1) return;   /* are there lines connecting points? */
        if(!Array.isArray(giveMe)) giveMe = [giveMe]; /* make sure its an array */
        whatNow = closest(pt.xvals[0], pt.yvals[0]);  /* mouse on connected point? */
        if(giveMe.includes(whatNow)) {    /* if hover pointIndex is connected */
          coll[1] = whatNow;         /* add connected point to array for tips */
          hvr = [];                     /* clear array for curve & point list */
          for(ea in coll) {                       /* create list for hovering */ 
            var oj = {}; oj[nms[0]] = 0; 
            oj[nms[1]] = coll[ea]; 
            hvr.push(oj);
          }
        } else {
          hvr = [{'curveNumber': 0, 'pointNumber': coll[0]}]; /* if coll, create tooltip */
        }
        Plotly.Fx.hover(el, hvr);                      /* persistent tooltips */
      } 
    })
    el.on('plotly_click', function(p) {     /* create persistent lines upon click */
                                          /* if any lines already vis-- hide them */
      Plotly.restyle(el, {'visible': false}, pt.xaxes[0]._traceIndices.slice(1,));
      giveIt = p.points[0].pointIndex;  /* capture scatter index for curve number */
      if(p.points[0].customdata) {
        giveMe = p.points[0].customdata;       /* get point's array of customdata */
      } else {giveMe = []}
      coll[0] = giveIt;                   /* collect index for persistent tooltip */
      Plotly.restyle(el, {'visible': true}, [giveIt + 1]);
    })
    el.on('plotly_doubleclick', function(p) { /* remove lines & pers tooltips */
      Plotly.restyle(el, {'visible': false}, pt.xaxes[0]._traceIndices.slice(1,));
      coll = [];      /* reset arrays until the next click */
      giveMe = [];
    }) 
  }")

This is an explanation of what is happening in this code (in general).

  • coll will contain the point indices for persistent tooltips

  • giveMe will contain the indices of connected data points (the customdata that is added to the plot)

  • oArr, redu(), & closest() are used to calculate the closest data point (when you make a tooltip persistent, Plotly won't identify or calculate new hover points, but it still captures screen position).

  • On hover just captures the hover data as a global variable. The hover data contains the screen position of the mouse.

  • On click any visible lines are removed; the point clicked will become a persistent tooltip; lines are drawn to connected data. Additionally, the connected data point indices from customdata are captured (this is giveMe). giveMe is utilized in unhover.

  • On unhover, this function does nothing unless an origin has been selected (a point was clicked). If giveMe is empty, indicating no connected data (there are no lines), no persistent tooltip is created. If there are connections, every mouse move is checked to determine whether the mouse is over a data point connected by a line. (There is a lot happening here behind the scenes.) The function uses oArr, redu, and closest for this purpose and creates a second tooltip when the criteria are met.

  • On doubleclick, lines & tooltips' persistence is removed.
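To make the nearest-point lookup easier to follow, here is a rough R rendering of what redu() and closest() are meant to do in the JavaScript above. It is only a sketch: nearest_index() and closest_point() are hypothetical names, the coordinates are made-up stand-ins for the jittered scatter data, and the result is shifted to a 0-based index by hand to match what Plotly uses.

```r
# Hypothetical helper: 1-based position of the value in arr closest to val
nearest_index <- function(val, arr) which.min(abs(arr - val))

# Mirrors the intent of closest(): find the nearest x and the nearest y
# separately; if they point at different rows, keep the larger index
closest_point <- function(xval, yval, xs, ys) {
  xi <- nearest_index(xval, xs)
  yi <- nearest_index(yval, ys)
  max(xi, yi) - 1   # Plotly point indices are 0-based
}

# made-up jittered positions (an assumption, not the real data)
xs <- c(0.9, 2.1, 3.05)
ys <- c(1.1, 1.9, 3.2)

idx_second <- closest_point(2.0, 2.0, xs, ys)   # mouse near the second point
idx_first  <- closest_point(0.9, 1.1, xs, ys)   # mouse exactly on the first point
```

When the mouse sits on a real point, the nearest x and the nearest y belong to the same row, so the two indices agree and the tie-break never matters.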

Updated by request

I created two new options. The first is what you asked for. However, it's pretty clunky. I think you may prefer the second option.

I noticed that the code that created nodes3 wasn't creating the data as I had intended. This led me to discover several weaknesses in the lapply, as well. Both are fixed here.

There are updates to the lapply that are specific to the 2nd option, but regardless of what option you use, it will work.

# create a simulation of jobs that match
nodes3 <- lapply(1:nrow(nodes), function(k) {
  thisOne <- nodes$name[k]
  mtch <- nodes$name[
    grep(pattern = paste0("^", substr(thisOne, 1, 1)), nodes$name)]
  mtch <- mtch[!mtch %in% thisOne]
  if(length(mtch) < 1) {
    data.frame(occ1 = character(), occ2 = character(),        # if no matches
               x = factor(), y = factor())
  } else {
    data.frame(occ1 = rep(thisOne, length(mtch)), occ2 = mtch, # if matches
               x = nodes$x[k], y = nodes$y[k])
  }
  }) %>% bind_rows()

cdt = list()       # list for the connected data point indices (used for 2nd option)

# retain order of points in lines' traces
invisible(lapply(1:nrow(df3), function(j) {
  dt <- df3[j, ]                          # point the lines will originate from
  mtch <- nodes3 %>% 
    filter(x == dt$x1, y == dt$y1, occ1 == dt$nm) %>%  # matching occ2
    select(occ2) %>% unlist(use.names = F)
  nodes4 <- df3[df3$nm %in% mtch, ]       # extract matched x, y positions
  if(nrow(nodes4) < 1) {
    p <<- p %>%                           # create trace so indices remain correct!
      add_lines(x = rep(df3[j, ]$x, 2), y = rep(df3[j, ]$y, 2), visible = F)                      # create lines
    return()                              # if no similar occupations
  }
  # create segment vectors for x and y
  xs <- lapply(1:nrow(nodes4), function(m) {c(dt$x, nodes4[m, ]$x, NA)}) %>% unlist()
  ys <- lapply(1:nrow(nodes4), function(m) {c(dt$y, nodes4[m, ]$y, NA)}) %>% unlist()
  
  # get row numbers of connected data
  vect <- which(df3$x %in% nodes4$x & df3$y %in% nodes4$y)
  cdt[[j]] <<- vect - 1 # 0 ind in JS, so subtract one from every value
  p <<- p %>% 
    add_lines(x = xs, y = ys, visible = F)                # create lines
}))
p

p$x$data[[1]]$customdata <- cdt   # add vectors to plot (used for 2nd option)

Option 1

In the first option, I used plotly_doubleclick. To make this work, I've modified the creation of p. I did this because I can't double-click my mouse fast enough for Plotly to register the action without this argument.

p <- ggplotly(gg) %>% config(doubleClickDelay = 1000)

Leaving the lines up until you double-click becomes a hot mess really quickly. It took building it for me to find the potential issues with it.

p %>% htmlwidgets::onRender(     
  "function(el, x) {
    giveMe = Array();
    el.on('plotly_hover', function(p) {  /* when hovering add lines */
      tellMe = p.points[0].pointIndex;   /* capture scatter index for curve number */
      giveMe.push(tellMe + 1);
      Plotly.restyle(el, {'visible': true}, giveMe);
    })
    el.on('plotly_doubleclick', function(p) { /* on double-click remove lines */
      Plotly.restyle(el, {'visible': false}, giveMe);
      giveMe = [];       /* clear list after changing to visible  = F */
    })
  }")


Option 2

This version uses plotly_click, in addition to the hovering methods. When you hover/unhover, it will still show/hide the lines. However, when you click on a data point, it will show the tooltips for every point that has a line to it.

Before the lapply is called, I create an empty list. This list will store the row numbers of the data that are connected by lines which will translate to indices of the points in the plot.

After the lapply is called, I add this list of vectors to the first trace as customdata so that these indices can be accessed in the JavaScript.
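As a small illustration of what ends up in customdata, the sketch below builds one vector of 0-based indices per point. Everything in it is made up for the example: point_names and links are hypothetical stand-ins, and it matches on names rather than on the jittered coordinates that the lapply uses.

```r
# made-up point order, in the same order as the scatter trace (assumption)
point_names <- c("A", "B", "C", "D")

# made-up connections: each name maps to the jobs it should link to
links <- list(A = c("B", "C"), B = "A", C = character(0), D = "A")

# for each point, the row numbers of its connected points,
# shifted by one because JavaScript indices start at 0
cdt_demo <- lapply(point_names, function(nm) {
  match(links[[nm]], point_names) - 1
})
```

A point with no connections gets an empty vector, which is why the click handler has to cope with empty customdata.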

Here's the code to create the custom featured tooltips via clicking. I want to specify--clicking anywhere won't work, you have to click on the data point you're interested in.

p %>% htmlwidgets::onRender(
  "function(el, x) {
    nms = ['curveNumber', 'pointNumber'];
    el.on('plotly_hover', function(p) {     /* when hovering add lines */
      tellMe = p.points[0].pointIndex;     /* capture scatter index for curve number */
      Plotly.restyle(el, {'visible': true}, [tellMe + 1]);
    })
    el.on('plotly_unhover', function(p) {   /* when unhovering remove lines */
      Plotly.restyle(el, {'visible': false}, [tellMe + 1]);
    })
    el.on('plotly_click', function(p) {
      var giveMe = p.points[0].customdata; /* get point's array of customdata */
      if(giveMe == null || giveMe.length < 1) return; /* no connected points */
      if(!Array.isArray(giveMe)) giveMe = [giveMe];   /* make sure it's an array */
      giveMe.push(tellMe);                 /* add current pointIndex to list */
      hvr = [];                            /* clear array for curve & point list*/
      for(ea in giveMe) {                  /* create list for hovering */ 
        var oj = {}; oj[nms[0]] = 0; 
        oj[nms[1]] = giveMe[ea];           /* customdata is already 0-based */
        hvr.push(oj);
      } 
      Plotly.Fx.hover(el, hvr);            /* show tooltips for points */
    })
  }")


All the code altogether (with update)

library(tidyverse)
library(plotly)

gg <- ggplot(nodes, aes(x = x, y = y, text = paste0("Selected Jobs: ", name))) +
  geom_jitter(width = 0.2, height = 0.2, size = 1, color = "steelblue") +
  labs(x = "Job Type", y = "Experience Level") +
  theme_minimal() +
  theme(panel.grid = element_blank()) +
  coord_cartesian(clip = "off") +
  theme(plot.margin = margin(20, 20, 20, 20))

# slow click speed required (used with option 1)
p <- ggplotly(gg) %>% config(doubleClickDelay = 1000) 

# capture jitter data
df3 <- data.frame(x = p$x$data[[1]]$x, y = p$x$data[[1]]$y, 
                  nm = nodes$name, x1 = nodes$x, y1 = nodes$y)

# create a simulation of jobs that match
nodes3 <- lapply(1:nrow(nodes), function(k) {
  thisOne <- nodes$name[k]
  mtch <- nodes$name[
    grep(pattern = paste0("^", substr(thisOne, 1, 1)), nodes$name)]
  mtch <- mtch[!mtch %in% thisOne]
  if(length(mtch) < 1) {
    data.frame(occ1 = character(), occ2 = character(),        # if no matches
               x = factor(), y = factor())
  } else {
    data.frame(occ1 = rep(thisOne, length(mtch)), occ2 = mtch, # if matches
               x = nodes$x[k], y = nodes$y[k])
  }
  }) %>% bind_rows()

cdt = list()       # list for the connected data point indices (used for 2nd option)

# retain order of points in lines' traces
invisible(lapply(1:nrow(df3), function(j) {
  dt <- df3[j, ]                          # point the lines will originate from
  mtch <- nodes3 %>% 
    filter(x == dt$x1, y == dt$y1, occ1 == dt$nm) %>%  # matching occ2
    select(occ2) %>% unlist(use.names = F)
  nodes4 <- df3[df3$nm %in% mtch, ]       # extract matched x, y positions
  if(nrow(nodes4) < 1) {
    p <<- p %>%                           # create trace so indices remain correct!
      add_lines(x = rep(df3[j, ]$x, 2), y = rep(df3[j, ]$y, 2), visible = F)                      # create lines
    return()                              # if no similar occupations
  }
  # create segment vectors for x and y
  xs <- lapply(1:nrow(nodes4), function(m) {c(dt$x, nodes4[m, ]$x, NA)}) %>% unlist()
  ys <- lapply(1:nrow(nodes4), function(m) {c(dt$y, nodes4[m, ]$y, NA)}) %>% unlist()
  
  # get row numbers of connected data
  vect <- which(df3$x %in% nodes4$x & df3$y %in% nodes4$y)
  cdt[[j]] <<- vect - 1 # 0 ind in JS, so subtract one from every value
  p <<- p %>% 
    add_lines(x = xs, y = ys, visible = F)                # create lines
}))
p

p$x$data[[1]]$customdata <- cdt   # add vectors to plot (used for 2nd option)

#------- Option 1 from update: -------
# hover to show lines, double-click to remove lines

p %>% htmlwidgets::onRender(     
  "function(el, x) {
    giveMe = Array();
    el.on('plotly_hover', function(p) {  /* when hovering add lines */
      tellMe = p.points[0].pointIndex;   /* capture scatter index for curve number */
      giveMe.push(tellMe + 1);
      Plotly.restyle(el, {'visible': true}, giveMe);
    })
    el.on('plotly_doubleclick', function(p) { /* on double-click remove lines */
      Plotly.restyle(el, {'visible': false}, giveMe);
      giveMe = [];       /* clear list after changing to visible  = F */
    })
  }")

#------- Option 2 from update: -------
# hover/unhover to show/hide lines; click show tooltips

p %>% htmlwidgets::onRender(
  "function(el, x) {
    nms = ['curveNumber', 'pointNumber'];
    el.on('plotly_hover', function(p) {     /* when hovering add lines */
      tellMe = p.points[0].pointIndex;     /* capture scatter index for curve number */
      Plotly.restyle(el, {'visible': true}, [tellMe + 1]);
    })
    el.on('plotly_unhover', function(p) {   /* when unhovering remove lines */
      Plotly.restyle(el, {'visible': false}, [tellMe + 1]);
    })
    el.on('plotly_click', function(p) {
      var giveMe = p.points[0].customdata; /* get point's array of customdata */
      if(giveMe.length < 1) return;
      if(!Array.isArray(giveMe)) giveMe = [giveMe];
      giveMe.push(tellMe);                 /* add current pointIndex to list */
      hvr = [];                            /* clear array for curve & point list*/
      for(ea in giveMe) {                  /* create list for hovering */ 
        var oj = {}; oj[nms[0]] = 0; 
        oj[nms[1]] = giveMe[ea]; 
        hvr.push(oj);
      } 
      Plotly.Fx.hover(el, hvr);            /* show tooltips for points */
    })
  }")

#------- Original hover/unhover calls in answer -------
# hover/unhover to show/hide lines
p %>% htmlwidgets::onRender(
  "function(el, x) {
    el.on('plotly_hover', function(p) {  /* when hovering add lines */
      tellMe = p.points[0].pointIndex;   /* capture scatter index for curve number */
      Plotly.restyle(el, {'visible': true}, [tellMe + 1]);
    })
    el.on('plotly_unhover', function(p) { /* when unhovering remove lines */
      Plotly.restyle(el, {'visible': false}, [tellMe + 1]);
    })
  }")


Originally...

Things to note:

  • I don't have the data that connects similar occupations, I only have the dput for nodes. In lieu of having the connected jobs, I created a fake set of connected jobs.
  • If the appearance or functionality isn't what you were looking for, let me know what you imagined differently and I can edit my answer.
  • At the end of my answer I have added all of the code altogether for easier copy + paste.

Because this would be a hot mess if all the lines were always visible, I've modified this so that when you hover it creates the lines (as in your example plot that you provided a link for in your question).


Because you're using ggplot's jitter functionality, every time you build the plot, it jitters slightly differently. To create the segments, first create the objects gg and p as you already have, then read the point positions from p so that the jitter positions stay fixed.

You can modify gg slightly so that you don't need to modify the hover text after the fact. Instead of text = name, use text = paste0("Selected Jobs: ", name). In my code, you'll see that there is no event_register or prependContent (both are replaced by this modification in ggplot).

library(plotly)
library(tidyverse)

gg <- ggplot(nodes, aes(x = x, y = y, text = paste0("Selected Jobs: ", name))) +
  geom_jitter(width = 0.2, height = 0.2, size = 1, color = "steelblue") +
  labs(x = "Job Type", y = "Experience Level") +
  theme_minimal() +
  theme(panel.grid = element_blank()) +
  coord_cartesian(clip = "off") +
  theme(plot.margin = margin(20, 20, 20, 20))


p <- ggplotly(gg)        # create plotly object to get jitter x, y


First step:

Extract the jittered data from your plotly object. These are the x and y that represent the scattered points' positions on the plot.

# capture jitter data & combine with nodes data
df3 <- data.frame(x = p$x$data[[1]]$x, y = p$x$data[[1]]$y, 
                  nm = nodes$name, x1 = nodes$x, y1 = nodes$y)

Interim step:

Here is where I created a fake set of data to simulate the connection between jobs. You won't need to create this data, but I included it for reproducibility.

# create a simulation of jobs that match
nodes3 <- lapply(1:nrow(nodes), function(k) {
  thisOne <- nodes$name[k]
  mtch <- nodes$name[grep(pattern = paste0("^", substr(nodes$name, 1, 1)), nodes$name)]
  mtch <- mtch[!mtch %in% thisOne]
  data.frame(occ1 = rep(thisOne, length(mtch)), occ2 = mtch,
             x = nodes$x[k], y = nodes$y[k])
}) %>% bind_rows()

Second step:

Now it's time to create the lines. You will create a Plotly trace for each row in the nodes data, in other words, a set of lines for each scatter point on the plot. It's important to go through the rows in the same order as you fed them to Plotly, because the order in which you create the lines matters! The disappearing/reappearing lines rely on the lines being created in the same order as the scatter points.

I used lapply to go through each row in df3 (same number of rows in nodes). Using the nm in df3 (name in nodes), the data I created is filtered for matching occupations.

I only used the occupation, but you identified other criteria in your question. Again, I don't have that data, so I can't create those filters for you. Ideally, you would create a data set with this content prefiltered. However, when you look at this code, you will see how I filtered and you can change those filters here, as well.
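If you do have the real pairs, a pre-filtered table along these lines could stand in for the simulated nodes3. This is a sketch under assumptions: the tiny nodes and filtered_data frames below are invented stand-ins, and the column names Occ1 and Occ2 are taken from the question. It keeps only pairs where both jobs survived the node filtering and attaches the origin job's x and y so that the later filter() step still works.

```r
# invented stand-ins for the real data (assumptions for illustration only)
nodes <- data.frame(
  name = c("A", "B", "C"),
  x = c("T1", "T1", "T2"),
  y = c("Entry", "Mid", "Mid"),
  stringsAsFactors = FALSE
)
filtered_data <- data.frame(
  Occ1 = c("A", "A", "B", "D"),
  Occ2 = c("B", "C", "D", "A"),
  Full.Transferability = c(0.95, 0.92, 0.91, 0.90),
  stringsAsFactors = FALSE
)

# keep only pairs where both jobs are plotted
keep <- filtered_data$Occ1 %in% nodes$name & filtered_data$Occ2 %in% nodes$name
pairs <- filtered_data[keep, c("Occ1", "Occ2")]

# attach the origin job's axis values, using the column names nodes3 expects
idx <- match(pairs$Occ1, nodes$name)
nodes3 <- data.frame(
  occ1 = pairs$Occ1, occ2 = pairs$Occ2,
  x = nodes$x[idx], y = nodes$y[idx],
  stringsAsFactors = FALSE
)
```

With a table like this there is one row per real connection, so the number of lines drawn follows the data rather than the first-letter simulation.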

After the matching 'points' positions are identified, I create vectors that represent line segments. There is no inherent mode for line segments in Plotly.

Here's an example of what that looks like in Plotly. If I wanted 2 segments that started at (1, 1) and ended at (2, 5) and (3, 7), this is what my x and y vectors would look like

x = c(1, 2, NA, 1, 3)

y = c(1, 5, NA, 1, 7)

An NA is placed between each start and end position.
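Here is a minimal base R sketch of building those vectors for one origin and any number of end points. segment_coords() is a hypothetical helper name, not part of Plotly. It leaves an NA after the last segment as well, which the updated lapply also does when it passes xs and ys straight to add_lines().

```r
# Hypothetical helper: interleave origin, target, NA for every target
segment_coords <- function(origin, targets) {
  if (length(targets) == 0) return(numeric(0))   # nothing to connect
  # rbind() recycles origin and NA across the targets; as.vector() then
  # reads the matrix column by column: origin, target, NA, origin, ...
  as.vector(rbind(origin, targets, NA_real_))
}

# two segments from (1, 1) to (2, 5) and (3, 7)
xs <- segment_coords(1, c(2, 3))
ys <- segment_coords(1, c(5, 7))
```

The same two calls work for any number of matches, which avoids writing the NA positions by hand.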

Since there may be no similar professions, I use an if statement to look for no matches. Since there may be many matching professions, each vector (x and y) is created using lapply to go through each match.

Once x and y are identified, the lines trace is created and added to the plot. These traces are visible = F.

# retain order of points in lines' traces
invisible(lapply(1:nrow(df3), function(j) {
  dt <- df3[j, ]                                # the row the lines will originate from
  mtch <- nodes3 %>% 
    filter(x == dt$x1, y == dt$y1, occ1 == dt$nm) %>%  # extract all matching occ2
    select(occ2) %>% unlist()
  nodes4 <- df3[df3$nm %in% mtch, ]              # extract matched x, y positions
  if(nrow(nodes4) < 1) return()                  # where there are no similar occupations
  xs <- lapply(1:nrow(nodes4), function(m) {c(dt$x, nodes4[m, ]$x, NA)}) %>% unlist()
  ys <- lapply(1:nrow(nodes4), function(m) {c(dt$y, nodes4[m, ]$y, NA)}) %>% unlist()

  p <<- p %>%    # add lines to plot
    add_lines(x = xs[-length(xs)], y = ys[-length(ys)], visible = F)  # drop the trailing NA
}))

Final step:

Now it's time to add the functionality that makes the lines appear and disappear when you hover over a data point.

I used htmlwidgets::onRender(), Plotly's events plotly_hover and plotly_unhover, and Plotly.restyle to make this happen.

When you hover over a point, the event data includes the point index and the curve number. The curve number is the index of the trace. The curve number can also be used in Plotly.restyle. When I create the object tellMe, by leaving off what type of variable it is, I've created a global variable, thus allowing me to use this value in another function (created in one function, but used in two functions). Using the trace index (+ 1, as the scatter points is the first trace) you'll toggle the visibility of that data points' line segments.

p %>% htmlwidgets::onRender(
  "function(el, x) {
    el.on('plotly_hover', function(p) {  /* when hovering add lines */
      tellMe = p.points[0].pointIndex;   /* capture scatter index for curve number */
      Plotly.restyle(el, {'visible': true}, [tellMe + 1]);
    })
    el.on('plotly_unhover', function(p) { /* when unhovering remove lines */
      Plotly.restyle(el, {'visible': false}, [tellMe + 1]);
    })
  }")


All the code in one place

library(plotly)
library(tidyverse)

gg <- ggplot(nodes, aes(x = x, y = y, text = paste0("Selected Jobs: ", name))) +
  geom_jitter(width = 0.2, height = 0.2, size = 1, color = "steelblue") +
  labs(x = "Job Type", y = "Experience Level") +
  theme_minimal() +
  theme(panel.grid = element_blank()) +
  coord_cartesian(clip = "off") +
  theme(plot.margin = margin(20, 20, 20, 20))


p <- ggplotly(gg)

# capture jitter data
df3 <- data.frame(x = p$x$data[[1]]$x, y = p$x$data[[1]]$y, 
                  nm = nodes$name, x1 = nodes$x, y1 = nodes$y)

# create a simulation of jobs that match
nodes3 <- lapply(1:nrow(nodes), function(k) {
  thisOne <- nodes$name[k]
  mtch <- nodes$name[grep(pattern = paste0("^", substr(nodes$name, 1, 1)), nodes$name)]
  mtch <- mtch[!mtch %in% thisOne]
  data.frame(occ1 = rep(thisOne, length(mtch)), occ2 = mtch,
             x = nodes$x[k], y = nodes$y[k])
}) %>% bind_rows()

# retain order of points in lines' traces
invisible(lapply(1:nrow(df3), function(j) {
  dt <- df3[j, ]                                # the row the lines will originate from
  mtch <- nodes3 %>% 
    filter(x == dt$x1, y == dt$y1, occ1 == dt$nm) %>%  # extract all matching occ2
    select(occ2) %>% unlist()
  nodes4 <- df3[df3$nm %in% mtch, ]              # extract matched x, y positions
  if(nrow(nodes4) < 1) return()                  # where there are no similar occupations
  xs <- lapply(1:nrow(nodes4), function(m) {c(dt$x, nodes4[m, ]$x, NA)}) %>% unlist()
  ys <- lapply(1:nrow(nodes4), function(m) {c(dt$y, nodes4[m, ]$y, NA)}) %>% unlist()

  p <<- p %>%    # add lines to plot
    add_lines(x = xs[-length(xs)], y = ys[-length(ys)], visible = F)  # drop the trailing NA
}))

p %>% htmlwidgets::onRender( # show me what you've got!
  "function(el, x) {
    el.on('plotly_hover', function(p) {  /* when hovering add lines */
      tellMe = p.points[0].pointIndex;   /* capture scatter index for curve number */
      Plotly.restyle(el, {'visible': true}, [tellMe + 1]);
    })
    el.on('plotly_unhover', function(p) { /* when unhovering remove lines */
      Plotly.restyle(el, {'visible': false}, [tellMe + 1]);
    })
  }")

BTW, if you want all of the segments to be the same color, add the argument, color = I("black") (or whatever color you're looking for) to the add_lines(... when creating the line segments.

Kat
  • Hi, This is what I absolutely want! Thank you! I understand your same data is not as exact as mine, so I would expect there might be errors. I have two errors. For `# create a simulation of jobs that match` in the Interim step, I've got warning: `Chemical / Process Engineer Warning: argument 'pattern' has length > 1 and only the first element will be usedChemical Engineer, Compliance Manager, Computer Systems Engineer / Architect ... more than 20 occupations with the first sentence. ` even though there should be lines only for two occupations from Chemical / Process Engineer – H K Jul 20 '23 at 15:11
  • 2, For `# retain order of points in lines' traces`, I got `Error in dt <<- df3[j, ] : cannot change value of locked binding for 'dt'` So I replaced all `<<-` with a single arrow `<-`. It works, but no lines showing up when I click on jobs. Do you have ideas what the problem is? – H K Jul 20 '23 at 15:14
  • By the way, My last question would not be necessary, but it would be good to have. Is it possible to place nodes(occupations) aligned well or to spread them out not to be messy visualization? What I mean is nodes are created in random locations based on the x and y axis. I want to have them such as 3 jobs in a row from the bottom and if there are more than 3 jobs they are placed right up on the 3 jobs at the same distance each other. – H K Jul 20 '23 at 15:27
  • I am sorry to ask you a lot... Is it possible to have lines with click or any other function so lines remain even when I hover on connected jobs to see the information? If only a hover function exists, I think I'll lose lines when I move the mouse point to check connected jobs. – H K Jul 20 '23 at 16:23
  • By editing some codes, finally I can visualize similar to yours. Also, I don't know why lines are not hovering with the console but it works with r notebook output. what I only need to work on is editing `nodes3`. There should be only 126 but it has 1476. So, all jobs have additional lines which are connecting unrelated jobs. And, letting hovered lines remain (maybe by click( even when I move to see other jobs info with arranged jobs positions. – H K Jul 20 '23 at 20:21
  • You aren't asking a lot. I think I've addressed all of your comments in an update to my answer. The only thing remaining is regarding how the data is arranged (2 comments ago). I don't really have an answer for you. That's purely aesthetics, therefore your opinion. :) – Kat Jul 20 '23 at 23:39
  • Okay, it looks like you commented here while I was updating, so my last comment may seem a bit odd. Your last comment should be solved with my update. Let me know if it isn't. – Kat Jul 20 '23 at 23:41
  • Hi Kat, thanks for the update. I've walked through both options and both are awesome. However, I'd like a mix of the two. For example, if I click on a job instead of hovering, its lines show up without the other jobs' information, and the lines stay until I click the job again. Then, if I move the mouse pointer to a connected job, it shows me that job's information without any new or additional lines appearing. – H K Jul 22 '23 at 17:14
  • Finally, `nodes3` in `# create a simulation of jobs that match` is the biggest issue I've had. My data has 126 rows in `filtered_data`, which is used for creating the lines. It has 126 unique jobs. There are 124 rows once I `#Remove rows with missing x and y values`. Even though I created nodes to see the unique occupation, job type and experience levels, the lines should be created based on `Occ1` and `Occ2` in `filtered_data`. Therefore, I think `nodes3` should have 124 rows in total, but I have 1126 rows. It creates more lines than it is supposed to. – H K Jul 22 '23 at 17:25
  • Since I used job connections I created, I don't have your actual data to figure out what's happening. Can you provide the data that represent occupations that need to be matched with lines? I think that's your `filtered_data`. (If so, `dput(filtered_data)` or `dput(head(filtered_data, 50))` --only reduce the size of the dataset if the issues you're having are still reproducible with that data.) As far as a compromise between the tooltips and lines, Plotly says that functionality doesn't exist at this time, but let me take a look at this again and see what I can come up with. – Kat Jul 23 '23 at 15:11
  • I've updated my answer; I think this does what you wanted. – Kat Jul 23 '23 at 22:39
  • I just updated my `filtered_data` in the question. I also wanted to show you the `dput` output, but I couldn't because of its size. If you need `dput` instead of the `filtered_data` table, please let me know. Also, your updated code is exactly what I want to visualize. Thanks a lot! – H K Jul 24 '23 at 01:46
  • FYI, in the future always use `dput(head(data))` (not a table or pic). To create the `customdata`: `xx <- lapply(1:nrow(filtered_data), function(j) { filter(nodes, nodes$name == filtered_data[j, ]$Occ1) %>% select(x, y)}) %>% bind_rows(); fd2 <- cbind(filtered_data, xx) %>% as.data.frame()`. Then, within the `invisible(lapply...`, where you see `nodes3`, change it to `fd2`, where you see `occ1`, change it to `Occ1`; where you see `occ2`, change it to `Occ2`. (There is only 1 of each.) If you have any errors—clear the envir and run the code again. – Kat Jul 24 '23 at 18:13
  • Thank you, and sorry about the dput in the beginning. It is better now. There are fewer lines than before. However, some occupations still have too many lines. For example, "Chemical / Process Engineer" should have 2 lines, and I can see those 2 in `fd2` as well, but the plot shows more than 5 lines for it. Do you have any idea what the reason is? (I updated `dput(head(fd2,50))`) – H K Jul 24 '23 at 23:21
  • Can you run it with the sample of both `nodes` and `filtered_jobs` from your question? Remember, I only have a small sample of `nodes`. Unfortunately, I haven't found the problem. If I were to hazard a guess, it's something to do with Occ1 -> Occ2 versus Occ2 -> Occ1. Can you look at that and let me know? If you still see an issue, point out one such example of an error based on the data provided in your question, please. – Kat Jul 25 '23 at 00:16
  • I'm so sorry, I had been mixing in the wrong code. I've finally got what I wanted. The only remaining item is how the data is aligned, which is something I need to do on my end. Thank you for the long journey! – H K Jul 25 '23 at 02:47
  • Hi, do you mind if I ask about background colors? I want to add background colours based on the x and y axes with `colors <- c("#FFFFE0", "#FFEF00", "#FFDb66","#e4d99d", "#FFFF33", "#FFEF55", "#FFDc99","#e4d99e","#FFFF99" , "#FFEE55", "#FFDd99","#e4d99d", "#FFFF66", "#FFEA00", "#FFDF00","#e4d00a", "#FFFF33", "#FFEB00", "#FFDa66","#e4d55b")` – H K Jul 25 '23 at 22:20
  • If I add `background_data <- expand.grid(x = levels(nodes$x), y = levels(nodes$y))` and `background_data$color_group <- interaction(background_data$x, background_data$y)`, and modify the scatterplot `gg` to `gg <- ggplot() + geom_tile(data = background_data, aes(x = x, y = y, fill = color_group), color = "white") + geom_jitter(data = nodes, aes(x = x, y = y, text = name), width = 0.2, height = 0.2, size = 1, color = "steelblue") `, it shows `Warning: Ignoring unknown aesthetics: text`, and when I run `df3`, I get `arguments imply differing number of rows: 5, 124`. Do you know the reasons? – H K Jul 25 '23 at 22:41
  • When you added tiles, you went from having one Plotly trace to having 21 traces before adding the lines. Currently, none of the code I've provided is going to work correctly. I am more than happy to help, but this isn't part of the original question. Would you please ask this as a new question? Make sure that your question includes a link to this question and what you've changed and the problems you've run into with your additions. I suggest that you add a link to that new question in a comment here so that I can find it easier. Either way, I'll start working on a solution now. – Kat Jul 26 '23 at 02:42
  • Hi Kat, thank you. I just posted a new question here: [https://stackoverflow.com/questions/76768417/how-to-add-background-colours-with-geom-gitter-and-geom-based-on-x-and-y-axis-f] – H K Jul 26 '23 at 06:03
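
For readers following the `fd2` step from the comments: below is a minimal base-R sketch of attaching each `Occ1`'s coordinates to `filtered_data`, so that the link table keeps one row per `Occ1`/`Occ2` pair instead of growing. The toy `nodes` and `filtered_data` values are made up for illustration (only the column names come from the thread), and `match()` is swapped in for the `filter`/`bind_rows` approach in the comment; it is not the code from the answer.

```r
# Hypothetical toy data; the real `nodes` and `filtered_data` come from the question.
nodes <- data.frame(
  name = c("Job A", "Job B", "Job C"),
  x = c("Type 1", "Type 2", "Type 1"),
  y = c(1, 2, 3),
  stringsAsFactors = FALSE
)
filtered_data <- data.frame(
  Occ1 = c("Job A", "Job A", "Job B", "Job D"),  # "Job D" has no node
  Occ2 = c("Job B", "Job C", "Job C", "Job A"),
  stringsAsFactors = FALSE
)

# match() returns at most one node index per link row, so the result can
# never have more rows than filtered_data.
idx <- match(filtered_data$Occ1, nodes$name)
fd2 <- cbind(filtered_data, nodes[idx, c("x", "y")])

# Drop links whose Occ1 has no node (e.g. removed for missing x or y).
fd2 <- fd2[!is.na(idx), ]
rownames(fd2) <- NULL

# Sanity check: the link table did not grow.
stopifnot(nrow(fd2) <= nrow(filtered_data))
```

If `nrow(fd2)` ever exceeds `nrow(filtered_data)`, the join has duplicated rows somewhere, which would be consistent with the extra lines described in the comments.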