I am interested in how grants are reviewed at the NIH. Congress allocates funding to various institutes (e.g., the National Cancer Institute, or NCI), and individual grants are submitted to these institutes. The institutes are organized around funding priorities (e.g., cancer, infectious diseases, etc.).

However, when grants are reviewed, they are typically (but not always) sent to individual study sections, which are organized more around scientific disciplines. Thus, the "Tumor Progression" study section can find itself reviewing grants from both the National Cancer Institute and the National Heart, Lung, and Blood Institute (NHLBI) if a researcher submits a grant to NHLBI to study leukemia.

I have a data frame in R that looks something like this:

grant_id <- 1:100
funding_agency <- sample(rep(c("NIAID", "NIGMS", "NHLBI", "NCI", "NINDS"), 20))
study_section <- sample(rep(c("Tumor Cell Biology", "Tumor Progression", 
                              "Vector Biology", "Molecular Genetics", 
                              "Medical Imaging", "Macromolecular Structure",
                              "Infectious Diseases", "Drug Discovery", 
                              "Cognitive Neuroscience", "Aging and Geriatrics"), 
                            10)
                        )
total_cost <- rnorm(100, mean = 30000, sd = 10000)
d <- data.frame(grant_id, funding_agency, study_section, total_cost)

library(car)  # some() comes from the car package; it prints a random sample of rows
some(d)

   grant_id funding_agency          study_section total_cost
15       15          NINDS         Vector Biology   25242.19
19       19            NCI    Infectious Diseases   29075.21
50       50            NCI         Drug Discovery   25176.35
62       62            NCI      Tumor Progression   14264.34
64       64          NIAID     Tumor Cell Biology   30024.13

I would like to create two visualizations of these data, preferably in R: one that shows how grants submitted to individual institutes are assigned to study sections, and a second that shows the dollar amount of the grants that the institutes assign to study sections. What I ultimately want is a chart like the ones on the following websites:

Migration flow

College major to job pipelines

Does anybody know of an R package and / or have some sample code to create a chart like you find on the websites above? Alternatively, is there a different visualization that I should consider that would accomplish the same goals?

2 Answers


Here is how to do it with rCharts. You can view the final SankeyPlot here

d <- data.frame(
  id = grant_id, 
  source = funding_agency, 
  target = study_section, 
  value = total_cost
)
# Install the dev branch of rCharts from GitHub if it is not already installed:
# devtools::install_github("ramnathv/rCharts", ref = "dev")
library(rCharts)
sankeyPlot <- rCharts$new()
sankeyPlot$setLib('http://timelyportfolio.github.io/rCharts_d3_sankey')
sankeyPlot$set(
  data = d,
  nodeWidth = 15,
  nodePadding = 10,
  layout = 32,
  width = 750,
  height = 500,
  labelFormat = ".1%"
)
sankeyPlot

To save the chart, you can do

sankeyPlot$save('mysankey.html')
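
The question asks for two views: one by number of grants and one by dollars. The following is only a sketch of how the links might be prepared for each view, assuming the same `source` / `target` / `value` column layout used above. It collapses the data to one row per agency and study section pair with base R's `aggregate()`. The names `links`, `links_dollars`, and `links_counts` are made up for illustration, and `set.seed()` is added only so the example is reproducible.

```r
# Recreate the example data from the question
set.seed(42)  # assumption: added for reproducibility, not in the original
funding_agency <- sample(rep(c("NIAID", "NIGMS", "NHLBI", "NCI", "NINDS"), 20))
study_section <- sample(rep(c("Tumor Cell Biology", "Tumor Progression",
                              "Vector Biology", "Molecular Genetics",
                              "Medical Imaging", "Macromolecular Structure",
                              "Infectious Diseases", "Drug Discovery",
                              "Cognitive Neuroscience", "Aging and Geriatrics"),
                            10))
total_cost <- rnorm(100, mean = 30000, sd = 10000)

# One row per grant, in the source / target / value layout
links <- data.frame(source = funding_agency,
                    target = study_section,
                    value  = total_cost,
                    stringsAsFactors = FALSE)

# Dollars: sum total_cost for each agency -> study section pair
links_dollars <- aggregate(value ~ source + target, data = links, FUN = sum)

# Counts: number of grants for each agency -> study section pair
links_counts <- aggregate(value ~ source + target, data = links, FUN = length)

head(links_dollars)
head(links_counts)
```

Either data frame could presumably be passed as `data` in `sankeyPlot$set(...)` in place of `d`; that substitution is an assumption and has not been checked against the rCharts_d3_sankey library.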

Ramnath
  • This is great, thank you @Ramnath! I have what is possibly a silly question -- I tried running your code but am having trouble finding where the file is saved after running `sankeyPlot`. Would you be able to tell me where the resulting chart is saved? It doesn't seem to be in my working directory. – Patrick S. Forscher Nov 01 '13 at 18:04
  • The chart generated is in a tempfile. To save the chart run `sankeyPlot$save("mysankey.html")`. – Ramnath Nov 01 '13 at 18:06
  • Perfect. Thanks for your prompt responses! – Patrick S. Forscher Nov 01 '13 at 18:45
  • Can you explain this line: `sankeyPlot$setLib('http://timelyportfolio.github.io/rCharts_d3_sankey')`? – MySchizoBuddy Nov 03 '13 at 09:21
  • rCharts requires the viz libraries it uses to conform to a folder structure. To use a library, you need to specify its path, which is done using the `setLib` method. In this case, you are using the library from its online location. You can download the repo and point `setLib` to a local path as well. – Ramnath Nov 03 '13 at 11:14

Can't help much with the visualization piece, but you are looking for a 2-way table for the data.

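Using package reshape2 and ignoring grant_id:

library(reshape2)
# Drop grant_id and melt to long form; total_cost ends up in the `value` column
d1 <- melt(d[, 2:4], id.vars = c("funding_agency", "study_section"))
# Sum total_cost for every study_section x funding_agency combination
d2 <- dcast(d1, study_section ~ funding_agency, sum)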
> d2
              study_section      NCI     NHLBI     NIAID     NIGMS     NINDS
1      Aging and Geriatrics 28598.04  76524.55      0.00 109492.59 138330.12
2    Cognitive Neuroscience 76484.18  88217.42  78126.55  71546.62  73132.14
3            Drug Discovery 43667.30  39683.03  23797.24  46363.75 105655.61
4       Infectious Diseases 65375.44 136462.03  96413.08  34653.48  13835.22
5  Macromolecular Structure 84308.64  42290.61  39886.87  61645.00  67550.41
6           Medical Imaging 26264.32  86736.36 106356.13  41001.21  35549.83
7        Molecular Genetics 49473.72      0.00 110201.52  69468.03  86688.24
8        Tumor Cell Biology 99930.88  50862.39  95394.23  26269.98  46944.60
9         Tumor Progression 58719.89  52669.80  86874.89      0.00 119264.59
10           Vector Biology 64251.66  30880.81  66734.26 125524.72      0.00

This tells you how much grant money each study_section received from each funding agency. How to display this is a different question. Maybe check out http://statmath.wu.ac.at/projects/vcd/
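
As one possible way to display such a table without any extra packages, here is a rough sketch using base R only. It rebuilds the same kind of two-way table with `xtabs()` and draws it as a heatmap. This is a swapped-in alternative rather than anything from vcd, and the object names `tab` and `m`, as well as the `set.seed()` call, are assumptions made for illustration.

```r
# Recreate the example data from the question
set.seed(42)  # assumption: added for reproducibility, not in the original
d <- data.frame(
  grant_id = 1:100,
  funding_agency = sample(rep(c("NIAID", "NIGMS", "NHLBI", "NCI", "NINDS"), 20)),
  study_section = sample(rep(c("Tumor Cell Biology", "Tumor Progression",
                               "Vector Biology", "Molecular Genetics",
                               "Medical Imaging", "Macromolecular Structure",
                               "Infectious Diseases", "Drug Discovery",
                               "Cognitive Neuroscience", "Aging and Geriatrics"),
                             10)),
  total_cost = rnorm(100, mean = 30000, sd = 10000),
  stringsAsFactors = FALSE
)

# Two-way table of summed total_cost (empty combinations become 0)
tab <- xtabs(total_cost ~ study_section + funding_agency, data = d)

# Strip the table class so heatmap() sees a plain numeric matrix
m <- unclass(tab)

# Heatmap without clustering or rescaling, so colors reflect raw dollar totals
heatmap(m, Rowv = NA, Colv = NA, scale = "none", margins = c(6, 12),
        main = "Total cost by study section and agency")
```

Rows and columns are left in alphabetical order here so that the picture lines up with the printed table; letting `heatmap()` cluster them is another option.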

Rohit Das