
I am using the R programming language. I am using a computer that does not have a USB port or an internet connection - I only have R with a few preloaded libraries (e.g. ggplot2, reshape2, dplyr, base R).

Is it possible to make "parallel coordinate" plots (e.g. below) using only the "ggplot2" package and not "GGally"?

```r
# load libraries (I do not have GGally)
library(GGally)

# load data (I have MASS)
data(crabs, package = "MASS")

# make 2 different parallel coordinate plots
ggparcoord(crabs)
ggparcoord(crabs, columns = 4:8, groupColumn = "sex")
```


Thanks

Source: https://homepage.divms.uiowa.edu/~luke/classes/STAT4580-2020/parcor.html

stats_noob
  • just looks like a `geom_line` to me, what have you tried? – rawr Jan 21 '21 at 06:04
  • right now, I am trying to figure out a way to format the data so that I can use the geom_line – stats_noob Jan 21 '21 at 06:09
  • I think it might be possible with the plotly library as well: https://stackoverflow.com/questions/65821992/r-plot-not-fully-loading – stats_noob Jan 21 '21 at 06:11
  • Looking at the [source code for ggparcoord](https://rdrr.io/cran/GGally/src/R/ggparcoord.R) there are a few steps before `geom_line()`, such as scaling the data and imputing missing values. Should be possible to replicate, but it will take a bit of effort. – jared_mamrot Jan 21 '21 at 08:01
  • @jared_mamrot: I wonder if this will work by copying and pasting the source code... – stats_noob Jan 21 '21 at 16:33
  • @Noob copy/pasting the source code would be the first thing I'd try. At some point you'll need to convert your data into a long format - looks like `ggparcoord` uses `reshape::melt` for that. Not sure if you have that package available - if not, look for other options at the [wide to long FAQ](https://stackoverflow.com/q/2185252/903061). – Gregor Thomas Jan 21 '21 at 16:50
  • thank you - I will see if "geom_line()" can be combined with the "melt()" function from reshape2. – stats_noob Jan 21 '21 at 17:03
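For reference, here is a minimal sketch of the approach discussed in the comments, assuming only ggplot2, reshape2 and MASS are installed. It standardises the five measurement columns, adds a row id, converts to long format with `reshape2::melt()` and draws one `geom_line()` per crab. The object names `vars`, `pc` and `pc_long` are illustrative. The sketch does not replicate `ggparcoord()`'s missing-value handling or its other scaling options.

```r
# A ggplot2-only parallel coordinate plot (no GGally).
# Assumes ggplot2, reshape2 and MASS (for the crabs data) are available.
library(ggplot2)
library(reshape2)

data(crabs, package = "MASS")

# 1. Standardise the measurement columns (FL, RW, CL, CW, BD): (x - mean) / sd
vars <- names(crabs)[4:8]
pc <- crabs
pc[vars] <- lapply(pc[vars], function(x) (x - mean(x)) / sd(x))

# 2. Give every crab an id so that each crab becomes exactly one line
pc$id <- seq_len(nrow(pc))

# 3. Wide to long: one row per (crab, variable) pair
pc_long <- melt(pc, id.vars = c("id", "sp", "sex"), measure.vars = vars,
                variable.name = "variable", value.name = "value")

# 4. Discrete x axis = variables, one line per crab (group = id), coloured by sex
p <- ggplot(pc_long, aes(x = variable, y = value, group = id, colour = sex)) +
  geom_line(alpha = 0.5) +
  labs(x = "Variable", y = "Standardised value")
print(p)
```

The `group = id` mapping is the key step. Without it, ggplot2 would try to connect points by colour instead of by observation.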

2 Answers


In fact, you do not even need ggplot2! This is just a plot of standardised values (each value minus its column mean, divided by its column SD), so you can implement this logic with any plotting function that can draw multiple lines. The cleanest and easiest way is to do it step by step in base R:


```r
# Standardising the variables of interest
data(crabs, package = "MASS")
crabs[, 4:8] <- apply(crabs[, 4:8], 2, scale)
# This colour solution works in great generality, although RColorBrewer has better distinct schemes
mycolours <- rainbow(length(unique(crabs$sex)), end = 0.6)
# png("gally.png", 500, 400, type = "cairo", pointsize = 14)
par(mar = c(4, 4, 0.5, 0.75))
plot(NULL, NULL, xlim = c(1, 5), ylim = range(crabs[, 4:8]) + c(-0.2, 0.2),
     bty = "n", xaxt = "n", xlab = "Variable", ylab = "Standardised value")
axis(1, 1:5, labels = colnames(crabs)[4:8])
abline(v = 1:5, col = "#00000033", lwd = 2)
abline(h = seq(-2.5, 2.5, 0.5), col = "#00000022", lty = 2)
# One line per crab, coloured by sex (factor levels: F = 1, M = 2)
for (i in seq_len(nrow(crabs))) lines(as.numeric(crabs[i, 4:8]), col = mycolours[as.numeric(crabs$sex[i])])
legend("topright", c("Female", "Male"), lwd = 2, col = mycolours, bty = "n")
# dev.off()
```

You can apply this logic (x axis with integer positions, one line of standardised values per observation) in any package that can conveniently draw multiple lines (as in time series). However, this solution has no extra dependencies and will not become unavailable because an orphaned package with 3 functions gets purged from CRAN.
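As one illustration of that point, base R's `matplot()` draws one line per column of a matrix. Transposing the standardised data therefore makes each crab a column and replaces the explicit loop. This is a sketch assuming only base R and MASS. The colour names are arbitrary choices.

```r
# Parallel coordinates with matplot(): after t(), each column is one crab,
# drawn as a line across the x positions 1..5.
data(crabs, package = "MASS")

vars <- names(crabs)[4:8]
z <- scale(as.matrix(crabs[vars]))  # standardised values, one row per crab

pal <- c("tomato", "steelblue")     # arbitrary colours for F and M
cols <- pal[as.integer(crabs$sex)]  # one colour per crab (F = 1, M = 2)

matplot(seq_along(vars), t(z), type = "l", lty = 1, col = cols,
        xaxt = "n", xlab = "Variable", ylab = "Standardised value")
axis(1, at = seq_along(vars), labels = vars)
legend("topright", levels(crabs$sex), lty = 1, col = pal, bty = "n")
```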


The closest thing I found without "GGally" was the built-in `parcoord()` function from the "MASS" package:

```r
# source: https://stat.ethz.ch/R-manual/R-devel/library/MASS/html/parcoord.html
library(MASS)
parcoord(state.x77[, c(7, 4, 6, 2, 5, 3)])

ir <- rbind(iris3[, , 1], iris3[, , 2], iris3[, , 3])
parcoord(log(ir)[, c(3, 4, 2, 1)], col = 1 + (0:149) %/% 50)
```
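Applied to the crabs data from the question and coloured by sex, a sketch might look like the block below. As I understand the MASS documentation, `parcoord()` rescales each variable to its own minimum and maximum instead of standardising it. Its y axis is therefore not directly comparable to a standardised plot and is closer to `ggparcoord(..., scale = "uniminmax")`. The colour names are arbitrary.

```r
# MASS::parcoord() on the crabs measurements, one colour per sex
library(MASS)
data(crabs, package = "MASS")

pal <- c("tomato", "steelblue")         # arbitrary colours for F and M
sex_col <- pal[as.integer(crabs$sex)]   # one colour per crab

# var.label = TRUE prints each variable's min/max on the axes
parcoord(crabs[, 4:8], col = sex_col, var.label = TRUE)
legend("topright", levels(crabs$sex), lty = 1, col = pal, bty = "n")
```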
stats_noob