
My dataset contains more than 500 observations of match activities performed by individual athletes at different locations, recorded over the duration of a soccer match. An example of my dataset is below, where each symbol refers to a match activity. For example, KE is Kick Effective, recorded at 1 minute in the Defense.

# Example data
df <- data.frame(Symbol = c('KE', 'TE', 'TE', 'TI',
                            'KE', 'KE', 'H', 'H',
                            'GS', 'KE', 'TE', 'H',
                            'KE', 'H', 'H', 'GS'),
                 Location = c('Defense', 'Defense', 'Midfield', 'Forward',
                              'Forward', 'Midfield', 'Midfield', 'Defense',
                              'Defense', 'Defense', 'Forward', 'Midfield',
                              'Midfield', 'Defense', 'Defense', 'Midfield'),
                 Time = c(1, 2, 3, 6,
                          15, 16, 16, 20,
                          22, 23, 26, 26,
                          27, 28, 28, 30))

I wish to visualise this data, by plotting the match activities over time at each location in ggplot2.

# Load required package
library(ggplot2)
# Order factors for plotting
df$Location <- factor(df$Location, levels = c("Defense", "Midfield", "Forward"))

# Plot: aesthetics must be mapped inside aes(), not passed directly to ggplot()
ggplot(df, aes(x = Time, y = Location, label = Symbol)) +
  geom_text(size = 4) +
  theme_classic()

However, some of the geom_text labels overlap one another. I have tried jitter, but then I lose the meaning of where the activity occurs on the soccer pitch. Unfortunately, check_overlap = TRUE removes any overlapped symbols. I wish to keep the symbols in the same text direction.

Although the symbols are plotted at the time they occur, I am happy to adjust the time slightly (aware they will no longer perfectly align on the plot) to ensure the geom_text symbols are visible. I can do this manually by shifting the Time of each overlapped occurrence forward or back, but with such a big dataset this would take a very long time.

A suggestion was to use ggrepel, which I tried below, but it also moves the labels along the y-axis, which is not what I am after.

library(ggrepel)
ggplot(df, aes(x = Time, y = Location, label = Symbol)) +
  geom_text_repel()

Is there a way I can check for overlap and automatically adjust the symbols, to ensure they are visible and still retain meaning on the y-axis? Perhaps one solution could be to find each Location and if a Symbol is within two minutes of another in the same Location, Time is adjusted.
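
A minimal sketch of that idea in base R, assuming it is acceptable to push later labels forward in time. The helper `spread_times`, the `min_gap` argument and the `PlotTime` column are hypothetical names introduced here for illustration; they are not part of ggplot2 or ggrepel.

```r
# Example data from above
df <- data.frame(Symbol = c('KE', 'TE', 'TE', 'TI', 'KE', 'KE', 'H', 'H',
                            'GS', 'KE', 'TE', 'H', 'KE', 'H', 'H', 'GS'),
                 Location = c('Defense', 'Defense', 'Midfield', 'Forward',
                              'Forward', 'Midfield', 'Midfield', 'Defense',
                              'Defense', 'Defense', 'Forward', 'Midfield',
                              'Midfield', 'Defense', 'Defense', 'Midfield'),
                 Time = c(1, 2, 3, 6, 15, 16, 16, 20,
                          22, 23, 26, 26, 27, 28, 28, 30))

# Hypothetical helper: within each group, walk through the labels in time
# order and push a label forward whenever it sits closer than `min_gap`
# to the (already adjusted) label before it.
spread_times <- function(time, group, min_gap = 2) {
  adjusted <- time
  for (idx in split(seq_along(time), group)) {
    idx <- idx[order(time[idx])]
    for (k in seq_along(idx)[-1]) {
      earliest <- adjusted[idx[k - 1]] + min_gap
      if (adjusted[idx[k]] < earliest) adjusted[idx[k]] <- earliest
    }
  }
  adjusted
}

# Keep the true Time and store the adjusted value in a separate column
df$PlotTime <- spread_times(df$Time, df$Location, min_gap = 2)
```

Mapping `x = PlotTime` instead of `x = Time` in `geom_text` would then keep every label on its own Location line. Labels are only ever moved later, so a dense run of events can drift well away from its true time.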

Any help would be greatly appreciated.

user2716568
  • may be `x = jitter(Time, 4)` – Sathish Mar 22 '17 at 11:10
  • Thank you for the suggestion, unfortunately there is still overlap. – user2716568 Mar 22 '17 at 11:13
  • See [ggrepel](https://cran.r-project.org/web/packages/ggrepel/vignettes/ggrepel.html). – zx8754 Mar 22 '17 at 11:23
  • The `ggrepel` solution does work but it alters the y-axis and the labels appear offset, which is not what I am after. I would like the labels all on the same line, slightly altered in the x-axis but definitely not in the y. – user2716568 Mar 22 '17 at 11:28
  • use `set.seed()` sequentially and find the value of seed that separates the text when using `jitter`. This will make your graph reproducible. It is a trial and error process. – Sathish Mar 22 '17 at 11:55
  • Install the devel version of ggrepel from GitHub, then use `direction = "x"`. See [here for direction option in devel](https://github.com/slowkow/ggrepel/blob/master/R/geom-text-repel.R#L87) – zx8754 Mar 22 '17 at 11:55
  • I also see `ggrepel` promising... – Sathish Mar 22 '17 at 11:57
  • Regarding edit: "Perhaps one solution could be... " then the plot would be misleading, and would face the same problem once we have 3-5 labels on the same "Time" x axis. – zx8754 Mar 22 '17 at 22:06
  • The plot would be misleading and not at the exact time occurrence but it would assist the visual. The number of labels is not the issue but the overlap is. – user2716568 Mar 23 '17 at 02:33

1 Answer

We could add points, then use ggrepel with a minimum segment length, so that short lines connect each text label to its point.

library(ggrepel) # ggrepel_0.6.5 ggplot2_2.2.1

ggplot(df, aes(x = Time, y = Location, label = Symbol)) +
  geom_point() +
  geom_text_repel(size = 4, min.segment.length = unit(0.1, "lines")) +
  theme_classic() 

Or we could try the development version of ggrepel, which has a `direction` argument.

ggplot(df, aes(x = Time, y = Location, label = Symbol)) +
  geom_text_repel(size = 4, direction = "x") +
  theme_classic() 
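
A self-contained sketch of the second option without points or connecting lines. It assumes an installed ggrepel version whose `geom_text_repel` accepts `direction`, `seed`, and a numeric `min.segment.length` (where `Inf` suppresses the segments); I have not checked this against every release, so treat the arguments as assumptions for your version.

```r
library(ggplot2)
library(ggrepel)

# Example data from the question
df <- data.frame(Symbol = c('KE', 'TE', 'TE', 'TI', 'KE', 'KE', 'H', 'H',
                            'GS', 'KE', 'TE', 'H', 'KE', 'H', 'H', 'GS'),
                 Location = c('Defense', 'Defense', 'Midfield', 'Forward',
                              'Forward', 'Midfield', 'Midfield', 'Defense',
                              'Defense', 'Defense', 'Forward', 'Midfield',
                              'Midfield', 'Defense', 'Defense', 'Midfield'),
                 Time = c(1, 2, 3, 6, 15, 16, 16, 20,
                          22, 23, 26, 26, 27, 28, 28, 30))
df$Location <- factor(df$Location, levels = c("Defense", "Midfield", "Forward"))

p <- ggplot(df, aes(x = Time, y = Location, label = Symbol)) +
  geom_text_repel(size = 4,
                  direction = "x",           # repel along the x axis only
                  min.segment.length = Inf,  # assumed: never draw segments
                  seed = 42) +               # assumed: reproducible layout
  theme_classic()

# print(p) draws the plot
```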
zx8754
  • I ran this code: `devtools::install_github("slowkow/ggrepel")` and then your `geom_text_repel` line but I receive the following error: `Error: could not find function "geom_text_repel"` – user2716568 Mar 22 '17 at 21:40
  • @user2716568 See [here](http://stackoverflow.com/questions/7027288/error-could-not-find-function-in-r) for possible solutions. – zx8754 Mar 22 '17 at 21:44
  • Thanks, I did get the development package working. Unfortunately this solution is not what I am after: the `geom_point` detracts from the symbols. The symbols are also not aligned on the axis; they are jittered, which is not what I am after. – user2716568 Mar 22 '17 at 22:01
  • @user2716568 I haven't tested the devel version, the first solution above should work, points are not jittered, just the text. – zx8754 Mar 22 '17 at 22:03
  • Yes, but I do not want a `geom_point`, nor do I want the text jittered. Therefore, the first solution is not ideal. – user2716568 Mar 22 '17 at 22:28