I am trying to create some histograms with ggplot2. They show deviations from 0, and for the first variable this works fine, as you can see in figure 1.

Figure 1: [histogram image]

But with the second variable, there are some extreme outliers that stretch the x-axis, as can be seen in figure 2, which is a problem.

Figure 2: [histogram image]

Therefore, I would like to add a "greater than 5" bin that includes all values greater than 5 (and likewise a "less than -5" bin for the negative values).

I found this solution, which is exactly what I need, but I cannot figure out how to apply it: https://edwinth.github.io/blog/outlier-bin/

My code:

library(ggplot2)
require(reshape2)

#Figure 1
ggplot(data = data, aes(x = data$V1*100)) +
  geom_histogram(color="white", fill='#1E206B') +
  geom_vline(xintercept = 0, color="black", linetype="dashed", size=1) +
  labs(x = "Percentage change", y = "Counts") +
  #xlim(-max(abs(data$V1))*100,max(abs(data$V1))*100) +
  theme_bw()

#Figure 2
ggplot(data = data, aes(x = data$V2*100)) +
  geom_histogram(color="white", fill='#1E206B') +
  geom_vline(xintercept = 0, color="black", linetype="dashed", size=1) +
  labs(x = "Percentage change", y = "Counts") +
  #xlim(-max(abs(data$V2))*100,max(abs(data$V2))*100) +
  theme_bw()
s__
Frederik
  • Have you tried installing the package written by the author of the blog post, as he suggests? – SDS0 Feb 10 '20 at 14:18
  • You have to run the last big chunk of code from the post in your R session; it puts the new functions into your environment. After that, run the author's last example, and then you can adapt it to your needs. Note that `ggplot2` does not need `data$` in your code, and the functions in your source use deprecated functions. – s__ Feb 10 '20 at 14:25
  • I finally managed to run his example with randomly generated data, but now I can't figure out how to pass my own data to the function. I get the error message below. I tried to convert my data frame to a numeric tibble (with as.numeric and as.tibble), but it didn't work – Frederik Feb 10 '20 at 15:51
  • `Error: Must subset columns with a valid subscript vector. x The subscript has the wrong type tbl_df. i It must be numeric or character. Run rlang::last_error() to see where the error occurred.` – Frederik Feb 10 '20 at 15:56

1 Answer

From: Grouping extreme value bins into one "> x" bin

I used this to check my bins prior to continuing:

unique(cut(rlnorm(1000, 3), c(-Inf, seq(-70, 90, by = 20), Inf)))

This produced what I think you are looking for without using additional packages; I tend to prefer base R when possible.
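As a rough sketch of how the same `cut()` idea might be applied to the histogram in the question: the data below is simulated and only stands in for `data$V2 * 100`, and the names `df`, `pct`, `bin`, `breaks` and `labels` are made up for illustration. The bin width of 1 and the cutoff of 5 are assumptions taken from the question's wording, so adjust them to your data.

```r
library(ggplot2)

# Simulated stand-in for data$V2 * 100 (hypothetical data, not the asker's)
set.seed(1)
df <- data.frame(pct = c(rnorm(500, mean = 0, sd = 2), 40, 55, -30))

# Regular bins of width 1 between -5 and 5, plus one open-ended bin per side.
# cut() uses right-closed intervals by default, so the first bin is (-Inf, -5]
# and the last bin is (5, Inf).
breaks <- c(-Inf, seq(-5, 5, by = 1), Inf)
labels <- c("<= -5",
            paste0(seq(-5, 4), " to ", seq(-4, 5)),
            "> 5")

df$bin <- cut(df$pct, breaks = breaks, labels = labels)

# The bins are already computed, so count them with geom_bar()
# instead of letting geom_histogram() choose its own bins.
p <- ggplot(df, aes(x = bin)) +
  geom_bar(color = "white", fill = "#1E206B") +
  scale_x_discrete(drop = FALSE) +  # keep empty bins on the axis
  labs(x = "Percentage change", y = "Counts") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
print(p)
```

Because the x-axis is now a factor, the outliers only add to the two end bars and no longer stretch the axis. With your own data you would replace the simulated `pct` column with `data$V2 * 100`.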

*Not a complete answer, but hopefully it helps.

Hunter