
For my thesis I have to calculate the number of workers at risk of substitution by machines. I have calculated the probability of substitution (X) and the number of employees at risk (Y) for each occupation category. I have a dataset like this:

         X         Y

1      0.1300      0
2      0.1000      0
3      0.0841     1513
4      0.0221     287
5      0.1175     3641
....
700    0.9875     4000

I tried to plot a histogram with this command:

hist(dataset1$X,dataset1$Y,xlim=c(0,1),ylim=c(0,30000),breaks=100,main="Distribution",xlab="Probability",ylab="Number of employee")

But I get this error:

In if (freq) x$counts else x$density
length > 1 and only the first element will be used

Can someone tell me what the problem is and show me the right command? Thank you!

alexander.polomodov
ef6493

1 Answer


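It is worth pointing out that the message displayed is a warning, not an error, so it should not prevent the results being plotted (from R 4.2.0 onwards, a condition of length greater than one in if() is an error instead). However, it does indicate a problem with the call.

The cause appears to be the second argument: hist() accepts a single vector of values, and because breaks is supplied by name, the unnamed dataset1$Y is matched to the next formal argument, freq, which expects a single TRUE or FALSE. Passing the whole Y column there produces the "length > 1" message, and Y is never used as a weight. Beyond that, without the full dataset it is not 100% obvious what else may be wrong, but the data are not in the format hist() expects, with two potential issues. Firstly, some occupations have a Y of 0, and these will not contribute to a histogram of employees. Secondly, the X values are inconsistently spaced, so they have to be grouped into bins before they can be counted.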
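To make the argument matching concrete, here is a minimal sketch. It assumes a tiny stand-in data frame built only from the six rows printed in the question (the real dataset has about 700 rows), and the names arg_names and h are just illustrative. It shows that the first three formal arguments of the default hist() method are x, breaks and freq, and that calling hist() on X alone counts occupations rather than employees.

```r
# Stand-in data: only the six rows shown in the question (assumption:
# the real dataset1 has the same two columns, X and Y)
dataset1 <- data.frame(
  X = c(0.1300, 0.1000, 0.0841, 0.0221, 0.1175, 0.9875),
  Y = c(0, 0, 1513, 287, 3641, 4000)
)

# The first three formal arguments of the default hist() method.
# With breaks given by name, a second unnamed argument is matched to freq.
arg_names <- names(formals(getS3method("hist", "default")))[1:3]
print(arg_names)

# hist() on X alone runs cleanly, but each occupation counts once,
# regardless of how many employees it has
h <- hist(dataset1$X, breaks = seq(0, 1, 0.01), plot = FALSE)
print(sum(h$counts))   # number of occupations
print(sum(dataset1$Y)) # number of employees, which hist() never sees
```

So the command in the question cannot give the number of employees per probability bin, however its arguments are named. The Y values have to enter the data itself.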

Histograms are best built from one of two kinds of dataset:

  1. A data frame which has been aggregated into consistently sized bins.
  2. A list of values X, with one entry for each observation in the data.

I prefer the second technique. As originally shown here, the expandRows() function in the package splitstackshape can be used to repeat each row of the data frame by its number of observations:

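# Example data: X is the probability, Y the number of employees (whole numbers)
set.seed(123)
dataset1 <- data.frame(X = runif(900, 0, 1), Y = round(runif(900, 0, 1000)))

# Technique 2: repeat each row Y times, so every employee is one observation
library(splitstackshape)
dataset2 <- expandRows(dataset1, "Y")
hist(dataset2$X, xlim = c(0, 1))

# Technique 1 (first step only): assign each X to one of 100 equal-width bins
dataset1$bins <- cut(dataset1$X, breaks = seq(0, 1, 0.01), labels = FALSE)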
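If you would rather not install a package, both techniques can be sketched in base R. This is only a rough sketch under the same assumptions as the example data above (Y holds whole-number counts). The names breaks, x_expanded and employees_per_bin are illustrative, and the titles and axis labels are taken from the command in the question.

```r
# Assumed example data, same shape as in the answer
set.seed(123)
dataset1 <- data.frame(X = runif(900, 0, 1), Y = round(runif(900, 0, 1000)))

breaks <- seq(0, 1, by = 0.01)  # 100 equal-width bins

# Technique 2 in base R: repeat each X value Y times, then call hist()
x_expanded <- rep(dataset1$X, times = dataset1$Y)
h <- hist(x_expanded, breaks = breaks, xlim = c(0, 1),
          main = "Distribution", xlab = "Probability",
          ylab = "Number of employees")

# Technique 1 in base R: sum Y within each bin and draw the totals as bars
bins <- cut(dataset1$X, breaks = breaks, include.lowest = TRUE)
employees_per_bin <- tapply(dataset1$Y, bins, sum)
employees_per_bin[is.na(employees_per_bin)] <- 0  # bins with no occupations
barplot(as.vector(employees_per_bin), space = 0,
        main = "Distribution", xlab = "Probability bin",
        ylab = "Number of employees")
```

The rep() version builds a vector with one element per employee, which is fine for a few million employees but can get large. The tapply() version never expands the data, so it is the safer choice when the Y totals are very big.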


Michael Harper