This is my data set - https://www.kaggle.com/volodymyrgavrysh/bank-marketing-campaigns-dataset

I am trying to plot how many times each education type has said yes.

So first I find out from the data set who said yes

yest <- subset(bank, y == "yes")

Then I try to count both:

edcount <- plyr::count(plyr::count(yest$education, yest$yes))

But this just gives the freq of education types but not the number of times each type has said yes. What is wrong with my code?

I am trying for my data set to look like this

x           freq        freqofyes
basic.4y      400        10

I need this to find out whether there is a correlation between education and people saying yes.

Ronak Shah
Vij

1 Answer

plyr has been retired; consider using dplyr instead. For each unique value of education, you can count its frequency with n() and the number of 'yes' responses with sum().

library(dplyr)

data <- bank %>%
          group_by(education) %>%
          summarise(freq = n(), 
                    freqofyes = sum(y == 'yes'))

data
#  education            freq freqofyes
#  <chr>               <int>     <int>
#1 basic.4y             4176       428
#2 basic.6y             2292       188
#3 basic.9y             6045       473
#4 high.school          9515      1031
#5 illiterate             18         4
#6 professional.course  5243       595
#7 university.degree   12168      1670
#8 unknown              1731       251
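
For readers without the Kaggle file at hand, here is a minimal self-contained sketch of the same grouped count. `toy_bank` is a hypothetical stand-in for `bank` with made-up values, and the extra `yes_rate` column is an addition for illustration: a share of 'yes' per group may be easier to compare than raw counts when group sizes differ a lot.

```r
library(dplyr)

# Hypothetical toy data standing in for `bank`; the values are invented.
toy_bank <- data.frame(
  education = c("basic.4y", "basic.4y", "basic.4y",
                "high.school", "high.school",
                "university.degree", "university.degree",
                "university.degree", "university.degree"),
  y = c("yes", "no", "no",
        "no", "yes",
        "yes", "yes", "no", "no"),
  stringsAsFactors = FALSE
)

toy_summary <- toy_bank %>%
  group_by(education) %>%
  summarise(freq = n(),                    # rows per education level
            freqofyes = sum(y == "yes"),   # rows where y is "yes"
            yes_rate = mean(y == "yes"))   # share of "yes" within the group

toy_summary
```

Whether `yes_rate` is the right measure for the correlation question is a judgment call; it is only one simple way to put the groups on a common scale.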
Ronak Shah
  • Thank you very much. I was not taught dplyr and did not know you could pipe. When I try to plot it with ```ggplot(data = data, aes(x = freq, y = freqofyes, colour = bank$education)) + geom_point()```, I get this error: ```Error: Aesthetics must be either length 1 or the same as the data (8): colour```. I am trying to add bank$education because my chart does not currently show what each freq on the x axis refers to. – Vij Sep 26 '20 at 11:27
  • Why use `bank$education` when it is already in `data`? This works: `ggplot(data = data, aes(x = freq, y = freqofyes, colour = education)) + geom_point()` – Ronak Shah Sep 26 '20 at 11:33
  • I thought it was part of the ```colname``` of the subset ```data```. I tried your code; however, it comes back saying ```Error in FUN(X[[i]], ...) : object 'education' not found``` – Vij Sep 26 '20 at 11:40
  • It works fine for me without any error and shows the plot. Try restarting R and running it again. If you still have issues, perhaps ask a new question. – Ronak Shah Sep 26 '20 at 11:48
  • I tried that and now I can't load the sub data. The error I get is ```Error: `n()` must only be used inside dplyr verbs.```. I'll wipe everything clean and start fresh, and if that fails I'll ask a new question as you advised. Thank you very much for your help. – Vij Sep 26 '20 at 11:53
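
A note on the last error in the comments: `n()` must only be used inside dplyr verbs commonly shows up when plyr is attached after dplyr, so that `plyr::summarise` masks `dplyr::summarise`. The thread does not confirm that this was the cause here, so treat it as an assumption. A minimal sketch that sidesteps masking by namespace-qualifying every call (`toy_bank` is hypothetical, with made-up values):

```r
# Hypothetical toy data standing in for `bank`; the values are invented.
toy_bank <- data.frame(
  education = c("basic.4y", "basic.4y", "high.school",
                "high.school", "high.school", "university.degree"),
  y = c("yes", "no", "no", "yes", "yes", "no"),
  stringsAsFactors = FALSE
)

# Explicit dplyr:: prefixes pick dplyr's verbs even if plyr is attached
# later and masks summarise().
grouped <- dplyr::group_by(toy_bank, education)
toy_summary <- dplyr::summarise(grouped,
                                freq = dplyr::n(),
                                freqofyes = sum(y == "yes"))

# Plot using the column from the summarised data, not bank$education.
# Skipped quietly when ggplot2 is not installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(toy_summary,
                       ggplot2::aes(x = freq, y = freqofyes,
                                    colour = education)) +
    ggplot2::geom_point()
  print(p)
}
```

Restarting R and attaching only dplyr, or attaching plyr before dplyr, are other ways to avoid the masking.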