I have a dataset that records whether a person used their phone (used_phone, a dummy variable where 1 = "Yes" and 0 = "No"), along with their ID and the district and sub-district they live in. Note that the same person may have been recorded twice or more under different sub-districts. However, I want to count each such person only once, that is, consider only unique IDs.
district sub_district id used_phone
A SX 1 Yes
A SX 2 Yes
A SX 3 No
A SX 4 No
A SY 4 No
A SY 5 Yes
A SZ 6 Yes
A SX 6 Yes
A SZ 7 No
B RX 8 No
B RV 9 No
B RX 9 No
B RV 10 Yes
B RV 11 Yes
B RT 12 Yes
B RT 13 Yes
B RV 13 Yes
B RT 14 No
B RX 14 No
N.B.: used_phone is a factor variable.
For the above dataset, I want to plot the distribution of "whether a person used a phone", for which I was using the following code:
ggplot(df, aes(x = used_phone)) +
  geom_bar(color = "black", fill = "aquamarine4", position = "dodge") +
  labs(x = "Used phone", y = "Number of people") +
  ggtitle("Whether person used phone") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))
This code works fine. However, I want to do two things:
- Add % labels for each group (Yes and No) above the respective bars, while the y-axis still shows the count
- Plot the graph so that it only considers unique IDs, i.e. each person is counted once
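To make the two goals concrete, here is a minimal sketch of one possible approach, not necessarily the best one. It assumes the data frame is called df as in the code above, and that a repeated id always carries the same used_phone value (true for the sample rows), so keeping the first occurrence of each id is safe. The names df_unique and counts are only illustrative.

```r
library(ggplot2)

# Rebuild the sample data from the question (assumed to be named `df`).
df <- read.table(header = TRUE, stringsAsFactors = TRUE, text = "
district sub_district id used_phone
A SX 1 Yes
A SX 2 Yes
A SX 3 No
A SX 4 No
A SY 4 No
A SY 5 Yes
A SZ 6 Yes
A SX 6 Yes
A SZ 7 No
B RX 8 No
B RV 9 No
B RX 9 No
B RV 10 Yes
B RV 11 Yes
B RT 12 Yes
B RT 13 Yes
B RV 13 Yes
B RT 14 No
B RX 14 No
")

# Keep one row per person: drop every repeated occurrence of an id.
# Assumption: a person's used_phone value is the same in all of their rows.
df_unique <- df[!duplicated(df$id), c("id", "used_phone")]

# Count people per answer, then compute each group's share of the total.
counts <- as.data.frame(table(used_phone = df_unique$used_phone),
                        responseName = "n")
counts$label <- sprintf("%.1f%%", 100 * counts$n / sum(counts$n))

# Bars show counts on the y-axis; the text above each bar shows the percentage.
p <- ggplot(counts, aes(x = used_phone, y = n)) +
  geom_col(color = "black", fill = "aquamarine4") +
  geom_text(aes(label = label), vjust = -0.5) +
  expand_limits(y = max(counts$n) * 1.1) +  # headroom so labels are not clipped
  labs(x = "Used phone", y = "Number of people") +
  ggtitle("Whether person used phone") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))
print(p)
```

If dplyr is available, distinct(df, id, .keep_all = TRUE) should be an equivalent way to drop the repeated ids. If a person could have different used_phone values in different rows, a rule for which value to keep would be needed first.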
Looking forward to solving this with your help, as I am a novice in R.
Thanks, Rachita