I'm new, and I feel like this whole thing is a mess, so I will do my best to explain what I'm working on. I have four CSVs with information about users' birth year and gender. The data was collected in 2016, and since I only have the birth year, I tried to add a column with each user's age (2016 minus the birth year). Some of the rows in each of those columns are blank. Some of the birth years are also likely errors: this data is for bike rentals, and it is highly unlikely that someone who is 117 years old picked up a rental bike. Here is what I want to do:

1. Limit the ages to 85 and under (that is where the data stops being consistent).
2. Combine the data from the four CSVs into one data frame.
3. Create a stacked column chart with age on the x-axis, count on the y-axis, and the columns filled by gender.
4. Save the chart as a PNG.
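The four steps above can be sketched end to end in R. This is a minimal sketch, not a drop-in fix: it assumes the dplyr and ggplot2 packages, and the `toy_*` data frames are made-up stand-ins for the `birthyear` and `gender` columns of the four Divvy files. It computes age as 2016 minus the birth year, drops missing ages and missing or blank genders, keeps ages of 85 or less, and saves the stacked chart as a PNG.

```r
library(dplyr)    # bind_rows(), mutate(), filter()
library(ggplot2)  # ggplot(), geom_bar(), labs(), ggsave()

# Hypothetical stand-ins for the birthyear/gender columns of the four CSVs
toy_q1 <- data.frame(birthyear = c(1980, 1992, NA), gender = c("Male", "", "Female"))
toy_04 <- data.frame(birthyear = c(1899, 1975), gender = c("Female", "Male"))
toy_05 <- data.frame(birthyear = c(1988, 1931), gender = c("Female", "Male"))
toy_06 <- data.frame(birthyear = c(1990, 1990), gender = c("Male", "Female"))

# Step 2: stack the rows of the four data frames, then derive age
combined_df <- bind_rows(toy_q1, toy_04, toy_05, toy_06) %>%
  mutate(age = 2016 - birthyear)

# Step 1: drop missing ages, ages above 85, and missing or blank ("") genders
plot_df <- combined_df %>%
  filter(!is.na(age), age <= 85,
         !is.na(gender), gender != "")

# Step 3: one column per age, stacked by gender; geom_bar() counts rows
p <- ggplot(plot_df, aes(x = age, fill = gender)) +
  geom_bar() +
  labs(x = "Age in 2016 (years)", y = "Count", fill = "Gender")

# Step 4: save as PNG (width/height in inches). tempdir() keeps this demo
# out of the working directory; use "gender_age.png" for the real chart.
ggsave(file.path(tempdir(), "gender_age.png"), plot = p, width = 8, height = 5)
```

`geom_bar()` counts the rows at each x value and stacks the fill groups by default (`position = "stack"`), which is what produces a stacked column chart with count on the y-axis. For the real data, the `toy_*` frames would be replaced by `df1_select` to `df4_select`.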
I've been working on this for 3 days, and I know that this is something I should be able to do in 10 minutes. I feel like I'm close, but I'm tired and would love any help or feedback on how to improve or what I'm missing.
Here is my current code. It doesn't filter out ages above 85 yet, which makes the x-axis extend further than necessary. Also, I filtered out the blanks in gender, but R still gives the blanks their own color and label.
```r
# Load the packages used below
library(dplyr)    # select(), mutate(), bind_rows()
library(tidyr)    # drop_na()
library(ggplot2)  # ggplot(), geom_bar(), ggsave()

# Read the four CSV files
DF1 <- read.csv("D:/Projects/Cyclists/Divvy_Trips_2016_Q1Q2/Divvy_Trips_2016_Q1.csv")
DF2 <- read.csv("D:/Projects/Cyclists/Divvy_Trips_2016_Q1Q2/Divvy_Trips_2016_04.csv")
DF3 <- read.csv("D:/Projects/Cyclists/Divvy_Trips_2016_Q1Q2/Divvy_Trips_2016_05.csv")
DF4 <- read.csv("D:/Projects/Cyclists/Divvy_Trips_2016_Q1Q2/Divvy_Trips_2016_06.csv")

# Keep only the birthyear and gender columns from each data frame
df1_select <- select(DF1, birthyear, gender)
df2_select <- select(DF2, birthyear, gender)
df3_select <- select(DF3, birthyear, gender)
df4_select <- select(DF4, birthyear, gender)

# Combine the four data frames into one, then add the age column
# (combined_df has to exist before mutate() can be applied to it)
combined_df <- bind_rows(df1_select, df2_select, df3_select, df4_select) %>%
  mutate(age = 2016 - birthyear)

# Plot the count of riders at each age, stacked by gender
combined_df %>%
  drop_na(gender) %>%  # drop_na() removes NA values, not empty strings ("")
  drop_na(age) %>%
  ggplot(aes(age, fill = gender)) +
  geom_bar()

ggsave("gender_age.png")
```
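The leftover color for blank genders is consistent with how `read.csv()` reads empty cells: in a numeric column such as `birthyear` a blank field becomes `NA`, but in a text column such as `gender` it becomes the empty string `""`, which `drop_na()` does not treat as missing. A small sketch on a made-up CSV string (the `csv_text` contents are hypothetical) shows the difference, and the `na.strings` argument that turns blanks into `NA` while reading:

```r
# Hypothetical CSV in the same shape as the Divvy files: row 2 has a blank
# gender, row 3 a blank birthyear, row 4 an implausible birthyear (age 117)
csv_text <- "birthyear,gender
1980,Male
1992,
,Female
1899,Female"

# Default reading: the blank gender is "" (an ordinary string, not NA)
default_read <- read.csv(text = csv_text)
print(is.na(default_read$gender))
# [1] FALSE FALSE FALSE FALSE

# Treat empty strings as missing values while reading
na_read <- read.csv(text = csv_text, na.strings = c("", "NA"))
print(is.na(na_read$gender))
# [1] FALSE  TRUE FALSE FALSE

# A blank in the numeric birthyear column is NA either way
print(is.na(default_read$birthyear))
# [1] FALSE FALSE  TRUE FALSE
```

With `na.strings = c("", "NA")` added to each `read.csv()` call, `drop_na(gender)` would also drop the blank genders; keeping the default reading and filtering with `gender != ""` is an equivalent alternative.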