I'm new, and I feel like this whole thing is a mess, so I will do my best to explain what I'm working on. I have four CSVs with information about users' birth year and gender. The data was collected in 2016, and since I only have the birth year, I tried to add a column for age. Some of the rows in each of those columns are blank. There are likely errors in some of the birth years: this data is for bike rentals, and it is highly unlikely that someone who is 117 years old picked up a rental bike.

1. I want to limit the ages to 85 (that is where the data stops being consistent).
2. I want to merge the data from the four CSVs.
3. I want to create a stacked column chart with age on the x axis, filled by gender, and count on the y axis.
4. Lastly, I want to save this as a PNG.

I've been working on this for 3 days, and I know that this is something I should be able to do in 10 minutes. I feel like I'm close, but I'm tired and would love any help or feedback on how to improve or what I'm missing.

Here is my current code. It does not yet filter to ages of 85 and below, which makes the x axis extend further than necessary. Also, I filtered out the blanks in gender, but R is still generating a color for the blanks and labeling it.

# Read the four CSV files
DF1 <- read.csv("D:/Projects/Cyclists/Divvy_Trips_2016_Q1Q2/Divvy_Trips_2016_Q1.csv") 
DF2 <- read.csv("D:/Projects/Cyclists/Divvy_Trips_2016_Q1Q2/Divvy_Trips_2016_04.csv")
DF3 <- read.csv("D:/Projects/Cyclists/Divvy_Trips_2016_Q1Q2/Divvy_Trips_2016_05.csv")
DF4 <- read.csv("D:/Projects/Cyclists/Divvy_Trips_2016_Q1Q2/Divvy_Trips_2016_06.csv")

# Select the columns of interest from each data frame
df1_select <- select(DF1, birthyear, gender)
df2_select <- select(DF2, birthyear, gender)
df3_select <- select(DF3, birthyear, gender)
df4_select <- select(DF4, birthyear, gender)
age<-combined_df %>% 
  mutate(age=2016-birthyear)
#Combine the five data frames into one:
combined_df <- bind_rows(df1_select, df2_select, df3_select, df4_select,age)
 #Create a visualization of the gender and age correlations.
combined_df %>% 
  drop_na(gender) %>% 
  drop_na(age) %>%
  ggplot(aes(age,fill=gender))+
  geom_bar()
  ggsave("gender_age.png")
    (1) While it may be a lot to read, take a look at [list of frames](https://stackoverflow.com/a/24376207/3358227); once you get the hang of it, it makes dealing with 2 files or 200 the same. It's worth it. (2) It would really help to have an actual sample of data, please see https://stackoverflow.com/q/5963269 for some quick discussions on how to use `dput`, `data.frame`, or `read.table` to share unambiguous minimal small datasets so that we can attempt your code. Thanks! – r2evans Mar 21 '23 at 02:29

1 Answer

R can feel overwhelming at times. I remember being new as well and taking HOURS to do basic things. :) Anyway, here is a partial solution. It is partial because you did not provide any example data, so there is no way for us to know what your CSVs look like.

Simulate some data. Merging your data frames works much the same way as combining these vectors, except that you stack rows with rbind.data.frame instead of joining columns with cbind.data.frame.

a<-rnorm(100, 80,10)
b<-rep(letters[1:5], 20)
d<-rbinom(100,1,0.4)

abd<-cbind.data.frame(a,b,d)
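For the data frames in the question, row-wise stacking might look like the sketch below. The two small data frames are made-up stand-ins for the real CSVs; only the `birthyear` and `gender` column names come from the question. Keeping the frames in a list means the same call works for 2 files or 200.

```r
# Hypothetical stand-ins for two of the files; the real data would come
# from read.csv() and contain more columns and rows.
q1 <- data.frame(birthyear = c(1985, 1990, NA),
                 gender    = c("Male", "Female", ""))
q2 <- data.frame(birthyear = c(1899, 1975),
                 gender    = c("Female", "Male"))

# Put the frames in a list and stack them row-wise in a single call.
frames   <- list(q1, q2)
combined <- do.call(rbind, frames)

nrow(combined)   # 5 rows: 3 from q1 plus 2 from q2
```

This only works as written when every frame has the same column names, which is why selecting `birthyear` and `gender` first is a reasonable step.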

1.) Filter the variable to be less than or equal to 85

library(tidyverse)

abd_new<-abd %>% filter(a<=85)
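Applied to the question's columns, the same filter could be combined with the age calculation, as in the sketch below. The rows are invented for illustration. One assumption worth checking against the real files: `read.csv()` by default keeps empty text fields as `""` rather than `NA`, which would explain why `drop_na(gender)` leaves the blank category in the plot.

```r
library(dplyr)

# Made-up rows using the column names from the question.
trips <- data.frame(
  birthyear = c(1985, 1990, NA, 1899, 1975),
  gender    = c("Male", "Female", "", "Female", "Male")
)

trips_clean <- trips %>%
  mutate(age = 2016 - birthyear) %>%   # the data were collected in 2016
  filter(!is.na(age), age <= 85) %>%   # drop missing and implausible ages
  filter(gender != "")                 # drop blank genders stored as ""

trips_clean$age  # 31 26 41
```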

3.) stacked bar plot (I suppose this is what you mean by a column chart?)

# Plot the filtered data (abd_new) so values above 85 are excluded
ggplot(abd_new, aes(x=b,y=d))+
  geom_bar(aes(fill=a), stat="identity")

[Image: stacked bar plot produced by the code above]
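For a chart with counts on the y axis, as the question asks, `geom_bar()` can do the counting itself when no `y` aesthetic is supplied. The sketch below uses invented data with one row per trip; the object names are placeholders.

```r
library(ggplot2)

# Made-up cleaned data: one row per trip.
trips_clean <- data.frame(
  age    = c(25, 25, 25, 31, 31, 40),
  gender = c("Male", "Female", "Male", "Female", "Male", "Female")
)

# With no y aesthetic, geom_bar() counts the rows at each age;
# fill = gender stacks those counts by gender within each bar.
p <- ggplot(trips_clean, aes(x = age, fill = gender)) +
  geom_bar() +
  labs(x = "Age", y = "Count", fill = "Gender")
p
```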

4.) If you are not working in RStudio, start. Just click "Export" in the Plots pane and save the image the way you want it.
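If you would rather save from code, `ggsave()` from ggplot2 writes a plot to disk. Passing the plot explicitly avoids relying on whichever plot was displayed last. The file name is the one used in the question; the data and size values are only illustrative.

```r
library(ggplot2)

# Small made-up plot to save.
p <- ggplot(data.frame(age    = c(25, 31, 31),
                       gender = c("Male", "Female", "Male")),
            aes(x = age, fill = gender)) +
  geom_bar()

# Width and height are in inches by default.
ggsave("gender_age.png", plot = p, width = 7, height = 5, dpi = 300)
```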

procerus