Can someone help me with an R script error within my mutate() function?

Question

Here is my code:

# Check if pacman is installed and install it if not
if (!require("pacman")) install.packages("pacman")
print("Pacman Installed")

# Use pacman to load/install required packages
pacman::p_load(pacman, datasets, tidyverse, tsibble, lubridate)
print("Packages Loaded")

# Load the nottem dataset
data("nottem")
print("Nottem Loaded")

# Store the nottem dataset as a tibble
nottem_df <- as_tibble(nottem)
print("Nottem Stored as Tibble")

# Store the nottem dataset as a tidy df
nottem_tidy_df <- nottem_df %>%
  mutate(date = floor_date(date, unit = "year"),
         year = year(date),
         month = month(date)) %>%
  select(date, year, month, temperature)
print("Nottem Stored as Tidy df")

# Average annual temperature by year df
average_temp_by_year_df <- nottem_tidy_df %>%
  group_by(year) %>%
  summarize(avg_temp = mean(temperature))
print("Average Annual Temp by Year Stored as df")


# Plot the annual temperature by year
ggplot(average_temp_by_year_df, aes(year, avg_temp)) +
  geom_line() +
  geom_smooth(method = "loess") +
  ggtitle("Annual Temperature by Year") +
  xlab("Year") +
  ylab("Temperature (°C)")+
  ggsave("Annual_Temperature_by_Year.png")
print("AAT Plotted")

# Load the Titanic dataset
data("Titanic")
print("Titanic Loaded")

# Store the Titanic dataset as a tibble
titanic_tibble_df <- as_tibble(Titanic)
print("Titanic Dataset Sored as Tibble")

# Uncount the tibble Titanic dataset and make each of the 4 variables a factor
titanic_factors_df <- titanic_tibble_df %>%
  mutate_at(c("Class", "Age", "Sex", "Survived"), as.factor)
print("Tibble Uncounted")

# Compute the proportion of people that survived
num_survived <- sum(titanic_factors_df$Survived == "Yes")
num_total <- nrow(titanic_factors_df)
prop_survived <- num_survived/num_total
print("Surviver Ratio Computed")

# Count the number of passengers in each class
class_count_df <- titanic_factors_df %>%
  group_by(Class) %>%
  summarize(count = n())
print("Passengers Counted by Class")

# Count the number of passengers who survived in each class
class_survived_df <- titanic_factors_df %>%
  filter(Survived == "Yes") %>%
  group_by(Class) %>%
  summarize(survived_count = n())
print("Class Survivers Counted")

# Append the class totals to the survival totals df
class_totals_df <- class_count_df %>%
  left_join(class_survived_df, by = "Class")
print("Df Appended")

# Compute the proportion of those that survived by class
class_totals_df$prop_survived <- class_totals_df$survived_count/class_totals_df$count
print("Class Surviver Ratio Computed")

# Plot the proportion of those that survived by class
ggplot(class_counts, aes(x = Class, y = prop_survived)) +
  geom_bar(stat = "identity") +
  scale_y_continuous(limits = c(0, 1), labels = scales::percent_format()) +
  labs(x = "Class", y = "Proportion of Passengers Survived", 
       title = "Proportion of Passengers Survived by Class") +
  ggsave("proportion_survived_by_class.png")
print("Class Survivers Plotted")

Here are the errors:

“Error in mutate(): ! Problem while computing year = year(date). Caused by error in as.POSIXlt.default(): ! do not know how to convert 'x' to class “POSIXlt” Run rlang::last_error() to see where the error occurred.”

“Error in group_by(., year) : object 'nottem_tidy_df' not found”

“Error in ggplot(average_temp_by_year_df, aes(year, avg_temp)) : object 'average_temp_by_year_df' not found”

“Error in ggplot(class_counts, aes(x = Class, y = prop_survived)) : object 'class_counts' not found”

And finally, here are the parameters I'm working from: In a blank R Script file inside of RStudio, write and execute lines that do the following:

- Use comments to create a title area that includes your name and assignment name
- For each major bullet point below, write a header/comment that briefly describes what each line is doing.
- For every operation, make sure to print the results with print()
- Check if pacman is installed and install it if not
- Use pacman to load/install: pacman, datasets, tidyverse, tsibble, and lubridate
- Store the nottem dataset as a df using tsibble
- Store the nottem dataset as a tidy df with the date as an index and a separate column for just year, month, and temperature
- Create a df that shows the average annual temp by year
- Plot the annual temperature by year and add a smoothing line. (Appropriate Axis Labels and Title). Save the image.
- Store the Titanic dataset as a df
- Store the Titanic dataset as a df using tibble
- Uncount the tibble Titanic dataset and make each of the 4 variables a factor
  - You can do 1 line at a time on variables changes or you can do similar ones in a group at once with mutate_at(c("v1","v2","v3","v4",…),var_type)
  - Store result as a df
- Compute the proportion of people that survived. Num_Survived/Num_Total
  - Create two variables: one for the total and one for how many survived
  - summarise(n()) can be used to count the number of rows in a df
  - Use filter to reduce the df to only those that survived
  - Divide the counts
- Count and store in a df how many passengers were in each “Class”
  - Group_by(Class) %>% then count
  - Since the result is a column and not just a single data point like before, you should give a name to each of your counts: summarise(var_name=n())
- Count and store in a df how many passengers survived in each “Class”
- Append the class totals to the survival totals df
  - Should be a new df with the same but now 3 columns, Class, Survival Count, and Total Count
- Compute the proportion of those that survived by class and append as a 4th column in the survival totals df
  - Can’t just divide the entire df to find the proportion (What is Crew/Crew?). Reference just a specific column. df$var
- Use ggplot to create a bar graph with Class on the x-axis and your proportion on the y-axis with proper axis labels and title. Scale the y axis in a way that assist with readability. Save the image.
  - Use geom_bar(stat="identity") to make it work
- Save your Script file.
- Upload your script file and both images to complete the assignment.

I've asked friends, Chegg, and ChatGPT and can't seem to get any helpful advice.

I was told to change my mutate section to this:

# Store the nottem dataset as a tidy df
nottem_tidy_df <- nottem_df %>%
  mutate(date = lubridate::floor_date(date, unit = "year"),
         year = year(date),
         month = month(date)) %>%
  select(date, year, month, temperature)
print("Nottem Stored as Tidy df")

and this:

# Store the nottem dataset as a tidy df
nottem_tidy_df <- nottem_df %>%
  mutate(date = stats::floor_date(date, unit = "year"),
         year = year(date),
         month = month(date)) %>%
  select(date, year, month, temperature)
print("Nottem Stored as Tidy df")

but neither worked.

Your first error suggests that `date` in `nottem_df` is not reading in as a date data type. So floor_date and year aren't working on it, probably causing at least some problems downstream. What is the output of `str(nottem_df)` when you first load it? — Jon Spring, Feb 02 '23 at 06:11
"Store the nottem dataset as a df using tsibble". Note that `tsibble` and `tibble` are different packages and functions. `tsibble` is a variation that specifically uses time series. — Jon Spring, Feb 02 '23 at 06:32
Please trim your code to make it easier to find your problem. Follow these guidelines to create a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). — Community, Feb 02 '23 at 07:04
FYI: `mutate_at` has been superseded, I suggest you change that line (working on titanic) with `mutate(across(c(Class, Age, Sex, Survived), factor))`. — r2evans, Feb 02 '23 at 13:37
Finally, welcome to SO Stephen Magnus! I recommend re-taking the [tour] and spending a little more time thinking about what a _minimal_ reproducible question means. Namely, please remove code that is not relevant to the question (in this case, everything after your first call to `mutate(.)` and certainly all of the titanic work). Long, long, long questions often prompt close-votes, or at least get ignored as "too big". See https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info for good discussions of this. Good luck! — r2evans, Feb 02 '23 at 13:40
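On the tsibble point raised in the comments: a minimal sketch, assuming the assignment's "store as a df using tsibble" means calling `tsibble::as_tsibble()` on the `ts` object. For a univariate `ts`, that call is expected to return columns named `index` and `value`; the names `nottem_tsbl` and `nottem_tidy_tsbl` are made up for illustration.

```r
library(tsibble)
library(dplyr)
library(lubridate)

# as_tsibble() understands the ts time span, so the dates come along for free.
# Expected columns: index (a year-month index) and value (the temperature).
nottem_tsbl <- as_tsibble(nottem)

# Convert the year-month index to a plain Date before extracting parts,
# so year() and month() operate on a class they are known to handle.
nottem_tidy_tsbl <- nottem_tsbl %>%
  rename(temperature = value) %>%
  mutate(date  = as.Date(index),
         year  = year(date),
         month = month(date))

print(nottem_tidy_tsbl)
```

Unlike `as_tibble(nottem)`, this keeps the time information, which is why `year()` and `month()` have something to work with.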

score 0 · Answer 1 · answered Feb 02 '23 at 13:36

Troubleshooting problems like this means two things to me:

When you have one error/warning early in the code, discard any and all errors later in the code until that first one is resolved. In fact, don't even run code placed after the error. This would reduce the code for this question down to under 1/2 of what you've posted.
Once you know where the error occurs (in your first call to mutate), you need to look at the data before the erring code to see if it actually contains what you think it does. (It does not.) This means not just the presence of columns but also the class of them (e.g., character versus Date). You expect there to be a date column, yet it is not in the data and none of the code adds that column.

You start with floor_date(date), but at least my nottem has no such field:

nottem
#       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
# 1920 40.6 40.8 44.4 46.7 54.1 58.5 57.7 56.4 54.3 50.5 42.9 39.8
# 1921 44.2 39.8 45.1 47.0 54.1 58.7 66.3 59.9 57.0 54.2 39.7 42.8
# 1922 37.5 38.7 39.5 42.1 55.7 57.8 56.8 54.3 54.3 47.1 41.8 41.7
# ...
class(nottem)
# [1] "ts"

When you convert this to nottem_df, it does not have a date:

head(tibble(nottem))
# # A tibble: 6 × 1
#   nottem
#    <dbl>
# 1   40.6
# 2   40.8
# 3   44.4
# 4   46.7
# 5   54.1
# 6   58.5

Ultimately the first step in your problem is to derive a date from the data above. A time-series uses a few numbers to define its time-span.

attr(nottem, "tsp")
# [1] 1920.000 1939.917   12.000

Here, we start in year 1920 (at fractional-year 0, meaning Jan 1) and step 1/12th of a year up to and including 1939.917 (which is 20 years, 12 months per year). Let's convert that to dates.

nottem_df <- tibble(
  date = seq(as.Date("1920-01-01"), length.out = length(nottem), by = "month"), 
  temperature = nottem)
head(nottem_df)
# # A tibble: 6 × 2
#   date       temperature
#   <date>           <dbl>
# 1 1920-01-01        40.6
# 2 1920-02-01        40.8
# 3 1920-03-01        44.4
# 4 1920-04-01        46.7
# 5 1920-05-01        54.1
# 6 1920-06-01        58.5
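The start date above is typed in by hand. As a hedged alternative, the same dates can be derived from the series itself with base R's `start()` and `frequency()`. This is only a sketch for monthly series; `ts_start`, `ts_freq`, `first_date`, and `nottem_dates_df` are illustrative names, not part of the original code.

```r
# nottem ships with base R's datasets package; no extra packages are needed.
ts_start <- start(nottem)      # year and period within the year
ts_freq  <- frequency(nottem)  # observations per year

# This sketch only handles monthly series.
stopifnot(ts_freq == 12)

# Build the first date from the series' own start, then step by month.
first_date <- as.Date(sprintf("%d-%02d-01",
                              as.integer(ts_start[1]),
                              as.integer(ts_start[2])))
dates <- seq(first_date, by = "month", length.out = length(nottem))

nottem_dates_df <- data.frame(date = dates,
                              temperature = as.numeric(nottem))
print(head(nottem_dates_df))
```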

Now we have a date column. For validation, look at the original matrix-looking form up top: 1920 and Mar intersect at a value of 44.4, which is what we have for "1920-03-01".

With that in place, the rest of your code on nottem runs:

nottem_tidy_df <- nottem_df %>%
  mutate(date = floor_date(date, unit = "year"),
         year = year(date),
         month = month(date)) %>%
  select(date, year, month, temperature)
nottem_tidy_df
# # A tibble: 240 × 4
#    date        year month temperature
#    <date>     <int> <int>       <dbl>
#  1 1920-01-01  1920     1        40.6
#  2 1920-01-01  1920     1        40.8
#  3 1920-01-01  1920     1        44.4
#  4 1920-01-01  1920     1        46.7
#  5 1920-01-01  1920     1        54.1
#  6 1920-01-01  1920     1        58.5
#  7 1920-01-01  1920     1        57.7
#  8 1920-01-01  1920     1        56.4
#  9 1920-01-01  1920     1        54.3
# 10 1920-01-01  1920     1        50.5
# # … with 230 more rows
# # ℹ Use `print(n = ...)` to see more rows
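One thing worth noticing in the output above: because `date` is floored to the start of the year before `year` and `month` are computed, every row shows month 1 and the monthly dates are lost. If the goal is one row per month with its own date (an assumption about the assignment's intent), a minimal sketch is to skip the flooring step and extract `year` and `month` from the unmodified dates:

```r
library(dplyr)
library(lubridate)

# Rebuild the dated tibble so this block runs on its own.
nottem_df <- tibble(
  date = seq(as.Date("1920-01-01"), by = "month", length.out = length(nottem)),
  temperature = as.numeric(nottem)
)

# Extract year and month from the monthly date; no floor_date() call,
# so each row keeps its own month.
nottem_tidy_df <- nottem_df %>%
  mutate(year  = year(date),
         month = month(date)) %>%
  select(date, year, month, temperature)

print(nottem_tidy_df)
```

The yearly grouping later in the script still works, since it groups on `year` rather than on `date`.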
