I have a dataframe (sy2.1) with two columns (Vannstand2Positive and Vannstand2Negative) that both contain NA's as a result of mutating the columns from an existing column. I also have a third column (t) with time and date, and a fourth column (Kategori) with 4 categories. I want to find the percentage of each of the 4 categories with values from Vannstand2Positive and Vannstand2Negative.
This is the code I have used, in which I only use the column with positive values (Vannstand2Positive) in this code.
sy2.1 %>%
filter(year(t)==2005) %>%
group_by(Kategori, year(t)) %>%
dplyr::summarise(sum_vannstandPos = sum(Vannstand2Positive)) %>%
dplyr::mutate(percent_vannstandPos=sum_vannstandPos/sum(sum_vannstandPos)*100)
# Kategori year(t) sum_VannstandPos percent_VannstandPos
# 1: Liten 2005 NA NA
# 2: Moderat 2005 1107 100
# 1: Stor 2005 609 100
# 2: Størst 2005 1107 100
There is clearly something wrong with the code, as the percentage gives 100% on all the categories, but I need the correct calculations of the percentage. I tried to also use it on the column with negative values (Vannstand2Negative), but I only got NA's on the table. But I know its fully possible to find percentages of negative values. Anyone with a code that can fix this?
By adding na.rm=T to the sum function, I got percentages of almost all groups except the first group:
# Kategori year(t) sum_VannstandPos percent_VannstandPos
# 1: Liten 2005 NA NA
# 2: Moderat 2005 1107 32.34005
# 1: Stor 2005 609 17.79141
# 2: Størst 2005 1107 49.86854