
I'm very new to RStudio and I'm using it for a class. I'm trying to figure out how to sum a subset of a category. I apologize in advance if this doesn't make sense; I have no clue what I'm doing, so I'd appreciate an explanation of why, not just what the answer is. Note: the two lines of code below come from the directions I have to follow, not something I wrote because I knew how to. It's the last part, the sum, that the directions don't explain, and that's where I need help.

For example,

I have this:

category_name    category2_name
1                ABC
2                ABC
3                ABC
4                ABC
5                ABC
6                BDE
5                EFG
7                EFG

I wanted to find the sum of these numbers, so I was told to put in this:

sum(dataname$category_name)

After doing this, I'm asked to type this in, apparently creating a subset.

allabc <- subset(dataname, dataname$category_name2 == "abc")

Running this opened a new table containing the subset. Now I'm asked to sum only the numbers in this ABC subset, and I have absolutely no clue how to do this. If someone could help me out, I'd really appreciate it!

jacketblox
  • You can try `sqldf` to compute several aggregates (sum, mean, min, max) at once: ```library(sqldf); allcategoryname2 <- sqldf('select allcategoryname, min(category_name) as MIN_category_name, max(category_name) as MAX_category_name, avg(category_name) as AVG_category_name, sum(category_name) as Sum_category_name from dataname group by allcategoryname')``` – Tushar Lad Feb 21 '20 at 04:14
  • Is `category_name` numeric or non-numeric? If non-numeric, taking a `sum()` makes no sense. If numeric, subsetting on it using a character string also makes no sense. Please show a _minimal reproducible example_. – Edward Feb 21 '20 at 04:16
  • @Edward yes, `category_name` is a column of numbers, one per row, and the goal was to sum up all the rows' numbers. As I said, I'm quite illiterate - what is a character string? – jacketblox Feb 21 '20 at 04:19
  • "categoryname2" is a character string: anything surrounded by quotes, either single (') or double ("). – Edward Feb 21 '20 at 04:24
  • Okay, so Edward's point is salient: your attempt with `sum(dataname$category_name)` is going to return the sum of the categories themselves ... which is illogical. If you want to sum all of the values associated with each particular category, that's a different issue ... and one that is asked and answered many times over on SO. Look for `[r] aggregate` and you'll find many such answers, including https://stackoverflow.com/questions/1660124/how-to-sum-a-variable-by-group/1661144#1661144 – r2evans Feb 21 '20 at 04:24
  • @r2evans hmm, I think I may just not be explaining it right. I'll edit it and make it more understandable. – jacketblox Feb 21 '20 at 04:27
  • Please also include the output to `dput(head(dataname))` or `str(dataname)` – Edward Feb 21 '20 at 04:28
  • @Edward I edited it and hopefully made it clearer. Does this help at all? – jacketblox Feb 21 '20 at 04:36
  • Almost there! Two mistakes: R is case-sensitive, so... And your subset argument is not correct. Can you see where? – Edward Feb 21 '20 at 04:42
  • @Edward I apologize, as I'm not sure which R you are talking about. Also, what do you mean by my subset argument? Do you mean the function I inputted? If so, both functions I typed out were directions I had to follow... I really don't know what they mean or how they correlate unfortunately... – jacketblox Feb 21 '20 at 05:00

2 Answers


R is the software you are using. It is case-sensitive. So "abc" is not equal to "ABC".
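
A quick way to see this for yourself is to compare the two strings directly in the console. This is only a minimal sketch; the variable names `exact_match` and `upper_match` are made up for illustration and nothing here depends on your data.

```r
# String comparison in R is exact, so upper and lower case are different
exact_match <- "abc" == "ABC"
exact_match                     # FALSE

# toupper() converts a string to upper case before the comparison
upper_match <- toupper("abc") == "ABC"
upper_match                     # TRUE
```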

The arguments are the "things" you put inside functions. Some arguments have the same name as the functions (which is a little confusing at first, but you get used to this eventually). So when I say the subset argument, I am talking about your second argument to the subset function, which you didn't name. That's ok, but when starting to learn R, try to always name your arguments.

So,

allabc <- subset(dataname, dataname$category_name2 == "abc")

Needs to be changed to:

allabc <- subset(dataname, subset = category2_name == "ABC")

You also don't need to specify the name of the data again in the subset argument, since you've already done that in the first argument (which you didn't name, but almost no one bothers to name that one).
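
To finish the task, `sum()` works on the subset exactly as it did on the full data. The sketch below assumes the data frame looks like the table in the question; `dataname` is recreated by hand here, so your real data may differ, and `abc_total` is just an illustrative name.

```r
# Recreate the example table from the question (values typed in by hand)
dataname <- data.frame(
  category_name  = c(1, 2, 3, 4, 5, 6, 5, 7),
  category2_name = c("ABC", "ABC", "ABC", "ABC", "ABC", "BDE", "EFG", "EFG")
)

# Sum of the whole column
sum(dataname$category_name)   # 33

# Keep only the rows where category2_name is exactly "ABC"
allabc <- subset(dataname, subset = category2_name == "ABC")

# Sum the same column, but taken from the subset instead of the full data
abc_total <- sum(allabc$category_name)
abc_total                     # 15
```

`allabc$category_name` means "the `category_name` column of `allabc`", so `sum()` only sees the five ABC rows.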

Edward

This is most easily done with the tidyverse.

# Your data (values taken from the table in the question)
data <- data.frame(category_name = c(1:6, 5, 7), category_name2 = c(rep("ABC", 5), "BDE", "EFG", "EFG"))

# Installing tidyverse (only needed once)
install.packages("tidyverse")

# Loading tidyverse
library(tidyverse)

# For each category_name2, sum category_name
data %>%
  group_by(category_name2) %>%
  summarise(sum_by_group = sum(category_name))

# Output
category_name2 sum_by_group
ABC            15
BDE            6
EFG            12
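
If installing the tidyverse is not an option, base R can produce the same kind of per-group sums. The following is a sketch using `aggregate()`, with the values typed in by hand from the table in the question; the name `sums` is only illustrative.

```r
# Example data typed in from the question's table
data <- data.frame(
  category_name  = c(1, 2, 3, 4, 5, 6, 5, 7),
  category_name2 = c("ABC", "ABC", "ABC", "ABC", "ABC", "BDE", "EFG", "EFG")
)

# Sum category_name within each distinct value of category_name2
sums <- aggregate(category_name ~ category_name2, data = data, FUN = sum)
sums

# Pull out the total for a single group
sums$category_name[sums$category_name2 == "ABC"]   # 15
```

The formula `category_name ~ category_name2` reads as "summarise `category_name`, grouped by `category_name2`".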
Esben Eickhardt