-2

Working on my second project for R. I'm trying to create some variable groups using dplyr, but I'm not sure what the heck I'm doing here.

I'm working with financial data and among the categories, there are several different forms of travel, listed as such:

Travel - Gas, Travel - Airfare, Travel - Subway...

I want to create a new tibble that groups all the Travel subtypes as one Travel subgroup. Is there a good way to do this?

I've been trying to use the dplyr filter function to no effect so far.

Sorry, I was really tired and forgot to put an example up

I have data that's like this:

Month - Year - Category - Amount
01 - 2016 - "Travel- Air" - 247.02
01 - 2016 - "Travel- Car" - 29.04
01 - 2016 - "Retail"      - 45.00
03 - 2017 - "Travel - Air" - 253.60

I'm trying to group things so that all the transactions of one type in a particular month/year are summed together like this:

Total_Category_Transactions_Month <- Total_Transactions %>%
    group_by(month,Year,Category) %>%
    summarize(monthly = sum(Amount))

But after looking at my data, there are just way too many things that are grouped up as "Travel - foo." I'd like to keep that detail for later to analyze, but for the big scale picture, I want to see if I can lump all those Travel expenses as one thing each month.

The Output should end up being:

   Month - Year - Category - Amount
    01  - 2016  - "Travel"  - 276.06
    01  - 2016  - "Retail"  - 45.00
    03  - 2017  - "Travel"  - 253.60

Where all the subtypes of category Travel_Foo from the same month and year are added together into one category just called Travel

OfTheAzureSky
  • 83
  • 1
  • 8
  • Use regular expression like `stringr::str_detect` or `grep`. eg https://stackoverflow.com/questions/22850026/filtering-row-which-contains-a-certain-string-using-dplyr – A. Suliman Jan 27 '19 at 08:55
  • 1
    Your question is unclear, please read and edit your question according to: [How to make a great R reproducible example?](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – pogibas Jan 27 '19 at 09:00
  • Sorry, I've edited my post for clarity. – OfTheAzureSky Jan 27 '19 at 09:02
  • 1
    I struggle to understand what you're trying to do, and your sample data is not quite clear to me (the `-` symbol seems to denote a column separator but can also be part of the strings in column `Category`?). These ambiguities can be avoided if you use `dput` to provide (part of your) data; also include your expected output for the sample data you give. – Maurits Evers Jan 27 '19 at 11:41
  • I've edited again. This will teach me to post stuff while I'm tired. – OfTheAzureSky Jan 27 '19 at 14:55

1 Answers1

0

One option is to use tidyr::separate

df %>%
    separate(Category, into = c("Category"), extra = "drop") %>%
    group_by(Month, Year, Category) %>%
    summarise(Amount = sum(Amount)) %>%
    ungroup() %>%
    as.data.frame()
#  Month Year Category Amount
#1     1 2016   Retail  45.00
#2     1 2016   Travel 276.06
#3     3 2017   Travel 253.60

Note that the as.data.frame() is not really necessary here. I've only included it to show that the resulting Amounts are the ones from your expected output (the tibbles don't print the same number of decimal places).


Sample data

df <- read.table(text =
    "Month  Year  Category  Amount
01  2016  'Travel- Air'  247.02
01  2016  'Travel- Car'  29.04
01  2016  'Retail'       45.00
03  2017  'Travel - Air'  253.60", header = T)
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68