I have a data frame with several factor columns, one numeric column and one integer column. I want to aggregate the data to the unique combinations of the factor variables and sum up the counts. With aggregate I get the unique factor combinations, but the count column is not summed.

The data frame test2:

        Afdeling_1         Probleemgebied               Locatie.niveau.1            Risico count    cost
1  Secondairy assembly Complete transformator         Secundaire installatie  Risico 3 ( hoog)     1      NA
2          Active part             Binnenwerk                    Actief deel  Risico 3 ( hoog)     1      NA
3         Construction Complete transformator         Secundaire installatie Risico 2 (midden)     1      NA
4       Final assembly          Complete kast         Complete transformator  Risico 3 ( hoog)     1      NA
5          Windingshop              Wikkeling                      Wikkeling  Risico 3 ( hoog)     1      NA
6            Warehouse Complete transformator Niet transformator gerelateerd   Risico 1 (laag)     1      NA
7          Active part         Wikkelingenset                 Wikkelingenset   Risico 1 (laag)     1      NA
8            Warehouse Complete transformator         Secundaire installatie Risico 2 (midden)     1      NA
9         Pre assembly          Complete kast  Kastbodem (altijd onderzijde) Risico 2 (midden)     1      NA
10        Pre assembly          Complete kast  Kastbodem (altijd onderzijde) Risico 2 (midden)     1      NA
11      Final assembly Complete transformator                    Conservator  Risico 3 ( hoog)     1      NA
12      Final assembly Complete transformator         Complete transformator  Risico 3 ( hoog)     1      NA
13        Pre assembly          Complete kast                    Leidingwerk Risico 2 (midden)     1      NA
14                 KAM             Binnenwerk                    Actief deel   Risico 1 (laag)     1      NA
15      Final assembly          Complete kast         Complete transformator  Risico 3 ( hoog)     1      NA
16        Pre assembly          Complete kast                        Koeling Risico 2 (midden)     1      NA
17                 KAM         Wikkelingenset                 Wikkelingenset   Risico 1 (laag)     1      NA
18      Final assembly             Binnenwerk                    Actief deel  Risico 3 ( hoog)     1      NA
19         Active part         Wikkelingenset                 Wikkelingenset   Risico 1 (laag)     1      NA
20         Active part         Wikkelingenset                 Wikkelingenset  Risico 3 ( hoog)     1      NA
21         Active part             Binnenwerk                    Actief deel  Risico 3 ( hoog)     1      NA
22 Secondairy assembly Complete transformator         Secundaire installatie  Risico 3 ( hoog)     1      NA
23                 KAM          Complete kast                        Koeling Risico 2 (midden)     1      NA
24      Final assembly          Complete kast                       Kastwand   Risico 1 (laag)     1      NA
25        Pre assembly          Complete kast  Kastbodem (altijd onderzijde)  Risico 3 ( hoog)     1      NA
26        Construction             Binnenwerk                    Actief deel   Risico 1 (laag)     1      NA
27      Final assembly Complete transformator Niet transformator gerelateerd  Risico 3 ( hoog)     1      NA
28      Final assembly             Binnenwerk                    Actief deel   Risico 1 (laag)     1      NA
29      Spoelenmontage         Wikkelingenset                 Wikkelingenset   Risico 1 (laag)     1      NA
30         Active part                   Kern                           Kern  Risico 3 ( hoog)     1 1820.00
31         Active part                   Kern                           Kern   Risico 1 (laag)     1      NA
32         Windingshop              Wikkeling                      Wikkeling Risico 2 (midden)     1      NA
33      Final assembly Complete transformator                       Kastwand  Risico 3 ( hoog)     1      NA
34         Active part         Wikkelingenset                 Wikkelingenset Risico 2 (midden)     1 1407.36
35         Active part         Wikkelingenset                 Wikkelingenset   Risico 1 (laag)     1      NA
36         Active part         Wikkelingenset                 Wikkelingenset   Risico 1 (laag)     1      NA
37 Secondairy assembly Complete transformator         Secundaire installatie  Risico 3 ( hoog)     1      NA
38           Warehouse Complete transformator Niet transformator gerelateerd Risico 2 (midden)     1      NA
39           Warehouse Complete transformator                       Kastwand Risico 2 (midden)     1      NA
40      Final assembly          Complete kast  Kastbodem (altijd onderzijde)  Risico 3 ( hoog)     1      NA
41           Warehouse Complete transformator Niet transformator gerelateerd  Risico 3 ( hoog)     1      NA
42           Warehouse Complete transformator Niet transformator gerelateerd  Risico 3 ( hoog)     1      NA
43      Spoelenmontage              Wikkeling                      Wikkeling   Risico 1 (laag)     1      NA
44    Sales & projects Complete transformator         Complete transformator  Risico 3 ( hoog)     1      NA
45         Active part             Binnenwerk                    Actief deel   Risico 1 (laag)     1      NA
46                 KAM Complete transformator         Complete transformator Risico 2 (midden)     1      NA
47         Windingshop              Wikkeling                      Wikkeling   Risico 1 (laag)     1      NA
48    Sales & projects Complete transformator         Complete transformator   Risico 1 (laag)     1      NA
49         Active part             Binnenwerk                    Actief deel   Risico 1 (laag)     1      NA
50        Pre assembly          Complete kast                        Koeling  Risico 3 ( hoog)     1      NA

This is the code I used:

aggregate(count ~., test2, sum)

This is the result I get:

  Afdeling_1 Probleemgebied Locatie.niveau.1            Risico    cost count
1 Active part Wikkelingenset   Wikkelingenset Risico 2 (midden) 1407.36     1
2 Active part           Kern             Kern  Risico 3 ( hoog) 1820.00     1

I see that the underlying problem is not that aggregate fails, but that the cost column contains NAs, so only the records that have a cost are aggregated. I therefore need a way to keep the rows with NAs from being omitted.
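One possible way around this, as a minimal sketch rather than a definitive fix: with `count ~ .` the cost column ends up on the grouping side of the formula, and the default `na.action = na.omit` of `aggregate` drops every row that has an NA in it. Naming the grouping columns explicitly keeps cost out of the model frame, and `na.action = na.pass` together with `na.rm = TRUE` allows cost to be summed as well. The small `test2` below is a made-up stand-in with only two of the factor columns, not the real data.

```r
# Toy stand-in for test2 (illustrative rows only, fewer columns than the real data)
test2 <- data.frame(
  Afdeling_1 = c("Active part", "Active part", "Active part", "Warehouse", "Warehouse"),
  Risico     = c("Risico 1 (laag)", "Risico 1 (laag)", "Risico 3 ( hoog)",
                 "Risico 3 ( hoog)", "Risico 3 ( hoog)"),
  count      = c(1L, 1L, 1L, 1L, 1L),
  cost       = c(NA, NA, 1820, NA, NA),
  stringsAsFactors = TRUE
)

# Reproduces the problem: cost becomes a grouping variable and the default
# na.action (na.omit) removes all rows where cost is NA
res_dot <- aggregate(count ~ ., data = test2, FUN = sum)

# Option 1: list the grouping columns explicitly, so cost is never looked at
res_count <- aggregate(count ~ Afdeling_1 + Risico, data = test2, FUN = sum)

# Option 2: sum count and cost together; na.pass keeps the NA rows and
# na.rm = TRUE ignores the NAs inside each group
res_both <- aggregate(cbind(count, cost) ~ Afdeling_1 + Risico,
                      data = test2,
                      FUN = function(x) sum(x, na.rm = TRUE),
                      na.action = na.pass)

print(res_count)
print(res_both)
```

Note that in option 2 a group whose cost values are all NA gets a cost of 0, because `sum(..., na.rm = TRUE)` over nothing is 0. With many columns, subsetting the data frame to the wanted columns first and then using option 1 with `count ~ .` should behave the same way.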

PDG
  • What is the result you are getting? You can just do `aggregate(count ~., test, sum)` btw – David Arenburg Nov 10 '16 at 12:00
  • I don't use ~. because the original data has 35 columns and I only want to use 18 of them in this formula... however, I could of course subset them. The result I get is the unique combinations, but WITHOUT the count variable summed up for each of them – PDG Nov 10 '16 at 12:03
  • Please provide a reproducible example then. We can't reproduce this using the output of `str(test)`. See [this](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – David Arenburg Nov 10 '16 at 12:20
  • @DavidArenburg I just edited the post to a reproducible one. Sorry for that, it's been a while. – PDG Nov 10 '16 at 12:54
