0

I am pretty new to R and have a data file that represents a budget. I want to sum up all the price tags for one purpose in the purpose column. That purpose gets automatically factored when reading in the csv. But how can I assign the right prices to a purpose with several counts in the file and sum them up?

I got the file from this link: http://www.berlin.de/imperia/md/content/senatsverwaltungen/finanzen/haushalt/ansatzn2013.xls?download.html

I opened it in Open Office, exported the .csv-file and called it ausgaben.csv.

> ausgaben <- read.csv("ausgaben.csv")
> str(ausgaben)
'data.frame':   15895 obs. of  8 variables:
 $ Bereich         : Factor w/ 13 levels "(30) Senatsverwaltungen",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Einzelplan      : Factor w/ 28 levels "(01) Abgeordnetenhaus",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Kapitel         : Factor w/ 270 levels "(0100) Abgeordnetenhaus",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Titelart        : Factor w/ 1 level "Ausgaben": 1 1 1 1 1 1 1 1 1 1 ...
 $ Titel           : int  41101 41103 42201 42701 42801 42811 42821 44100 44304 44379 ...
 $ Titelbezeichnung: Factor w/ 1286 levels "Abdeckung von Geldverlusten",..: 57 973 182 67 262 257 95 127 136 797 ...
 $ Funktion        : Factor w/ 135 levels "(011) Politische Führung",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Euro            : Factor w/ 2909 levels "-1.083,0","-1.295,0",..: 539 2226 1052 1167 1983 1111 1575 2749 1188 1167 ...

In "Funktionen" are 135 levels which correspond to sums in "Euro". I want to get all the numbers in "Euro" for all their corresponding levels in "Funktionen" and sum them, so I get 135 Euro values and can show what is spent for what purpose in this budget.

EDi
  • 13,160
  • 2
  • 48
  • 57
r4gni
  • 21
  • 4
  • 1
    Please make your situation reproducible, i.e. provide us with the data and the code needed to mimic your situation. See http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example for more tips on how to do this. – EDi Mar 18 '13 at 10:18
  • 1
    Welcome to Stack Overflow! Please give sample data or reproducible example so that good people here can help you better. See http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – CHP Mar 18 '13 at 10:18
  • Thank you for your patience, I hope this became more clear now. – r4gni Mar 18 '13 at 10:45
  • I think that 'Euro' should not be a factor... The rest could be done for example with plyr::ddply() oder ave(). – EDi Mar 18 '13 at 10:51
  • To read in the csv data, I would suggest using `read.csv2`, which should handle the values in `Euro` correctly (i.e., as numeric). – BenBarnes Mar 18 '13 at 13:32

2 Answers2

1

This could be done with plyr:::ddply or many other functions (ave, tapply, etc...). I think that 'Euro' should not be a factor, but numeric - so please fix this before trying to aggregate.

Since we do not have your data here is a toy example:

set.seed(1234)
df <- data.frame(fac = sample(LETTERS[1:3], 50, replace = TRUE),
                 x = runif(50))

require(plyr)
ddply(df, .(fac), summarise, 
      sum_x = sum(x))
#   fac    sum_x
1   A 7.938613
2   B 6.692007
3   C 5.645078
EDi
  • 13,160
  • 2
  • 48
  • 57
0

You can read the xls file with the gdata package:

library(gdata)
ausgaben <- read.xls("ansatzn2013.xls")

Firstly, you need to transform the values in the column Ansatz.2013.inkl..Nachtrag.in.Tsd..EUR from factor to numeric:

Euro <- as.character(ausgaben$Ansatz.2013.inkl..Nachtrag.in.Tsd..EUR)
Euro <- as.numeric(sub(",", "", Euro))

Then, you can calculate the sums with the aggregate function:

aggregate(Euro ~ ausgaben$Funktion, FUN = sum)
Sven Hohenstein
  • 80,497
  • 17
  • 145
  • 168
  • Unfortunately I got this error: `Fehler in model.frame.default(formula = Euro ~ ausgaben$Funktion) : Variablenlängen sind unterschiedlich (gefunden für 'ausgaben$Funktion')` – r4gni Mar 18 '13 at 14:06
  • @r4gni What is the output of `str(Euro)`? – Sven Hohenstein Mar 18 '13 at 14:29