I have two CSV files. File one (consumption.csv) has the household identifier numbers in the header and hourly consumption values in each row; each column represents a different household. File two has a single column containing only the household numbers that I would like to include.
I would like to produce a file that includes only the consumption time series of the households listed in file two.

consumption <- read.csv(...)
householdno <- read.csv(...)

I am stuck with the following: consumption_new<-consumption[,c(xxxxxx)]

Thanks a lot for your help!

maaar

1 Answer

Since you haven't included a reproducible example, I had to create one:

set.seed(123)
consumption <- matrix(floor(runif(26*3, 10, 30)), nrow=3)
colnames(consumption) <- LETTERS
householdno <- data.frame(houses=sample(LETTERS, 5))
consumption[, colnames(consumption) %in% householdno[,1]]
#       C  F  J  P  Z
# [1,] 20 27 21 12 14
# [2,] 27 14 15 14 17
# [3,] 21 10 12 19 22

The trick is to use a logical vector to subset columns from a data.frame (or, as in the example above, a matrix). TRUE includes a column, FALSE excludes it.

%in% checks, for each element of the first vector, whether it exists in the second vector. It returns a logical vector of the same length as the first vector.
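To see what %in% returns on its own, here is a tiny sketch. The vectors `first` and `second` are made-up names for illustration only, not part of the question's data:

```r
# Hypothetical vectors, used only to illustrate %in%
first  <- c("A", "B", "C", "D")
second <- c("D", "B", "X")

# One TRUE/FALSE per element of `first`, in the order of `first`
is_member <- first %in% second
is_member
# [1] FALSE  TRUE FALSE  TRUE
```

Note that the order and length of the result follow `first`, so elements of `second` that never appear in `first` (here "X") are silently ignored.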

The finishing touch is to use that logical vector to subset the desired columns from the data.frame. This would be more readable if you stored the vector in a variable.
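Applied to files like the ones described in the question, the whole thing might look roughly like the sketch below. The file contents, the temporary file paths and the name `keep` are assumptions made up so that the example runs on its own. It also assumes that the household list has no header row, and it uses `check.names = FALSE` because `read.csv` would otherwise turn a numeric header such as 101 into X101, which would no longer match the household numbers:

```r
# Hypothetical stand-ins for the two CSV files, written to temporary
# files so that this sketch is self-contained.
consumption_file <- tempfile(fileext = ".csv")
household_file   <- tempfile(fileext = ".csv")
writeLines(c("101,102,103,104",
             "1.2,0.8,2.1,0.5",
             "1.4,0.9,1.9,0.7"), consumption_file)
writeLines(c("102", "104"), household_file)

# check.names = FALSE keeps the header as "101" instead of "X101"
consumption <- read.csv(consumption_file, check.names = FALSE)
# header = FALSE is an assumption: the list file has no header row
householdno <- read.csv(household_file, header = FALSE)

# Store the logical vector in a variable for readability
keep <- colnames(consumption) %in% as.character(householdno[, 1])

# drop = FALSE keeps a data.frame even if only one column matches
consumption_new <- consumption[, keep, drop = FALSE]

# Write the result; the output path is again only a placeholder
out_file <- file.path(tempdir(), "consumption_new.csv")
write.csv(consumption_new, out_file, row.names = FALSE)
```

With real data you would replace the temporary files with your own paths and drop the two `writeLines` calls.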

Mirek Długosz
  • Hi @maaar, if my answer helped you to solve your problem, please consider [marking it as accepted](https://meta.stackexchange.com/a/5235/312562). – Mirek Długosz May 14 '17 at 22:27