Working with merged column headers in Excel

Question

Possible Duplicate:
Read csv with two headers into a data.frame

I am new to R and working to use R to analyse some data. The data happens to be in Excel format and right now I'm struggling to find a way to convert it into a format that is R-friendly.

The issue is that the column headers have merged cells, so in effect the headers have two rows. I'd like to convert it into a normal set of 1-D vectors, add an extra column and a row. Let me explain with an example:

Currently the excel format looks thus:

  |  H  |  J  | 
Y |M |F |M |F | 
== == == == == 
Y1|V1|V2|V3|V4|

H,J are merged column headers and each of them span columns M and F.

The = indicate that the rows above are header rows

Given that H,J both are elements under, say R, I would like to convert this into a columnar format with a normal header and two rows, like this

Y |R |M |F |
== == == ==
Y1|H |V1|V2|
Y1|J |V3|V4|

Does anyone have an idea how to do this?

Does [this question](http://stackoverflow.com/q/11987103/1270695) help you get started? — A5C1D2H2I1M1N2O1R2T1, Aug 26 '12 at 16:57
Welcome to Stack Overflow! If you made a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) that demonstrates your question / problem, we would find it easier to answer. — Andrie, Aug 26 '12 at 17:04
Thanks @mrdwab you're right, the question asked is the same as referred to in that post. I must have missed searching for it! — Romit, Aug 26 '12 at 17:22
@Romit, I've posted an answer for reference, but since you've mentioned that your problem is the same as the linked question, I'm recommending closing this question as a duplicate of the other one. — A5C1D2H2I1M1N2O1R2T1, Aug 26 '12 at 18:07

score 1 · Accepted Answer · edited May 23 '17 at 11:52

First, some assumptions:

The merged headings are on the first line of the CSV
The merged headings start in the second column of the CSV
The variable names in the second line of the CSV repeat (except for the variable in the first column)

Second, your data.

temp = c(",\"H\",,\"J\",", 
         "\"Y\",\"M\",\"F\",\"M\",\"F\"", 
         "\"Y1\",\"V1\",\"V2\",\"V3\",\"V4\"")

Third, a slightly modified version of this answer.

# check.names is set to FALSE to allow variable names to be repeated
ONE = read.csv(textConnection(temp), skip=1, check.names=FALSE,
               stringsAsFactors=FALSE)
GROUPS = read.csv(textConnection(temp), header=FALSE, 
                  nrows=1, stringsAsFactors=FALSE)
GROUPS = GROUPS[!is.na(GROUPS)]

# This can be shortened, but I've written it this way to show how
#   it can be generalized. For instance, if 3 columns were repeated
#   instead of 2, the rep statement could be changed to reflect that
names(ONE)[-1] = paste0(names(ONE)[-1], ".",
                        rep(GROUPS, each=(length(names(ONE)[-1])/2)))

Fourth, the actual reshaping of the data.

TWO = reshape(ONE, direction="long", ids=1, varying=2:ncol(ONE))
# And, here's the output.
TWO
#      Y time  M  F id
# 1.H Y1    H V1 V2  1
# 1.J Y1    J V3 V4  1

Working with merged column headers in Excel

1 Answers1