If I understand correctly, and if your source data really are nicely formatted you might be able to do something like the following. Here, I'm linking to a csv with three sets of currencies which replicates what I think your source data look like.
First, read the file in using read.csv
but skip the first row. Use check.names = FALSE
so that duplicate column names are allowed.
temp <- read.csv("http://ideone.com/plain/t3cGcA",
header = TRUE, skip = 1,
check.names = FALSE)
temp
# Date Contract Last Open High Low Volume Date
# 1 10/10/2012 Dec 100 101 105 99 20000 NA 10/10/2012
# 2 10/11/2012 Dec 101 102 106 98 20100 NA 10/11/2012
# 3 10/12/2012 Jan 102 103 107 97 20120 NA 10/12/2012
# Contract Last Open High Low Volume
# 1 Dec 50 52 49 99 20530
# 2 Dec 53 56 43 98 24300
# 3 Jan 56 52 48 97 22320
# structure(c("NA", "NA", "NA"), class = "AsIs") Date Contract
# 1 NA 10/10/2012 Dec
# 2 NA 10/11/2012 Dec
# 3 NA 10/12/2012 Jan
# Last Open High Low Volume
# 1 500 501 605 99 20000
# 2 600 502 606 98 20100
# 3 700 503 607 97 20120
Second---and here is one assumption of the tidiness of your dataset---use seq
to create a vector of where your blank columns are. From this, if our assumption of tidiness is correct, you can use simple math to determine the start (vector value minus 7) and end indexes (vector value minus 1) of each currency.
myblankcols <- seq(1, ncol(temp), by=8) + 7
myblankcols
# [1] 8 16 24
Using the simple math mentioned above, create a list
of the subsets of each currency, and add names to the list. You can get the names by re-reading just the first line of the file as a csv and dropping all the NA
values.
tempL <- lapply(seq_along(myblankcols),
function(x) temp[(myblankcols[x] - 7):(myblankcols[x] - 1)])
NamesTempL <- read.csv("http://ideone.com/plain/t3cGcA",
header = FALSE, nrows = 1)
names(tempL) <- NamesTempL[!is.na(NamesTempL)]
tempL
# $`Currency 1`
# Date Contract Last Open High Low Volume
# 1 10/10/2012 Dec 100 101 105 99 20000
# 2 10/11/2012 Dec 101 102 106 98 20100
# 3 10/12/2012 Jan 102 103 107 97 20120
#
# $`Currency 2`
# Date Contract Last Open High Low Volume
# 1 10/10/2012 Dec 50 52 49 99 20530
# 2 10/11/2012 Dec 53 56 43 98 24300
# 3 10/12/2012 Jan 56 52 48 97 22320
#
# $`Currency 3`
# Date Contract Last Open High Low Volume
# 1 10/10/2012 Dec 500 501 605 99 20000
# 2 10/11/2012 Dec 600 502 606 98 20100
# 3 10/12/2012 Jan 700 503 607 97 20120
I am usually tempted to stop at this point, because I find lists convenient for many purposes. But, it's equally easy to convert it to a single data.frame
. This is also one of the reasons to make sure you use check.names = FALSE
in the first step: if all the columns have the same name, then there will be no problem rbind
ing them together.
do.call(rbind, tempL)
# Date Contract Last Open High Low Volume
# Currency 1.1 10/10/2012 Dec 100 101 105 99 20000
# Currency 1.2 10/11/2012 Dec 101 102 106 98 20100
# Currency 1.3 10/12/2012 Jan 102 103 107 97 20120
# Currency 2.1 10/10/2012 Dec 50 52 49 99 20530
# Currency 2.2 10/11/2012 Dec 53 56 43 98 24300
# Currency 2.3 10/12/2012 Jan 56 52 48 97 22320
# Currency 3.1 10/10/2012 Dec 500 501 605 99 20000
# Currency 3.2 10/11/2012 Dec 600 502 606 98 20100
# Currency 3.3 10/12/2012 Jan 700 503 607 97 20120
I'll definitely stop here, but from here, you probably want to convert your "Date" column to actual columns, and perhaps convert the row names ("Currency 1.1", "Currency 1.2", and so on) to a column in your data.frame
.