how to extract two columns of data using R by loop

Question

I have a dataframe with 1000 columns of data

 str(MT)
 'data.frame':  1356 obs. of  1000 variables:
 $ Date : Factor w/ 1356 levels "Apr-1900","Apr-1901",..: 453 340 792 1 905  679 566 114 1244 1131 ...
 $ Year : int  1900 1900 1900 1900 1900 1900 1900 1900 1900 1900 ...
 $ X1   : num  -27.4 -27.8 -17 1.7 7.9 ...
 $ X2   : num  -27.21 -27.99 -17.05 1.69 7.75 ...
 $ X3   : num  -26.67 -27.84 -16.75 2.24 7.82 ...
 $ X4   : num  -26.64 -27.98 -16.83 2.46 7.97 ...
  .....
 $ X1000  : num  -29.13 -30.61 -20.47 -0.46 6.5

I would like to split this dataframe into three columns ( Date, Year and Xn) using a loop so that the end of it all I will have 1000 separate csv files with 3 columns of data. My codes thus far is

for (i in ncol(MT)) {
x[[i]]<-data.frame(MT$Date, Year, MT$[[i]]) }

However, is giving me errors. Your guidance would be appreciated as this I am new to R

Reshape from wide to long, then save it out in chunks.Here's how to do the first part: http://stackoverflow.com/questions/2185252/reshaping-data-frame-from-wide-to-long-format and — Neal Fultz, Jun 02 '15 at 00:29

score 1 · Accepted Answer · answered Jun 02 '15 at 00:44

Your code has some syntax and algorithm errors:

Your for loop is not looping through a range of values, it's "looping" once for i = ncol(MT), it should be (i in 1:ncol(MT)) ;
Actually, you shouldn't loop through all columns, since two of them aren't Xn, so (i in 1:(ncol(MT)-2));
It's not clear if you did, but you should create x before trying to allocate data to it, preferably with its final size;
You didn't use MT$ to select the Year column;
You used both $ and [[ to subset the Xn column. You should use just [ instead, because this way you get to use i and keep the column name.

Fixing all these, with some example data, you get:

MT <- data.frame(Date = rnorm(5), Year = rnorm(5), X1 = rnorm(5), X2 = rnorm(5), X3 = rnorm(5))

nX <- ncol(MT)-2

listofdf <- lapply(1:nX, function(x) NULL)

for (i in 1:nX) {
  listofdf[[i]] <- data.frame(MT$Date, MT$Year, MT[i+2])
}

listofdf
# [[1]]
# MT.Date    MT.Year         X1
# 1 -0.94184053  1.0241134 -0.4329728
# 2  0.59637577 -0.6195477 -1.3011527
# 3  0.33474278  1.0628674 -0.8957239
# 4 -0.04328685  0.4275993 -0.7840214
# 5  0.78799652  0.5707058 -0.4243622
# 
# [[2]]
# MT.Date    MT.Year         X2
# 1 -0.94184053  1.0241134  2.2380838
# 2  0.59637577 -0.6195477 -0.9995170
# 3  0.33474278  1.0628674  0.3452450
# 4 -0.04328685  0.4275993 -1.0453718
# 5  0.78799652  0.5707058 -0.6292885
# 
# [[3]]
# MT.Date    MT.Year          X3
# 1 -0.94184053  1.0241134 -0.05293727
# 2  0.59637577 -0.6195477  0.84947635
# 3  0.33474278  1.0628674  1.17748809
# 4 -0.04328685  0.4275993  1.73233398
# 5  0.78799652  0.5707058 -0.61874653

If you're just going to save them as .csv files, it's not necessary to store in a list though. Instead, you can use:

for (i in 1:nX) {
  tempdf <- data.frame(MT$Date, MT$Year, MT[i+2])
  write.csv(tempdf, paste0("MT_subset_X", i, ".csv"))
}

thanks you as well for the gudiance on sytax erro, it help to clear up lots of stuff! It is really appreciated! — Joan, Jun 02 '15 at 16:55
You're welcome. My advice is that you test every function call one by one when writing the code (the console is great for that), this way you're able to pick up syntax erros as you write your code and also check if the output is what you expected. — Molx, Jun 02 '15 at 17:16

score 0 · Answer 2 · answered Jun 02 '15 at 03:34

Reusing the sample data created by @Molx, and doing some reshaping as @Neal Fultz suggested in comments, using tidyr

# generate sample data
MT <- data.frame(Date = rnorm(5), Year = rnorm(5), X1 = rnorm(5), X2 = rnorm(5), X3 = rnorm(5))

Then fit all variables and values excluding Date and Year as key-value column pairs

> require(tidyr)
> MTg <- gather(MT, var, value, -c(Date, Year))
> MTg
         Date       Year var       value
1  -1.5356474 -1.0963886  X1 -0.74075807
2  -1.1346928  0.2925819  X1  1.42787059
3   0.7031032  0.3361561  X1 -0.27112156
4   1.0140557  1.2587298  X1  0.85693377
5   0.2529787 -3.0113663  X1  0.12686607
6  -1.5356474 -1.0963886  X2  0.21406288
7  -1.1346928  0.2925819  X2 -1.11363330
8   0.7031032  0.3361561  X2 -0.30324978
9   1.0140557  1.2587298  X2  0.48954893
10  0.2529787 -3.0113663  X2  0.85898166
11 -1.5356474 -1.0963886  X3 -0.44394680
12 -1.1346928  0.2925819  X3 -0.86942530
13  0.7031032  0.3361561  X3 -1.62344294
14  1.0140557  1.2587298  X3  0.09880026
15  0.2529787 -3.0113663  X3 -0.76091871

Then run through all possible variable names, exporting them into individual csv files with same name as var.

varnames <- levels(MTg$var)  # get variable names
dummy <- lapply(varnames, function(x)
  write.csv(MTg[MTg$var==x,], file=paste0(x, ".csv"))

how to extract two columns of data using R by loop

2 Answers2