
I was wondering if there was a way to select specific columns via a sequence and create new variables from this.

For example, if I had 8 columns with n observations, how could I create 4 variables that each select 2 columns sequentially? My actual dataset is much larger: it has 1416 variables with 62 observations each (I have pasted a link to the spreadsheet below; the first column and row hold names). I would like to create new data frames from this, named as sites 1-12, so site 1 = df[, 1:118], site 2 = df[, 119:236], etc.

I am planning to use this code for future datasets with even more variables, so some form of loop or sequence function would be very effective. Can anyone shed any light on how to achieve this?

https://www.dropbox.com/s/p1a5cu567lxntmw/MyData.csv?dl=0

Thank you in advance.

James

P.S. @nrussell, I have pasted the output of the code you mentioned below; the omitted columns continue as series of numbers like those displayed.

 dput(z[ , 1:10])
 structure(list(`1` = c(0, 0, 0, 0, 0, 0, 0, 0, 0.0311410340342049, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0207444023791158, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0312971643732546, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0376287494579976, 0, 0, 0, 0, 0, 0, 0),......... `10` = c(0, 0, 0, 0, 0.119280313679916, 0, 0, 0.301029995663981, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.715681882079494, 0.136831816210901, 0, 0, 0, 0.0273663632421801, 0, 0, 0, 0.0547327264843602, 0, 0, 0, 0, 0.0231561535126139, 0, 0, 0.0903089986991944, 0, 0, 0.0752574989159953, 0.159368821233872, 0.0272640716982664, 0.0177076468037636, 0, 0, 0.120411998265592, 0, 0, 0, 0, 0.0322532138211408, 0.0250858329719984, 0, 0, 0, 0.119280313679916, 0, 0.172922500085254, 0.225772496747986, 0, 0, 0, 0.0954242509439325, 0)), .Names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"), class = "data.frame", row.names = c(NA, -62L))

joran
James White
  • Please add a sample of your actual dataset. If it has many rows and/or columns, you can use `dput(data[1:50,1:5])` (assuming `data` is a `data.frame` or `matrix`), for example. – nrussell Mar 17 '15 at 14:19
  • As @nrussell mentioned, adding sample data and the expected result will get you responses more quickly. At present, the data looks confusing. If the data you showed has 8 columns, does selecting every 2 columns mean 1:2, 3:4, 5:6, 7:8, or something else? You may want to check this link: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – akrun Mar 17 '15 at 15:00
  • If you have the data in R already, then run `dput(z[ , 1:10])` and paste the output into your question. Screenshots are not very useful at all. – nrussell Mar 17 '15 at 15:09
  • @nrussell I have edited the response above, I hope this is clear. Thank you for your help thus far. – James White Mar 17 '15 at 15:36

1 Answer

We could split the dataset ('df') with 1416 columns into 12 equal blocks of 118 columns each by creating a grouping index with gl:

 # gl() labels columns 1-118 as group 1, columns 119-236 as group 2, and so on
 lst <- setNames(lapply(split(seq_len(ncol(df)), as.numeric(gl(ncol(df), 118,
            ncol(df)))), function(i) df[, i, drop = FALSE]), paste0('site', 1:12))
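
To see what the grouping index looks like, here is a minimal sketch on the 8-column case from the question. The names `n_cols`, `block`, `idx` and `groups` are only illustrative and are not part of the answer above.

```r
# Illustrative values: 8 columns, cut into blocks of 2
n_cols <- 8
block  <- 2

# gl(n, k, length) repeats each level k times, truncated to 'length' values
idx <- as.numeric(gl(n_cols, block, n_cols))
idx
# [1] 1 1 2 2 3 3 4 4

# split() then collects the column positions belonging to each group
groups <- split(seq_len(n_cols), idx)
groups[["4"]]
# [1] 7 8
```

Each element of `groups` is the vector of column positions that gets passed to `df[, i]` inside the `lapply` call.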

Or you can create 'lst' without using split, by stepping through the starting column of each block:

 lst <- setNames(lapply(seq(1, ncol(df), by = 118), 
            function(i) df[i:(i+117)]), paste0('site', 1:12))
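
For future datasets with a different number of columns, the same idea could be wrapped in a small helper. This is only a sketch: `split_columns`, `block_size` and `prefix` are hypothetical names, and the index is built with `ceiling()` rather than `gl()` so that a final, shorter block is kept when the column count is not an exact multiple of the block size.

```r
# Hypothetical helper: cut a data frame into consecutive column blocks
split_columns <- function(df, block_size, prefix = "site") {
  # Block number for every column: 1,1,...,2,2,... (last block may be shorter)
  idx <- ceiling(seq_len(ncol(df)) / block_size)
  blocks <- lapply(split(seq_len(ncol(df)), idx),
                   function(i) df[, i, drop = FALSE])  # always keep a data frame
  setNames(blocks, paste0(prefix, seq_along(blocks)))
}

# Toy data: 3 rows, 10 columns (V1..V10), blocks of 4 columns
toy   <- as.data.frame(matrix(1:30, ncol = 10))
parts <- split_columns(toy, block_size = 4)
names(parts)
sapply(parts, ncol)
```

With the data in the question, the call would presumably be `split_columns(df, 118)`, giving a list named `site1` to `site12`.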

If we need to create 12 separate data frame objects in the global environment, list2env is an option (though I would prefer to work within 'lst' itself):

 list2env(lst, envir=.GlobalEnv)
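
Working within the list could look like the sketch below. It uses a made-up 6-column data frame (`toy`, `lst_toy`) instead of the real 'df', so the sizes here are assumptions for illustration only.

```r
# Toy data: 4 rows, 6 columns (V1..V6), cut into 3 blocks of 2 columns
toy <- as.data.frame(matrix(1:24, ncol = 6))
lst_toy <- setNames(lapply(seq(1, ncol(toy), by = 2),
                           function(i) toy[i:(i + 1)]),
                    paste0("site", 1:3))

# Pull out a single site by name; no global objects are needed
site2 <- lst_toy[["site2"]]
names(site2)

# Apply the same operation to every site at once
dims <- sapply(lst_toy, dim)   # one column per site: rows, then columns
dims
```

Keeping the sites in one list means later steps can loop over them with `lapply` or `sapply` instead of referring to twelve separately named objects.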

Using a small dataset ('df1') with '8' columns

  lst1 <- setNames(lapply(split(1:ncol(df1), as.numeric(gl(ncol(df1), 
         2, ncol(df1)))), function(i) df1[,i]), paste0('site', 1:4))
  list2env(lst1, envir=.GlobalEnv)

  head(site1,3)
  #  V1 V2
  #1  6 12
  #2  4  7
  #3 14 14

 head(site4,3)
 #  V7 V8
 #1 10  2
 #2  5  4
 #3  5  0

data

  set.seed(24)
  df1 <- as.data.frame(matrix(sample(0:20, 8*10, replace=TRUE), ncol=8))
akrun
  • This seems to be along the right lines; when I type str(lst) it appears to have divided it into the correct number of rows. Is there a way of selecting a specific site from here (i.e. if I wanted Site 7 as a separate variable)? Site_7 <- lst$site7 registered but displayed only a null value. Thank you so much for your help! – James White Mar 17 '15 at 15:42
  • @JamesWhite The `list2env` will create the object in the global space. Type `site7` on the console after you run the `list2env` – akrun Mar 17 '15 at 15:46
  • Yes this has worked! You're a life saver, thank you for all of your help @akrun – James White Mar 17 '15 at 15:49