1

I have tried several ways to do this particular thing but unsuccessfully. Basically I have a huge dataframe with 6 columns and +8000 rows. The first column has a 2-code lettering (e.g. AA,then AB, and so on...). The second column has some numeric value that are common to the othe 4 columns (e.g. from 0 to 180 for AA, 0 to 170 for AB, and so on). The the other columns are values.

This is an extract of my DataFrame:

 structure(list(X2code = c("AA", "AA", "AA", "AA", "AA", "AA", 
"AA", "AA", "AA", "AA", "AA", "AA", "AA", "AA", "AA", "AA", "AA", 
"AA", "AA", "AA", "AA", "AA", "AA", "AB", "AB", "AB", "AB", "AB", 
"AB", "AB", "AB", "AB", "AB", "AB", "AB", "AB", "AB", "AB", "AB", 
"AB", "AB", "AB", "AB", "AB", "AB", "AB", "AB", "AB", "AB", "AB", 
"AB", "AB", "AB", "AB", "AB", "AB", "AC", "AC", "AC", "AC", "AC", 
"AC", "AC", "AC", "AC", "AC", "AC", "AC", "AC", "AC", "AC", "AC"
), DOY = c(294L, 295L, 296L, 297L, 298L, 299L, 300L, 301L, 302L, 
303L, 304L, 305L, 306L, 307L, 308L, 309L, 310L, 311L, 312L, 313L, 
314L, 315L, 316L, 294L, 295L, 296L, 297L, 298L, 299L, 300L, 301L, 
302L, 303L, 304L, 305L, 306L, 307L, 308L, 309L, 310L, 311L, 312L, 
313L, 314L, 315L, 316L, 317L, 318L, 319L, 320L, 321L, 322L, 323L, 
324L, 325L, 326L, 168L, 169L, 170L, 171L, 172L, 173L, 174L, 175L, 
176L, 177L, 178L, 179L, 180L, 181L, 182L, 183L), WDrain = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 
0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 
0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 
0.13, 0.13, 0.13, 0.13, 244.1, 244.1, 244.1, 244.1, 244.1, 244.1, 
244.1, 244.1, 244.1, 244.1, 244.1, 244.1, 244.1, 244.1, 244.1, 
244.1), CumET = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 3.34, 4.75, 5.33, 
6.67, 7.41, 7.84, 8.78, 9.59, 10.47, 10.57, 11.17, 11.91, 12.07, 
12.48, 12.63, 12.88, 13.8, 14.56, 15.11, 15.43, 15.86, 16.66, 
17.27, 17.54, 18.21, 18.64, 18.75, 19.11, 19.2, 19.85, 20.48, 
21.02, 21.32, 222, 226, 233, 241, 250, 258, 265, 269, 271, 276, 
279, 281, 281, 283, 285, 288), SoilAvW = c(NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, 246.51, 245.1, 249.52, 248.68, 248.04, 247.44, 246.5, 
245.69, 244.81, 244.71, 244.11, 243.37, 243.3, 242.88, 242.83, 
242.58, 241.66, 241, 243.65, 243.5, 243.36, 242.65, 249.03, 250.74, 
253.05, 266.21, 270.28, 279.71, 287.9, 288.84, 288.69, 288.25, 
295.13, 330.2, 326, 319.5, 311.2, 302.8, 294.4, 287.7, 287.2, 
285, 280.4, 278.6, 276, 282.3, 286.5, 284.1, 281.5), Runoff = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L), Transp = c(NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.01, 0.01, 0.01, 
0.01, 0.01, 0.02, 0.02, 0.02, 0.02, 0.03, 0.03, 0.03, 0.03, 0.04, 
0.04, 0.04, 0.04, 0.04, 0.05, 0.05, 105.7, 109.1, 114.5, 121.5, 
128.9, 136.2, 141.9, 145.4, 147.1, 150.7, 152.9, 154.8, 155, 
155.8, 157.9, 159.9)), .Names = c("X2code", "DOY", "WDrain", 
"CumET", "SoilAvW", "Runoff", "Transp"), class = "data.frame", row.names = c(NA, 
-72L))

What I would like to do is to create 4 new data frames where in each dataframe I want to break the long column in a way that AA, AB, and so on will be new columns. In addition next to the values I want the values of column 2 that goes with each 2-code. For example, AA will have the column 0-to-180, then the values, AB will have column value 0-to-170 then its values. Moreover, on top of the column I would like to put the 2 letter code for both columns. Here is an example of the dataframe I would like to have (for example this is for value1).

AA  AA  AB  AB  AC  AC

0   2   0   0.5 0   50
1   2.4 1   1.6 1   0.6
2   5   2   4.6 ..  ..
3   6.7 3   2   ..  ..
..  ..  ..  ..      
..  ..  170 70      
180 10          

And below part of my code:

  setwd("C:\\.....")

    my.data <- read.table("my.data.txt", header=T, na.strings = c("na" , "n/a" , "NA" , "") , stringsAsFactors = F)

   for(id in my.data$2.code)
     {

# here I take the columns 2 to 3 in the data, the first col is the ID that is used                                                 # to let R understand that for each ID needs to do a certain operation

       data.code <- my.data[my.data$2.code == id, 2:3] 

  # And now my is my problems... I tried with DOBY, and other operations but I could not do what I want!!  
       write.table(....., "trials.txt", quote=F, col.names=FALSE, row.names= F)


           }
david
  • 113
  • 1
  • 8
  • 1
    Please read [this](http://stackoverflow.com/a/5963610/1412059) and then provide data. – Roland Apr 19 '13 at 14:12
  • thank you for the tip. I did not how to do it. I subset my actual data and I used dput() function. I put the example in my edited question. Thank you again and please let me know if it is still not good!I need to improve my overall understanding of how these things work. – david Apr 19 '13 at 14:35
  • 1
    I still cannot see your sample data. You can also do `dput(df[1:20,])` and just show 20 lines of your sample data. – Jd Baba Apr 19 '13 at 14:47
  • 1
    @david please copy and paste the actual output that you get when you run the command `dput( my.data[1:20,] )` at the R command line into the question above. It will help ***massively*** to be able to reconstruct a small portion of your data on our systems. We can't run through your example code without it. – Simon O'Hanlon Apr 19 '13 at 15:18
  • is this fine? I hope it helps. – david Apr 19 '13 at 15:28
  • @david yes - now I am sure you will get help! and a +1 for a reproducible example!# – Simon O'Hanlon Apr 19 '13 at 15:31
  • I don't understand your desired filtering, but as a first step I recommend creating a `list` for your output variable. Name each element of the list `AA`,`AB`, etc., and then place a 2-column matrix in each element which contains the desired output of your filtering operations. – Carl Witthoft Apr 19 '13 at 15:34
  • I would recommend using `split` on your dataframe. It doesn't give you *exactly* the desired output, but you might find it more intuitive? Try `split( my.data , my.data$X2code )`. You will get a list of dataframes, one for each variable in X2code, with the DOY as the first column and then all other values in the other columns. I am not sure which values you otherwise want to keep? If you want to do something with the dataframes afterwards you can use a `lapply` function on the list of dataframes returned to do something to each one in turn. Anyway, see if `split` helps. – Simon O'Hanlon Apr 19 '13 at 15:39

1 Answers1

2

I recommend to leave your data.frame as it is since it is in the most sensible format for further processing in R. However, here are some possibilities, which do not exactly achieve your desired output (not easily possible), but might still be helpful.

library(reshape2)
WDrain.wide <- dcast(DF,DOY~X2code,value.var='WDrain')

library(plyr)
WDrain.strange <- dlply(DF,.(X2code),function(x) x[,2:3])

Or as suggested by @SimonO101:

split(DF[,-1],DF[,1])
Roland
  • 127,288
  • 10
  • 191
  • 288
  • ok, I have tried these solutions, and they work fine. Thanks everyone! But, now I have the issue of putting things together. I have thought about something but I lack the technical skills to implement them in R. Basically, I could ignore the DOY column and use the same one for all the variables.Of course every "2code" has different length. However, if I create a matrix with NAs that have the length of the longer variable, and after the split I fill a matrix? – david Apr 20 '13 at 15:13
  • I started writing the code but I am finding troubles in filling the matrix. I checked on other blogs how to do it but I do not get much satisfactory answers. The matrix could be my output DF that I described above, the difference that I have only one col for DOY (since they all start form 0, but the length is different). Anyway, that is my idea: `mat.data <- matrix(data=NA,nrow=308,ncol=27) mysplit <-split(my.data[,2], my.data[,1]) for(i in 1: length(my.data$2.code)) # I hope this is right... { mysplit[[i]] # How to fill the matrix properly? }` – david Apr 20 '13 at 15:14
  • Also, if want to give to each column the name of the 2.code (e.g. the first col AA, the second AB and so on..that is easy isn't it? I shall use "paste()" Thank you in advance again! – david Apr 20 '13 at 15:15