
I am currently working on airmass trajectories for 11 different stations all over the city for one year. For each station I have dataframes of 72-hour trajectories that look like this:

      date      lon/lat
    yymmddhh_1   lon_1
    yymmddhh_1   lat_1
    yymmddhh_1   lon_2
    yymmddhh_1   lat_2
    yymmddhh_1   lon_3
    yymmddhh_1   lat_3

I didn't put the longitude and latitude values in separate columns because I need them to be in one for my analysis.

The date column starts with a certain day (in my case 011022: 22/10/2001) and goes backwards for 72 hours in 1-hour steps, leaving me with 146 separate lon/lat values. I have trajectories for 329 days, so the dimension of the dataframe is dim=48180 x 2.

Now I need a new dataframe where the columns are my backward timesteps (t-0, t-1, t-2,...,t-72) and each row represents one trajectory (yymmddhh_1,yymmddhh_2,...,yymmddhh_329).

   date       t-0     t-0     t-1     t-1
yymmddhh_1   lon_1   lat_1   lon_2   lat_2
yymmddhh_2   lon_1   lat_1   lon_2   lat_2
yymmddhh_3   lon_1   lat_1   lon_2   lat_2

So I think my code needs to read column 2 of my current dataframe up to row=146, write these values in the first row of my new dataframe, and repeat the process until the end of the dataframe is reached.

I already managed to do that for the first 146 values, which is rather easy because I just need to run:

trajectory_1 <- t(station.trajectory[1:146,2]) 

I also already created the date column.

Maybe I can use read.table? I really have no idea where to start with this, so any help would be highly appreciated.

EDIT: To clear things up, here's an example of what the current dataframe looks like, and what the new one should look like:

[,1] is the date (format: YYMMDDHH), [,2] are the lon, lat values

          [,1]   [,2]
  [1,] 2071000 525500
  [2,] 2071000 133300
  [3,] 2070923 524918
  [4,] 2070923 134759
  [5,] 2070922 524238
  [6,] 2070922 136058
...
[146,] 2070700 140147
[147,] 2071100 525500
[148,] 2071100 133300
[149,] 2071023 525142
[150,] 2071023 128926

Note that at [147,] a new trajectory for the day following [1,] begins.

Keeping the content of [,1] is not important here. What my code should do in the end is take [,2] and make it look like this:

      [,1]      [,2]     [,3]     [,4]     [,5]
[1,] 2071000   525500   133300   524918   134759
[2,] 2071100    ...      ...      ...       ...

EDIT 2: I should also add that I am trying to prepare my data for k-means clustering (http://stat.ethz.ch/R-manual/R-devel/library/stats/html/kmeans.html). Maybe I am not understanding the manual properly, but to me it looks like each trajectory should have its own row...

EDIT 3:

I tried writing a loop to do the work.

ind1<- matrix()
ind1 <- cbind(seq(0,48034,146))
ind1[1,] <- 1

First I created an index to have steps of 146. My final dataframe shall be named beusselstr.dataframe

beusselstr.dataframe <- NULL
k<- NULL

The station "beusselstr" only has 115 days, so I want to use only the first 115 index values until 16790:

for (j in 1:115){
  k[j] <- ind1[j+1]
beusselstr.dataframe[j] <- cbind(beusselstr.dataframe[j],t(beusselstr.trajectories[ind1[j]:k[j],2])) 
  }

However I receive the error "number of items to replace is not a multiple of replacement length".

ulrich_k
  • It's not entirely clear what you need. Can you be more explicit about what each row and column should contain in the end? – dave May 15 '14 at 13:10
  • Ah yes sorry, I added an example from my dataframe. I hope this clears it up – ulrich_k May 15 '14 at 13:23
  • "I put lon and lat in the same column because I need them for analysis" indicates you don't really understand how to do data analysis. Keep them in different columns and use indexing to retrieve them. Please post a **small, reproducible** sample of your original dataset and of how you want it re-ordered. – Carl Witthoft May 15 '14 at 14:09
  • Haha you might have a point there, I am sorry, this is for my bachelor thesis and my first really big project. https://www.dropbox.com/s/tp2d9lr7xawuvr4/Beusselstr_001_020710.txt this is the original file. – ulrich_k May 15 '14 at 14:16
  • Next time, please provide a minimal reproducing example with fake data (something like I did in my response)... :) – Jealie May 15 '14 at 14:40

1 Answer


First, let's generate some toy data:

df = as.data.frame(matrix(c(seq(2070700,2070700-72*2+1,-1),seq(2071100,2071100-72*2+1,-1),runif(72*4)),ncol=2))
colnames(df) = c('date','lon.lat')
df$date[seq(2,nrow(df),2)] = df$date[seq(1,nrow(df)-1,2)]

That's a data frame representing two sequences of coordinates, similar to your example except that the date format is a bit different. The important point is that each date is repeated twice.

Next, the method I suggest relies on your data being sorted. In case your data is messy, you should re-order it before going forward:

df = df[order(df$date),]

The trick to reshaping in an easy way is to add new columns that label recordings from the same experiment (rec.nb) and the relative time (rec.time). As your data is now sorted, all you need to do is:

df$rec.nb = rep(1:2, each=72*2)
df$rec.time = rep(1:(72*2), 2)

(rec.time has to run over all 72*2 rows of a trajectory so that every row gets its own column after reshaping. If you had 3 trajectories, you would put: df$rec.nb = rep(1:3, each=72*2) and df$rec.time = rep(1:(72*2), 3), and so on.)

Your data frame should now look like:

     date    lon.lat rec.nb rec.time
1 2070700 0.47047887      1        1
2 2070700 0.26357648      1        2
3 2070698 0.10793420      1        3
4 2070698 0.09126992      1        4
5 2070696 0.75242114      1        5
6 2070696 0.85941990      1        6
[...]
142 2070560 0.5561255161      1      142
143 2070558 0.7901997303      1      143
144 2070558 0.6179680785      1      144
145 2071100 0.0926457571      2        1
146 2071100 0.7780607140      2        2
147 2071098 0.7008311108      2        3
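
For an arbitrary number of trajectories, both label columns can be derived from the row count instead of being typed by hand. This is only a sketch: it assumes every trajectory occupies the same number of consecutive rows, and label.trajectories and rows.per.traj are names made up for illustration.

```r
# Hypothetical helper: label rows that are stored as consecutive
# fixed-length blocks, one block per trajectory.
label.trajectories <- function(df, rows.per.traj) {
  # Assumption: the row count is an exact multiple of the block length
  stopifnot(nrow(df) %% rows.per.traj == 0)
  n.traj <- nrow(df) %/% rows.per.traj
  df$rec.nb   <- rep(seq_len(n.traj), each = rows.per.traj)   # which trajectory
  df$rec.time <- rep(seq_len(rows.per.traj), times = n.traj)  # position within it
  df
}

# Toy data: 3 trajectories of 4 rows each (2 time steps x lon/lat)
toy <- data.frame(date = rep(c(10, 10, 9, 9), 3), lon.lat = seq_len(12))
toy <- label.trajectories(toy, rows.per.traj = 4)
```

With the layout described in the question, rows.per.traj would be 146.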

Finally, you can reshape your data:

reshape(df,v.names='lon.lat',timevar='rec.time',idvar='rec.nb',direction='wide')

outputting something along the lines of:

       date rec.nb  lon.lat.1 lon.lat.2 lon.lat.3  lon.lat.4 lon.lat.5   [...]
1   2070700      1 0.47047887 0.2635765 0.1079342 0.09126992 0.7524211   [...]
145 2071100      2 0.09264576 0.7780607 0.7008311 0.48613669 0.4928686   [...]
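
If the rows are already stored trajectory after trajectory, as the question describes (146 consecutive rows each), a base R alternative that needs neither sorting nor label columns is to fill a matrix row by row. The following is a sketch under that assumption, using made-up random values; traj, wide, n.traj and rows.per.traj are illustrative names, not objects from the question.

```r
# Assumption: each trajectory occupies exactly rows.per.traj consecutive
# rows and the rows are still in the order they were read from the files.
rows.per.traj <- 146   # 73 hourly steps x (lon, lat), as in the question
n.traj <- 6            # toy value

set.seed(1)
traj <- data.frame(date    = rep(seq_len(n.traj), each = rows.per.traj),
                   lon.lat = runif(n.traj * rows.per.traj))

# byrow = TRUE fills one row per trajectory:
# lon_1, lat_1, lon_2, lat_2, ...
wide <- matrix(traj$lon.lat, ncol = rows.per.traj, byrow = TRUE)

# Label each row with the first date of its trajectory
rownames(wide) <- traj$date[seq(1, nrow(traj), by = rows.per.traj)]

# kmeans() treats each row as one observation, i.e. one trajectory
cl <- kmeans(wide, centers = 2)
```

Because nothing is re-ordered, overlapping date ranges between trajectories do not matter here; the only requirement is the fixed block length.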
Jealie
  • Thank you for your reply! I am not sure what you mean by "your data is messy"? – ulrich_k May 15 '14 at 14:50
  • For example, if your recordings are intermingled... such as having the first dates in that order: `2070700`, `2071100`, `2070623`,`2071211`, etc.. – Jealie May 15 '14 at 15:23
  • Ah okay, thank you. Though I am afraid, if I do that, it will throw all the coordinates of the trajectories out of order. I am not sure, if the way I read the files into R was a good idea. I used rbind to make one huge dataframe for each of the 11 stations. – ulrich_k May 15 '14 at 15:26
  • Well, asking the right question for the first time is not an easy task :) You should come up with something reproducible, and accept the best response so that other people looking for a similar question will be helped. As far as your example goes, my solution *will* work, and I demonstrated that using a generated dataset. You should accept this answer if you agree. If it's not enough for you, you should post a new question after putting more thoughts into providing a minimal working example (http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Jealie May 15 '14 at 15:34
  • I hope it didn't seem as if I was doubting your solution, it works flawlessly! I'm afraid supplying a working example that simulates my data exceeds my skills as well. I uploaded a .txt file here https://www.dropbox.com/s/peu9pa671b6g7su/beusselstr.trajectories.txt in case you are interested. I also added this file to my original post. The problem is, that the trajectories overlap, so sorting the dataframe would mix up the trajectories. Again, thank you very much for your time. – ulrich_k May 15 '14 at 16:01