-1

I have a problem with combining data frames which share the same rows. I imported rasters from different folders and converted them into data frames and tried to combine them together but the data don't line together and I don't know why!? Does anyone have the answer, please?

Fun <- function(f) {
  df <- stack(f)
}

files <- dir("mypath", recursive=TRUE, full.names=TRUE, pattern=".tif$")
All <- sapply(files, Fun)

for(i in All){
  df1 <- lapply(All, as.data.frame, xy = TRUE)
  all <- dplyr::bind_rows(df1)
}

Now. for addational

  R1
        x        y    R1
1  696060 -3327450    66
2  696090 -3327450    71
3  696120 -3327450    69
4  696150 -3327450    65
5  696180 -3327450    67
6  696210 -3327450    68
7  696240 -3327450    67
8  696270 -3327450    68
9  696300 -3327450    69
10 696330 -3327450     0

R2
        x        y    R2
1  696060 -3327450    66
2  696090 -3327450    71
3  696120 -3327450    69
4  696150 -3327450    65
5  696180 -3327450    67
6  696210 -3327450    68
7  696240 -3327450    67
8  696270 -3327450    68
9  696300 -3327450    69
10 696330 -3327450     0

R1
        x        y   R1
1  753810 -3339930   109
2  753840 -3339930   108
3  753870 -3339930   108
4  753900 -3339930   109
5  753930 -3339930   108
6  753960 -3339930   109
7  753990 -3339930   109
8  754020 -3339930   109
9  754050 -3339930   110
10 754080 -3339930   109

R2
        x        y   R2
1  753810 -3339930   109
2  753840 -3339930   108
3  753870 -3339930   108
4  753900 -3339930   109
5  753930 -3339930   108
6  753960 -3339930   109
7  753990 -3339930   109
8  754020 -3339930   109
9  754050 -3339930   110
10 754080 -3339930   109


The output will be like that:
          x        y  R1    R2
1  696060 -3327450    66    NA
2  696090 -3327450    71    NA
3  696120 -3327450    69    NA
4  696150 -3327450    65    NA
5  696180 -3327450    67    NA
6  696210 -3327450    68    NA
7  696240 -3327450    67    NA
8  696270 -3327450    68    NA
9  696300 -3327450    69    NA
10 696330 -3327450     0    NA
11 696060 -3327450    NA    66
12 696090 -3327450    NA    71
13 696120 -3327450    NA    69
14 696150 -3327450    NA    65
15 696180 -3327450    NA    67
16 696210 -3327450    NA    68
17 696240 -3327450    NA    67
18 696270 -3327450    NA    68
19 696300 -3327450    NA    69
20 696330 -3327450    NA     0
21 753810 -3339930   109   109
22 753840 -3339930   108   108
23 753870 -3339930   108   108
24 753900 -3339930   109   109
25 753930 -3339930   108   108
26 753960 -3339930   109   109
27 753990 -3339930   109   109
28 754020 -3339930   109   109
29 754050 -3339930   110   110
30 754080 -3339930   109   109

Any idea about what is going on or suggestion what should I do?

Jasurbek
  • 2,946
  • 3
  • 20
  • 37
  • 2
    This is the 3rd time you're asking this question, and each time people point you to the `dplyr` `join` functions. Did you try that already? Is the output not what you expect, or what is your current issue? – Sven Jun 20 '19 at 11:36
  • This is because I didn't get what I wanted because when joining data frames should be 20 rows in total, not 30 or 40 rows as I get now. Every two data frames have identical x and y, so the output should be 20 row. I can put all dat frames in one folder and row bind them and I get only 20 row, but I have more folders that have different rasters to be converted to data frames, so I want to run only one function to do so. If I put the folders in one folder I just call the dataframes and use row_bind and there won't be any NAs and the identical x and y will line together. I hope this is clear now. – user236137 Jun 20 '19 at 13:26
  • Just to mention that I tried all dplyr and merging functions, but none of them worked for me. I'm pretty sure there must a little thing that I need to add to the code, but I still not aware of it. – user236137 Jun 20 '19 at 13:42
  • 2
    So show the code you tried, that we we can correct your error! – Gregor Thomas Jun 20 '19 at 14:10

3 Answers3

0

In dplyr you will find the inner_join or left_join functions (using the by = c() parameter to specify the join column).

I'd recommend those as an easy way of linking data, especially if you're familiar with SQL.

Jon
  • 445
  • 3
  • 15
0

You may adapt this solution. The order you can get by adding an id column before merging.

res <- Reduce(function(...) merge(..., all=TRUE), 
       Map(`[<-`, All, "id", value=substring(names(All), 2)))
res[order(res$id), -3]  # order here by `id`
#         x        y  R2  R1
# 1  696060 -3327450  NA  66
# 3  696090 -3327450  NA  71
# 5  696120 -3327450  NA  69
# 7  696150 -3327450  NA  65
# 9  696180 -3327450  NA  67
# 11 696210 -3327450  NA  68
# 13 696240 -3327450  NA  67
# 15 696270 -3327450  NA  68
# 17 696300 -3327450  NA  69
# 19 696330 -3327450  NA   0
# 2  696060 -3327450  66  NA
# 4  696090 -3327450  71  NA
# 6  696120 -3327450  69  NA
# 8  696150 -3327450  65  NA
# 10 696180 -3327450  67  NA
# 12 696210 -3327450  68  NA
# 14 696240 -3327450  67  NA
# 16 696270 -3327450  68  NA
# 18 696300 -3327450  69  NA
# 20 696330 -3327450   0  NA
# 21 753810 -3339930  NA 109
# 23 753840 -3339930  NA 108
# 25 753870 -3339930  NA 108
# 27 753900 -3339930  NA 109
# 29 753930 -3339930  NA 108
# 31 753960 -3339930  NA 109
# 33 753990 -3339930  NA 109
# 35 754020 -3339930  NA 109
# 37 754050 -3339930  NA 110
# 39 754080 -3339930  NA 109
# 22 753810 -3339930 109  NA
# 24 753840 -3339930 108  NA
# 26 753870 -3339930 108  NA
# 28 753900 -3339930 109  NA
# 30 753930 -3339930 108  NA
# 32 753960 -3339930 109  NA
# 34 753990 -3339930 109  NA
# 36 754020 -3339930 109  NA
# 38 754050 -3339930 110  NA
# 40 754080 -3339930 109  NA

Data

All <- list(structure(list(x = c(696060L, 696090L, 696120L, 696150L, 
696180L, 696210L, 696240L, 696270L, 696300L, 696330L), y = c(-3327450L, 
-3327450L, -3327450L, -3327450L, -3327450L, -3327450L, -3327450L, 
-3327450L, -3327450L, -3327450L), R1 = c(66L, 71L, 69L, 65L, 
67L, 68L, 67L, 68L, 69L, 0L)), row.names = c(NA, -10L), class = "data.frame"), 
    structure(list(x = c(696060L, 696090L, 696120L, 696150L, 
    696180L, 696210L, 696240L, 696270L, 696300L, 696330L), y = c(-3327450L, 
    -3327450L, -3327450L, -3327450L, -3327450L, -3327450L, -3327450L, 
    -3327450L, -3327450L, -3327450L), R2 = c(66L, 71L, 69L, 65L, 
    67L, 68L, 67L, 68L, 69L, 0L)), row.names = c(NA, -10L), class = "data.frame"), 
    structure(list(x = c(753810L, 753840L, 753870L, 753900L, 
    753930L, 753960L, 753990L, 754020L, 754050L, 754080L), y = c(-3339930L, 
    -3339930L, -3339930L, -3339930L, -3339930L, -3339930L, -3339930L, 
    -3339930L, -3339930L, -3339930L), R1 = c(109L, 108L, 108L, 
    109L, 108L, 109L, 109L, 109L, 110L, 109L)), row.names = c(NA, 
    -10L), class = "data.frame"), structure(list(x = c(753810L, 
    753840L, 753870L, 753900L, 753930L, 753960L, 753990L, 754020L, 
    754050L, 754080L), y = c(-3339930L, -3339930L, -3339930L, 
    -3339930L, -3339930L, -3339930L, -3339930L, -3339930L, -3339930L, 
    -3339930L), R2 = c(109L, 108L, 108L, 109L, 108L, 109L, 109L, 
    109L, 110L, 109L)), row.names = c(NA, -10L), class = "data.frame"))
jay.sf
  • 60,139
  • 8
  • 53
  • 110
  • I appreciate your try Jay.sf, but unfortunately not what I was expect. The problem is that I expect only 20 rows and without NAs because every 2 data frames share the same x&y. I can do it easily with dplyr row_bind if I put call all data frames and I join them together, but when join data frames from list of data frames, I get 40 rows and with NAs, and that is not what I want. If you see my data, the last two data frames joint properly in the output data frame, but the first two were not. – user236137 Jun 20 '19 at 13:51
0

I'm not sure if you need more than just these 4 dataframes, but I've put the main idea here for you. You/we can adjust for more than these 4, or when the dataframes are in a list, etc...

Data:

dput(df1)
structure(list(x = c(696060L, 696090L, 696120L, 696150L, 696180L, 
696210L, 696240L, 696270L, 696300L, 696330L), y = c(-3327450L, 
-3327450L, -3327450L, -3327450L, -3327450L, -3327450L, -3327450L, 
-3327450L, -3327450L, -3327450L), R1 = c(66L, 71L, 69L, 65L, 
67L, 68L, 67L, 68L, 69L, 0L)), class = "data.frame", row.names = c(NA, 
-10L))
dput(df2)
structure(list(x = c(696060L, 696090L, 696120L, 696150L, 696180L, 
696210L, 696240L, 696270L, 696300L, 696330L), y = c(-3327450L, 
-3327450L, -3327450L, -3327450L, -3327450L, -3327450L, -3327450L, 
-3327450L, -3327450L, -3327450L), R2 = c(66L, 71L, 69L, 65L, 
67L, 68L, 67L, 68L, 69L, 0L)), class = "data.frame", row.names = c(NA, 
-10L))
dput(df3)
structure(list(x = c(753810L, 753840L, 753870L, 753900L, 753930L, 
753960L, 753990L, 754020L, 754050L, 754080L), y = c(-3339930L, 
-3339930L, -3339930L, -3339930L, -3339930L, -3339930L, -3339930L, 
-3339930L, -3339930L, -3339930L), R1 = c(109L, 108L, 108L, 109L, 
108L, 109L, 109L, 109L, 110L, 109L)), class = "data.frame", row.names = c(NA, 
-10L))
dput(df4)
structure(list(x = c(753810L, 753840L, 753870L, 753900L, 753930L, 
753960L, 753990L, 754020L, 754050L, 754080L), y = c(-3339930L, 
-3339930L, -3339930L, -3339930L, -3339930L, -3339930L, -3339930L, 
-3339930L, -3339930L, -3339930L), R2 = c(109L, 108L, 108L, 109L, 
108L, 109L, 109L, 109L, 110L, 109L)), class = "data.frame", row.names = c(NA, 
-10L))

First combine per 2 dataframes by using dplyrs left_join:

combined1 <- left_join(df1, df2, by = c("x" = "x", "y" = "y"))
combined1
        x        y R1 R2
1  696060 -3327450 66 66
2  696090 -3327450 71 71
3  696120 -3327450 69 69
4  696150 -3327450 65 65
5  696180 -3327450 67 67
6  696210 -3327450 68 68
7  696240 -3327450 67 67
8  696270 -3327450 68 68
9  696300 -3327450 69 69
10 696330 -3327450  0  0
combined2 <- left_join(df3, df4, by = c("x" = "x", "y" = "y"))
combined2
        x        y  R1  R2
1  753810 -3339930 109 109
2  753840 -3339930 108 108
3  753870 -3339930 108 108
4  753900 -3339930 109 109
5  753930 -3339930 108 108
6  753960 -3339930 109 109
7  753990 -3339930 109 109
8  754020 -3339930 109 109
9  754050 -3339930 110 110
10 754080 -3339930 109 109

Then you can rbind these into the full dataframe you want:

allCombined <- rbind(combined1, combined2)
allCombined
        x        y  R1  R2
1  696060 -3327450  66  66
2  696090 -3327450  71  71
3  696120 -3327450  69  69
4  696150 -3327450  65  65
5  696180 -3327450  67  67
6  696210 -3327450  68  68
7  696240 -3327450  67  67
8  696270 -3327450  68  68
9  696300 -3327450  69  69
10 696330 -3327450   0   0
11 753810 -3339930 109 109
12 753840 -3339930 108 108
13 753870 -3339930 108 108
14 753900 -3339930 109 109
15 753930 -3339930 108 108
16 753960 -3339930 109 109
17 753990 -3339930 109 109
18 754020 -3339930 109 109
19 754050 -3339930 110 110
20 754080 -3339930 109 109

Does this make sense?

Sven
  • 1,203
  • 1
  • 5
  • 14