I'd like to merge two datasets based on a common column. Dataset A is a geoTIFF image, representing RGB values of an area. Dataset B is a point cloud with xyz values of the same area.

I want to merge the RGB information from the image into the 3D data. I thought of using the x and y coordinates of the two datasets (which are in the same coordinate system). I wrote a script inspired by code snippets found on Stack Overflow (sources are 1, 2, and 3), but I still need to complete it.

The issue is that the x and y coordinates in the two files have different precision (number of decimal places). Dataset A has 0 to 2 decimal digits; dataset B has many more. I rounded the dataset B values to 2 decimals. Now I'd like to pad with zeros wherever the values in dataset A have fewer than 2 decimals, so that the final merge will hopefully work.

Would a simple if statement be fine, considering that my dataset has >280000 rows? Or should I go for indexing? I'm fairly new to R, so I would appreciate a code example. Below is my code:

require(raster)
require(rgl)

setwd("C:/my/folder")

# Read tiff file
img <- stack("image.tif")

vals <- extract(img, 1:ncell(img))
coord <- xyFromCell(img, 1:ncell(img))
combine <- as.data.frame(cbind(coord, vals))  # data frame, so that combine$x works below
remove(vals)
remove(coord)

# read POINTCLOUD and assign names
lidar <- read.table("lidardata.txt")
names(lidar) <- c("x","y","z")

decimalplaces <- function(x) {
  if ((x %% 1) != 0) {
    nchar(strsplit(sub('0+$', '', as.character(x)), ".", fixed=TRUE)[[1]][[2]])
  } else {
    return(0)
  }
}


# HERE I SHOULD PAD THE LIDAR VARIABLE WITH ZEROS IN DECIMAL POSITIONS WHEN THE DIGITS ARE LESS THAN 2!!!
lidar$xy <- do.call(paste0,lidar[,1:2])

combine$x <- round(combine$x, digits = 2)
combine$y <- round(combine$y, digits = 2)
combine$xy <- do.call(paste0,combine[1:2])

finaldata <- merge(combine,lidar,by = 'xy', all = FALSE)

EDIT 1

As suggested by @Heroka, here is an example of what the lidar data (dataset B) looks like, and what it should look like after padding it with zeros.

LIDAR (original)

x     y     z
12    9     87
11    23.4  100

LIDAR (altered, and with 'xy' column added for joining)

x     y     z     xy
12.00 9.00  87    12.009.00
11.00 23.40 100   11.0023.40

EDIT 2

I managed to retrieve the number of decimal digits of every x and y in my 'lidar' variable (dataset B) with `counting <- sapply(lidar$x, decimalplaces)`. In the example above (LIDAR, original), this gives [0 0] for the first (x) column and [0 1] for the second (y) column. I should now be able to find each row of my x y dataset where the number of digits is 0 or 1 (not 2) and pad it with zeros as in the altered LIDAR table above.
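
For illustration, the padding described above does not need a per-row if statement or a digit count: `sprintf` is vectorised and formats a whole column at once. A minimal sketch, assuming the two example rows from EDIT 1 (the helper names `x_chr` and `y_chr` are hypothetical):

```r
# Toy data copied from the "LIDAR (original)" example
lidar <- data.frame(x = c(12, 11), y = c(9, 23.4), z = c(87, 100))

# sprintf() is vectorised: every element is formatted with exactly two decimals
x_chr <- sprintf("%.2f", lidar$x)
y_chr <- sprintf("%.2f", lidar$y)

# Character join key, built the same way as the 'xy' column in the altered example
lidar$xy <- paste0(x_chr, y_chr)
print(lidar$xy)
```

The result is a character column, so it is only useful as a text key; the numeric x and y columns are left unchanged.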

umbe1987
  • Would `sprintf("%.2f", ...)` help? Like `sprintf("%.02f", 0.1)`? – Heroka Jan 11 '16 at 14:54
  • I don't know, I think that is a matter of printing, or I did not understand your suggestion. Anyway, I am now looking at this (http://stackoverflow.com/questions/4317452/how-to-count-how-many-elements-satisfy-a-condition-in-an-idiomatic-way), which seems to indicate a way to do that. Thanks – umbe1987 Jan 11 '16 at 14:55
  • I understood your question as 'I need to have a way to ensure two digits and thus to pad with zeroes', which is what sprintf can do (you can save the output to a variable). Could you maybe extend your question with some parts of lidar and combine, and what your intended result is? – Heroka Jan 11 '16 at 14:57
  • We appear to have an [XY problem](http://meta.stackexchange.com/a/66378/203914) here. – Roland Jan 11 '16 at 15:26

1 Answer

I do not understand why you need to pad with zeros. If the coordinates are of class numeric and both were rounded using round (which should avoid issues of floating point precision) you should be able to just merge by them. Something like this:

lidar$x <- round(lidar$x, 2)
lidar$y <- round(lidar$y, 2)
combine$x <- round(combine$x, digits = 2)
combine$y <- round(combine$y, digits = 2)

finaldata <- merge(combine, lidar, by = c("x", "y") , all = FALSE)
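
As a small self-contained illustration of the same idea, here is a sketch on made-up data. The values in `combine` are hypothetical stand-ins for raster cells; the `lidar` rows are the two example rows from the question.

```r
# Hypothetical toy data: 'combine' stands in for the raster cells
combine <- data.frame(x = c(12.001, 11.004, 30.5),
                      y = c(9.002, 23.399, 40.25),
                      R = c(255, 10, 0),
                      G = c(200, 20, 0),
                      B = c(100, 30, 0))
lidar <- data.frame(x = c(12, 11), y = c(9, 23.4), z = c(87, 100))

# Round both sides to the same number of decimals; the columns stay numeric
combine$x <- round(combine$x, 2)
combine$y <- round(combine$y, 2)
lidar$x <- round(lidar$x, 2)
lidar$y <- round(lidar$y, 2)

# Inner join on both coordinate columns at once, no pasted text key needed
finaldata <- merge(combine, lidar, by = c("x", "y"), all = FALSE)
print(finaldata)
```

The third cell has no lidar point at its rounded coordinates, so the inner join drops it.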
Roland
  • Thanks. No, only one was rounded using round (the image, or dataset A), because it had 5 or more digits, while the other has 0 to 2 digits. Of course, the fewer the digits, the less precise the merge will be (and the more data I will lose, because there will be a lot of duplicated coordinates). – umbe1987 Jan 11 '16 at 15:14
  • Well, then you need a completely different approach (such as interpolation maybe). Pasting together x and y coordinates is definitely not the way to go nor is padding with zero (which doesn't make a difference at all). – Roland Jan 11 '16 at 15:16
  • Why not? As far as I know, if I simply have the same xy (converted to character, which is why I would need the same number of digits) it should work with a simple join. I tried it without padding and I got 4035 rows in my finaldata. I thought the many rows I lose were due to the join not matching the xy in dataset A with 0 and 1 digits against dataset B with 2 digits. – umbe1987 Jan 11 '16 at 15:20
  • The basic message of my answer is that (i) you should definitely not use text processing functions for numeric values (we have `round`, `floor`, etc.) and (ii) you can merge by more than one column. – Roland Jan 11 '16 at 15:22
  • Thanks. Indeed, the way I am trying to achieve what I want is surely not the best one. Anyway, both the sprintf approach and your answer were useful in making me understand that I should try another way. I accept your answer. Guess I will go with this (http://gis.stackexchange.com/questions/138611/assigning-rgb-values-from-geotiff-image-to-lidar-data-using-r). – umbe1987 Jan 11 '16 at 16:39
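
The "completely different approach" mentioned in the comments amounts to looking up, for each lidar point, the raster cell it falls in, rather than joining on rounded coordinates. The sketch below shows that lookup in base R on a made-up grid. The grid origin, cell size, band values, point coordinates, and the names `red`, `col_idx` and `row_idx` are all assumptions for illustration. With the raster package from the question, a call along the lines of `extract(img, lidar[, c("x", "y")])` should play this role.

```r
# Hypothetical 3 x 3 single-band grid with 1 m cells.
# Upper-left corner at (xmin, ymax); row 1 of the matrix is the top row.
xmin <- 10
ymax <- 23
cell_size <- 1
red <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)

# Hypothetical points with arbitrary decimal precision
lidar <- data.frame(x = c(10.2, 12.9, 11.5),
                    y = c(22.7, 20.1, 21.5),
                    z = c(87, 100, 95))

# Column and row of the cell that contains each point
col_idx <- floor((lidar$x - xmin) / cell_size) + 1
row_idx <- floor((ymax - lidar$y) / cell_size) + 1

# A two-column (row, col) index matrix picks one cell value per point
lidar$R <- red[cbind(row_idx, col_idx)]
print(lidar)
```

Because every point gets the value of the cell that contains it, no point is lost to mismatched decimals, and the precision of the two coordinate sets no longer has to agree.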