0

In my workplace, R is integrated with Dremio and I can access the tables in Dremio using RODBC::sqlQuery() function. Tables are having around hundred thousand entries. I am creating data-frame of SQL tables using this function but some rows are storing data in different format. I have read that sqlQuery function converts the binary SQL types to list of raw vectors, hence I am getting cell value like this:

as.raw(c(0x56, 0xfd, 0x36, 0xaf, 0x63, 0x18, 0x77, 0x2a, 0x09, 0x35, 0x9f, 0xe3))
dput(df)
structure(list(`_Id` = structure(list(as.raw(c(0x56, 0xfd, 0x36, 
0xaf, 0x63, 0x18, 0x77, 0x2a, 0x09, 0x35, 0x9f, 0xe3)), as.raw(c(0x56, 
0xfd, 0x37, 0x07, 0x63, 0x18, 0x77, 0x2a, 0x09, 0x35, 0x9f, 0xeb
)), as.raw(c(0x56, 0xfd, 0x38, 0x7a, 0x63, 0x18, 0x77, 0x2a, 
0x09, 0x35, 0xa0, 0x07)), as.raw(c(0x56, 0xfd, 0x38, 0x8b, 0x63, 
0x18, 0x77, 0x2a, 0x09, 0x35, 0xa0, 0x0e)), as.raw(c(0x56, 0xfd, 
0x38, 0x9d, 0x63, 0x18, 0x77, 0x2a, 0x09, 0x35, 0xa0, 0x12))), class = "ODBC_binary")), class = "data.frame", row.names = c(NA, 
5L))

I want cell value to be a single string id as "56fd36af6318772a09359fe3" so that I can use it as matching criteria with other tables. Is there any way to do it?

hemant A
  • 185
  • 14

1 Answers1

0

You can use dplyr


library(dplyr)

df$`_Id` %>% 
  sapply(., function(x) { paste0(x, collapse = "") }) %>% 
  as.data.frame(string_id=as.character(.), stringsAsFactors=FALSE) 

# 1 56fd36af6318772a09359fe3
# 2 56fd37076318772a09359feb
# 3 56fd387a6318772a0935a007
# 4 56fd388b6318772a0935a00e
# 5 56fd389d6318772a0935a012

stevec
  • 41,291
  • 27
  • 223
  • 311
  • 1
    Thanks a lot. It worked but can you please explain what is happening here? – hemant A Aug 07 '19 at 11:35
  • Sure. It's a lot simpler than it looks. If we look at the first element of the only column in the `df` object , we see `c(0x56, 0xfd, 0x36, 0xaf, 0x63, 0x18, 0x77, 0x2a, 0x09, 0x35, 0x9f, 0xe3)`. The `0x` portion of those elements tells R it's [octal](https://stackoverflow.com/questions/2670639/why-are-hexadecimal-numbers-prefixed-with-0x) and this disappears when R reads the data. All we do is use `paste0()` which pastes data together, with the `collapse` parameter set to nothing (in other words no gap between each element). `sapply` lets us apply this technique to each row of `df` – stevec Aug 07 '19 at 11:45
  • To see the effect on just one row, try `c(0x56, 0xfd, 0x36, 0xaf, 0x63, 0x18, 0x77, 0x2a, 0x09, 0x35, 0x9f, 0xe3) %>% paste0(., collapse="")`. The actual answer simply does the same thing to each row – stevec Aug 07 '19 at 11:48
  • Also if I have multiple columns with same raw vectors, I have to run this command for all columns separately or this command can work on multiple columns at once also? – hemant A Aug 07 '19 at 11:49
  • I tried ```c(0x56, 0xfd, 0x36, 0xaf, 0x63, 0x18, 0x77, 0x2a, 0x09, 0x35, 0x9f, 0xe3) %>% paste0(., collapse="")``` but it is actually converting every octal to actual number and then concatenating it, output is ```"8625354175992411942953159227"```. – hemant A Aug 07 '19 at 11:52
  • Wait, that's strange. I will check. It works for me (i.e. returns `56fd36af6318772a09359fe3`) – stevec Aug 07 '19 at 11:54
  • I wanted same thing but I actually have 3 columns like this containing raw vectors as data. So I am just asking that I convert all three columns one by one or all three can be converted simultaneously ? – hemant A Aug 07 '19 at 11:55
  • Normally, it wouldn't be too hard, using `lapply` to apply the same technique across multiple columns. There's something kinda strange about the structure of your data. In that `df` is a `data.frame`, `df$'_id'` is of class `ODBC_binary` (which, as far as I can tell, behaves like a vector), and each element of the ODBC_binary is actually a `list`! It's a (very complicated data structure). I would try wrangle the data structure into something more standard before working with it. Or you could just apply the same technique to each column if you don't have too many – stevec Aug 07 '19 at 12:06
  • Anyways whole command is working. So '.' in sapply function is to iterate row wise? – hemant A Aug 07 '19 at 12:07
  • the `.` in sapply is simply piping the previous argument into sapply. In this case, it's the same as `sapply(df$`_Id`, function(x) { paste0(x, collapse = "") })`. All `sapply` does is apply the function to each item in the list it receives. In this case, it applies the simple function to each row – stevec Aug 07 '19 at 12:09
  • I know its very complicated structure. Basically, data is stored in MongoDB where 3 ids are generated automatically for each record. To get the data in structured format Dremio is being used at my workplace and in Dremio IDs are showing in binary format. Now I have integrated R to Dremio and when I fetch table from Dremio the IDs get converted to raw vectors. – hemant A Aug 07 '19 at 12:09