3

I am storing the result of a PostgreSQL query as a data.frame in R. One of the "columns" is an integer[] array type. In R, this is imported as a character string by the RPostgreSQL package.

How do I convert the character string type into a numeric list type column (or as separate numeric columns) in my data.frame?

connect and get data

require(RPostgreSQL)
drv = dbDriver("PostgreSQL")
con = dbConnect(drv, host = ..., post =..., dbname =..., user=..., password=...)
df = dbGetQuery(con, query_string)
dbDisconnect(con)

minimum working example

library(dplyr)
# randomized arrays of 10 numbers
set.seed(10)
df = data.frame(id = c(1:10)) %>%
  mutate(arrcol = lapply(id, function(X) sample(1:99, 10, replace=T)),
         arrcol = gsub("c(","{{",arrcol,fixed=T),
         arrcol = gsub(")","}}",arrcol,fixed=T))

remove brackets

df$arrcol = gsub(fixed=T, "{", "", df$arrcol)
df$arrcol = gsub(fixed=T, "}", "", df$arrcol)

convert to numeric list

# Attempt 1: 
df$arrcol = as.numeric(df$arrcol)
# Error: (list) object cannot be coerced to type 'double'

# Attempt 2:
df$arrcol = lapply(df$arrcol, 
                   function(x) strsplit(x, ",", fixed=T))
# no error, but now the data appears to be stored as a list of character lists: 
# arrcol[1]: list(c("1", "2", "3", "4", "5",...

# Attempt 3:
df$arrcol = lapply(df$arrcol, 
                   function(x) as.numeric(
                     unlist(
                       strsplit(x, ",", fixed=T))
                   )
)
# this one seems to work
Brian D
  • 2,570
  • 1
  • 24
  • 43
  • there is an issue filed with the RPostgreSQL package repository but I know outside linking is discouraged: https://github.com/codeinthehole/rpostgresql/issues/5 – Brian D Dec 05 '16 at 20:51

1 Answers1

1

My own best answer:

df$numcol = gsub(fixed=T, "{", "", df$arrcol)
df$numcol = gsub(fixed=T, "}", "", df$numcol)

df$numcol <- lapply(df$numcol, 
                    function(x) as.numeric(
                      unlist(
                        strsplit(x = x, split = ",", fixed=T)
                      )
                    )
)

[updated to perform all steps in one go]

df$numcol <- lapply(df$arrcol, 
                    function(x) as.numeric(
                      unlist(
                        strsplit(
                          x = gsub("[{]|[}]", "", x), 
                          split = ",", fixed=T))))

or, equivalently:

df$numcol <- lapply(df$arrcol, 
                    function(x) as.numeric(
                      strsplit(
                          x = gsub("[{]|[}]", "", x), 
                          split = ",", fixed=T)[[1]]
                    )
)

Alternatively, (as long as each of the arrays has the same length) you could use this trick (Splitting a dataframe string column into multiple different columns) to parse the string into separate columns. Note that read.table is clever enough to recognize each of the new variables as integers.

newdf = read.table(text = df$arrcol, header = F, sep = ",")

Additionally, you can easily append those as their own columns onto the original data.frame:

df = cbind(df, newdf)

or, knowing how many new columns will be produced:

df[,3:101] <- read.table(text = gsub("[{]|[}]", "", df$arrcol), 
                         header = F, sep = ",")
Brian D
  • 2,570
  • 1
  • 24
  • 43