R: how to access members of array loaded in dataframe elements

Question

From a csv file I loaded data into an R dataframe that looks like this:

> head(mydata)
  row lengthArray                         sports num_runs percent_runs
1   0           4               [24, 18, 24, 18]        0            0
2   1          10 [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]        0            0
3   2           4                   [0, 0, 0, 0]        0            0
4   3           2                         [0, 0]        0            0
5   4           2                       [18, 18]        0            0
6   5           1                            [0]        0            0

I can access the integer columns and get their types with no problem, but I can't figure out how to access sports:

> class(mydata[4,3])
[1] "factor" 
>  string_factor = mydata[1,3]
> string_factor
[1] [24, 18, 24, 18]
6378 Levels: [0] [0, 0] [0, 0, 0] [0, 0, 0, 0] ... [9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
> class(string_factor)
[1] "factor"
> string_factor_numeric = as.numeric(string_factor)
> string_factor_numeric
[1] 5181

I guess the best R response would be "don't do this", but this is how the data is coming, so I am wondering how I can get those numbers out of the array so that I can use them.

I should also mention that the approach in "Convert data.frame columns from factors to characters" gave no error message but had no effect: the array column was still classed as a factor.

UPDATE: as the comments show, converting the whole column (not a single cell) to character gets you partway:
mydata[,3]  <- as.character(mydata[,3])

However, this still does not give you an array with individually accessible elements.

Convert the sports column into character.`mydata[4,3]<-as.character(mydata[4,3])` — user227710, Jun 15 '15 at 22:32
@user227710 thanks for the suggestion, but that had no effect > mydata[[1,3]<-as.character(mydata[1,3]) Error: unexpected assignment in "mydata[[1,3]<-" >mydata[1,3]<-as.character(mydata[1,3]) > class(mydata[1,3]) [1] "factor" — sunny, Jun 15 '15 at 22:37
@user227710 that was a typo while transcribing, I think. Here's pasted directly:> mydata[1,3] <- as.character(mydata[1,3]) > class(mydata[1,3]) [1] "factor" — sunny, Jun 15 '15 at 22:42
You can't make just the first row a character, you have to make the whole column character: `mydata[, 3] <- as.character(mydata[, 3])`. — Gregor Thomas, Jun 15 '15 at 22:43
Also "did not work" isn't informative. Did it give an error message? A warning message? You would also do well to give your desired outcome. Do you want to turn the numbers in `sports` into columns? Do you want to reshape the wide sports column to long? — Gregor Thomas, Jun 15 '15 at 22:45
@Gregor you are correct, I tried to make my question more descriptive. My goal is to create a column for each distinct integer and then for each row that column's value will be the number of times that particular integer appeared in the array. So yes, I do want to reshape the wide sports column to long if I understand that correctly. — sunny, Jun 15 '15 at 22:47
@sunny: You should follow the advice of Gregor. It must work. — user227710, Jun 15 '15 at 22:48
@user227710 this leads me to another puzzle, because now I have > f = mydata[1,3] > f [1] "[{u'sport': 24}, {u'sport': 18}, {u'sport': 24}, {u'sport': 18}]" > 24 %in% f [1] FALSE > 18 %in% f [1] FALSE > class(f) [1] "character" — sunny, Jun 15 '15 at 22:50
@user227710 I am happy to delete if it's not helpful, but I still cannot access the individual members of the array. — sunny, Jun 15 '15 at 22:50
@Gregor if it were a string, I should be able to access individual characters? I don't think it's just a string because the output looks funny. The as.character converted this:[24,18,24,18] to this: {u'sport': 24}, {u'sport': 18}, {u'sport': 24}, {u'sport': 18} — sunny, Jun 15 '15 at 22:57
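The `24 %in% f` result is expected: `%in%` compares whole vector elements, and `f` is a single string, so 24 (coerced to "24") never equals the full text. A minimal base-R sketch, assuming the value looks like either of the two forms that appear in this thread, that pulls out the runs of digits with `regmatches()` and `gregexpr()` before testing membership. The helper name `extract_ints` is illustrative only, and the pattern would also pick up any unrelated digits in the text.

```r
# The two text forms seen in this thread
f1 <- "[24, 18, 24, 18]"
f2 <- "[{u'sport': 24}, {u'sport': 18}, {u'sport': 24}, {u'sport': 18}]"

# %in% tests whole elements, so a number never matches inside a longer string
24 %in% f1  # FALSE

# Hypothetical helper: extract every run of digits and convert to integers
extract_ints <- function(s) as.integer(regmatches(s, gregexpr("[0-9]+", s))[[1]])

nums1 <- extract_ints(f1)
nums2 <- extract_ints(f2)  # same numbers, despite the different wrapping text
24 %in% nums1  # TRUE

# Character-level access on the raw string uses substr()
substr(f1, 2, 3)  # "24"
```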

Steven Beaupré · Answer 1 · 2015-06-15T23:33:05.333

Here's another idea using splitstackshape:

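library(splitstackshape)
library(dplyr)
mydata %>% 
  # drop the surrounding brackets, leaving e.g. "24, 18, 24, 18"
  mutate(sports = gsub("\\[|\\]", "", sports)) %>%
  # split on commas into sports_01, sports_02, ...; shorter rows are padded with NA
  cSplit("sports", sep = ",", direction = "wide")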
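For comparison, a base-R sketch of the same wide split with no packages, assuming `sports` holds the strings shown in the question (the vector below is a stand-in for `as.character(mydata$sports)`). It relies on the fact that indexing past the end of a vector returns `NA`, which pads the shorter rows.

```r
# Stand-in for as.character(mydata$sports)
sports <- c("[24, 18, 24, 18]", "[2, 2, 2, 2, 2, 2, 2, 2, 2, 2]",
            "[0, 0, 0, 0]", "[0, 0]", "[18, 18]", "[0]")

# Drop brackets and spaces, then split on commas: one character vector per row
parts <- strsplit(gsub("\\[|\\]| ", "", sports), ",")
width <- max(lengths(parts))

# Convert to integer and pad each row to the same width with NA
wide <- t(vapply(parts, function(p) as.integer(p)[seq_len(width)], integer(width)))
colnames(wide) <- sprintf("sports_%02d", seq_len(width))
wide
```

The matrix can then be attached to the remaining columns, e.g. `cbind(mydata[setdiff(names(mydata), "sports")], wide)`.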

Which gives:

   row lengthArray num_runs percent_runs sports_01 sports_02 sports_03 sports_04 sports_05 sports_06 sports_07 sports_08 sports_09 sports_10
1:   0           4        0            0        24        18        24        18        NA        NA        NA        NA        NA        NA
2:   1          10        0            0         2         2         2         2         2         2         2         2         2         2
3:   2           4        0            0         0         0         0         0        NA        NA        NA        NA        NA        NA
4:   3           2        0            0         0         0        NA        NA        NA        NA        NA        NA        NA        NA
5:   4           2        0            0        18        18        NA        NA        NA        NA        NA        NA        NA        NA
6:   5           1        0            0         0        NA        NA        NA        NA        NA        NA        NA        NA        NA

Or, as per @thelatemail's comment, you could also store a list as a column:

library(stringi)
df <- mydata %>%
  # "[0-9]+" matches whole numbers such as "24", not single digits
  mutate(sports = as.list(stri_extract_all(sports, regex = "[0-9]+")))

Which will give you the following data structure:

> str(df)
#'data.frame':  6 obs. of  5 variables:
# $ row         : int  0 1 2 3 4 5
# $ lengthArray : int  4 10 4 2 2 1
# $ sports      :List of 6
#  ..$ : chr  "24" "18" "24" "18"
#  ..$ : chr  "2" "2" "2" "2" ...
#  ..$ : chr  "0" "0" "0" "0"
#  ..$ : chr  "0" "0"
#  ..$ : chr  "18" "18"
#  ..$ : chr "0"
# $ num_runs    : int  0 0 0 0 0 0
# $ percent_runs: int  0 0 0 0 0 0

You can then access the elements of the list like this:

> df$sports[[1]][1] #first element of the first row's vector
#[1] "24"
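The goal sunny describes in the comments (one column per distinct integer, where each cell counts how often that integer occurs in the row) can be built from such a list column. A minimal base-R sketch, assuming the strings look like those in the question; the vector below stands in for the original column, and the `sport_` column prefix is an arbitrary choice.

```r
# Stand-in for as.character(mydata$sports)
sports <- c("[24, 18, 24, 18]", "[2, 2, 2, 2, 2, 2, 2, 2, 2, 2]",
            "[0, 0, 0, 0]", "[0, 0]", "[18, 18]", "[0]")

# One integer vector per row (the list-column form)
parsed <- lapply(regmatches(sports, gregexpr("[0-9]+", sports)), as.integer)

# Every distinct integer across all rows, in ascending order
vals <- sort(unique(unlist(parsed)))

# For each row, count occurrences of every value in vals (0 when absent)
counts <- t(vapply(parsed,
                   function(x) as.integer(table(factor(x, levels = vals))),
                   integer(length(vals))))
colnames(counts) <- paste0("sport_", vals)
counts
```

`cbind(mydata, counts)` then attaches the counts to the original rows.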

score 1 · Accepted Answer · answered Jun 15 '15 at 23:02

Here's your data with dput:

mydata = structure(list(row = 0:5, lengthArray = c(4L, 10L, 4L, 2L, 2L, 
1L), sports = structure(c(6L, 5L, 1L, 2L, 4L, 3L), .Label = c("[0, 0, 0, 0]", 
"[0, 0]", "[0]", "[18, 18]", "[2, 2, 2, 2, 2, 2, 2, 2, 2, 2]", 
"[24, 18, 24, 18]"), class = "factor"), num_runs = c(0L, 0L, 
0L, 0L, 0L, 0L), percent_runs = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("row", 
"lengthArray", "sports", "num_runs", "percent_runs"), class = "data.frame", row.names = c(NA, 
-6L))

First we convert the sports column to character:

mydata$sports = as.character(mydata$sports)
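This step matters because a factor stores each value as an integer code pointing into its sorted levels. That is why `as.numeric(string_factor)` in the question returned 5181 (a level position) rather than any number from the text. A small illustration using two of the question's values:

```r
f <- factor(c("[24, 18, 24, 18]", "[0]"))

levels(f)        # "[0]" "[24, 18, 24, 18]" (levels are sorted)
as.integer(f)    # 2 1: positions in levels(), not the numbers in the text
as.character(f)  # "[24, 18, 24, 18]" "[0]": the original text
```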

Now I'll get rid of the brackets and spaces (leaving the commas)

library(stringr)
mydata$sports = str_replace_all(mydata$sports, pattern = "\\[|\\]| ", "")

And lastly separate the sports column into multiple columns

library(tidyr)
# fill = "right" pads rows with fewer values with NA instead of warning
mydata = separate(mydata, sports, into = paste0("sport", 1:max(mydata$lengthArray)), sep = ",", extra = "drop", fill = "right")

mydata
#  row lengthArray sport1 sport2 sport3 sport4 sport5 sport6 sport7 sport8 sport9 sport10 num_runs percent_runs
#1   0           4     24     18     24     18   <NA>   <NA>   <NA>   <NA>   <NA>    <NA>        0            0
#2   1          10      2      2      2      2      2      2      2      2      2       2        0            0
#3   2           4      0      0      0      0   <NA>   <NA>   <NA>   <NA>   <NA>    <NA>        0            0
#4   3           2      0      0   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>    <NA>        0            0
#5   4           2     18     18   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>    <NA>        0            0
#6   5           1      0   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>    <NA>        0            0
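The new `sport*` columns are character, which is why missing values print as `<NA>`. tidyr's `separate()` also accepts `convert = TRUE`, which runs `type.convert()` on the new columns. A base-R sketch of the same conversion after the fact, using a small hypothetical stand-in for the separated data:

```r
# Hypothetical stand-in for a few rows/columns of the separated result
wide <- data.frame(row = 0:2,
                   sport1 = c("24", "2", "0"),
                   sport2 = c("18", "2", NA),
                   stringsAsFactors = FALSE)

# Convert every column whose name starts with "sport" to integer
sport_cols <- grep("^sport", names(wide))
wide[sport_cols] <- lapply(wide[sport_cols], as.integer)

str(wide)
# Row totals, now that the columns are numeric
rowSums(wide[sport_cols], na.rm = TRUE)
```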

It's also fine to store a list as a column in a data.frame - e.g. - `mydata$sports <- strsplit(gsub("^\\[|\\]$","",as.character(mydata$sports)),", |\\[|\\]")` - which you can then access subcomponents of. — thelatemail, Jun 15 '15 at 23:10

score 0 · Answer 3 · answered Jun 15 '15 at 22:59

Recreating your data:

text = "
row lengthArray                            sports num_runs percent_runs
   0           4               '[24, 18, 24, 18]'        0            0
   1          10 '[2, 2, 2, 2, 2, 2, 2, 2, 2, 2]'        0            0
   2           4                   '[0, 0, 0, 0]'        0            0
   3           2                         '[0, 0]'        0            0
   4           2                       '[18, 18]'        0            0
   5           1                            '[0]'        0            0"

data <- read.table(text = text, header= TRUE)

You probably should take the values in sports and create new columns, but if you want to create the vectors inside the sports column, you can actually do that:

data$sports <- as.character(data$sports)
# strip the brackets, wrap the text in c(...), and evaluate it as R code
data$sports <- lapply(data$sports, function(x) eval(parse(text = paste0("c(", gsub("\\[|\\]", "", x),")"))))

Now, for example, if you want to get the third value of the first line of sports:

data$sports[[1]][[3]]
[1] 24
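`eval(parse())` runs whatever text is in the column as R code, which is risky if the file is not trusted. A sketch of the same list column built with `strsplit()` and `as.numeric()` instead, assuming the same `[a, b, ...]` format (the vector below stands in for `data$sports`):

```r
# Stand-in for as.character(data$sports)
sports <- c("[24, 18, 24, 18]", "[2, 2, 2, 2, 2, 2, 2, 2, 2, 2]",
            "[0, 0, 0, 0]", "[0, 0]", "[18, 18]", "[0]")

# Strip the brackets, split on commas, and convert each piece to a number;
# as.numeric() ignores the leading spaces left after splitting
sports_list <- lapply(strsplit(gsub("\\[|\\]", "", sports), ","), as.numeric)

sports_list[[1]][[3]]  # 24
```

Assigning the result to `data$sports` gives the same structure as the `eval(parse())` version.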
