
I recently started working with R for a research project, and I'm trying to create a multidimensional array that contains data frame rows.

I have a large data frame containing many columns that are either numeric or strings. For the sake of simplicity, let's work with 3 columns: thread_id, an integer between 1 and 10100; user_id, an integer assigned to each user; and post_name, a string giving the title of the post.

I would like to create a data structure, preferably a two-dimensional array, where the first dimension is the thread_id and the second is a row from the data frame.

For example, for

DataSet[1][1], I'd get thread_id: 1, user_id: 100, post_name: "some name 1"
DataSet[1][2], I'd get thread_id: 1, user_id: 101, post_name: "some name 2"
DataSet[5][10], I'd get thread_id: 5, user_id: 900, post_name: "some name 3"

Is this possible in R? My only previous experience is with Java, where this can be solved with an array of Objects.

Thanks for all the help!

M. Küsz
  • Welcome to SO. First, please read [here](http://stackoverflow.com/help/how-to-ask) about how to ask a good question; a good question has a better chance of being solved and of getting you help. [This](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) is also worth reading: it explains how to create a reproducible example in R. Help users help you by providing a piece of your data, the desired output, and what you have tried so far. – SabDeM May 14 '16 at 13:33

2 Answers


If, say, thread_id took on values 1 to 5, you could use:

mylist <- list()
for (i in 1:5)
    # thread_id must be referenced through the data frame
    mylist[[i]] <- myData[myData$thread_id == i, ]

You could of course use max(myData$thread_id) instead of 5...
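
As a possible shortcut for the loop above, base R's split() builds the same kind of list in one call. This is only a minimal sketch: the small myData below is made-up example data using the column names from the question, and DataSet is just the name the question used for the result.

```r
# Hypothetical example data; only the column names come from the question.
myData <- data.frame(
  thread_id = c(1, 1, 5),
  user_id   = c(100, 101, 900),
  post_name = c("some name 1", "some name 2", "some name 3"),
  stringsAsFactors = FALSE
)

# split() returns a named list with one data frame per distinct thread_id.
DataSet <- split(myData, myData$thread_id)

# Second row of thread 1; the list names are the thread ids as strings.
DataSet[["1"]][2, ]

# Thread 5 is looked up by name, not by position (it is the 2nd list element).
DataSet[["5"]][1, "user_id"]
```

Indexing by name rather than by position keeps lookups correct when some thread ids are missing from the data.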

Dominic Comtois

Here is an alternative for you.

Assumption: df is a data.frame

# Collapse one row into a single "name: value, name: value" string.
convert.to.str <- function(df){
    df_col <- names(df)
    val <- unlist(df)

    ans <- paste(df_col,val,sep=': ')

    # Return the joined string explicitly.
    paste(ans,collapse=', ')
}

int_ans <- data.frame(thread_id = df$thread_id, ans = apply(df,1,convert.to.str), nrow2=1:nrow(df))

library(reshape2)

int_ans2 <- dcast(int_ans,thread_id ~ nrow2,value.var='ans')

DataSet <- int_ans2[2:ncol(int_ans2)]

dimnames(DataSet)[[1]] <- int_ans2$thread_id
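
A lookup such as DataSet[5][10] in the question refers to the 10th post within thread 5, so the column index would need to be the row's position inside its thread rather than its overall row number. The following is a minimal base R sketch of that variation, not the reshape2 code above: it swaps in ave() and tapply(), and the small df, pos and row_str are made-up names and data for illustration.

```r
# Hypothetical example data; only the column names come from the question.
df <- data.frame(
  thread_id = c(1, 1, 2),
  user_id   = c(100, 101, 900),
  post_name = c("some name 1", "some name 2", "some name 3"),
  stringsAsFactors = FALSE
)

# Position of each row within its own thread (1, 2, ... per thread_id).
pos <- ave(seq_len(nrow(df)), df$thread_id, FUN = seq_along)

# One "name: value, name: value" string per row.
row_str <- vapply(seq_len(nrow(df)), function(i) {
  vals <- unlist(lapply(df[i, ], as.character))
  paste(names(df), vals, sep = ": ", collapse = ", ")
}, character(1))

# Character matrix: rows are threads, columns are positions within a thread.
# Cells with no post are NA.
DataSet <- tapply(row_str, list(df$thread_id, pos), function(x) x[1])

# Second post of thread 1, looked up by dimension names.
DataSet["1", "2"]
```

Each cell here is a string, so the original column types are lost; a list of data frames keeps them.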
Kunal Puri
  • Hey, thanks for sharing! It seems to have worked perfectly for int_ans, although it only populated number 13 at int_ans2 (99 times :D ). Any idea why this might happen? All I did was change the variables to the names I use in my program. – M. Küsz May 14 '16 at 18:47
  • @M.Küsz Can you please share the code modified by you? – Kunal Puri May 15 '16 at 03:26
  • @M.Küsz The code seems to be all right, but I cannot identify the error you are describing. I would be grateful if you could include a glimpse of the dataset along with the error you are getting. – Kunal Puri May 15 '16 at 12:32
  • @M.Küsz Can you please explain what you mean by `it only populated number 13 at int_ans2 (99 times :D)`? Does the number 13 appear 99 times in the row names of int_ans2? – Kunal Puri May 16 '16 at 05:13