I'm new to R and have run into a problem. I'm trying to fill a matrix ("m") based on the data in another dataframe ("razdeljena1"). All of the row and column names of "m" match the names found in the first and second columns of "razdeljena1".

m <- matrix(1:729, byrow = TRUE, nrow = 27,
            dimnames = list(c("PES", "MAčKA", "VTÁK","HORA","STROM","RIEKA","SLNKO","MÄSO","SYR","VODA","CHLIEB","KLADIVO","METLA","PERO","NÔŽ","POSTEĽ","STÔL","SKRIŇA","LAMPA","TOPÁNKA","NOHAVICE","KLOBÚK","DÁŽDNIK","VEDRO","FĽAŠA","VRECE","KONZERVA"),
                            c("PES", "MAčKA", "VTÁK","HORA","STROM","RIEKA","SLNKO","MÄSO","SYR","VODA","CHLIEB","KLADIVO","METLA","PERO","NÔŽ","POSTEĽ","STÔL","SKRIŇA","LAMPA","TOPÁNKA","NOHAVICE","KLOBÚK","DÁŽDNIK","VEDRO","FĽAŠA","VRECE","KONZERVA")))
m <- replace(m, 1:729, NA)

Here are the first 12 observations in razdeljena1:

       1          2          rating.response
  [1,] "SYR"      "KLADIVO"  "1"            
  [2,] "LAMPA"    "DÁŽDNIK"  "1"            
  [3,] "CHLIEB"   "KLOBÚK"   "1"            
  [4,] "STROM"    "KONZERVA" "1"            
  [5,] "PERO"     "NÔŽ"      "1"            
  [6,] "STÔL"     "DÁŽDNIK"  "1"            
  [7,] "STROM"    "VODA"     "1"            
  [8,] "DÁŽDNIK"  "KONZERVA" "1"            
  [9,] "PERO"     "POSTEĽ"   "1"            
 [10,] "HORA"     "VODA"     "1"            
 [11,] "LAMPA"    "FĽAŠA"    "1"            
 [12,] "STROM"    "SKRIŇA"   "1"     

For this I created a while loop that reads every row, extracts the necessary info, and writes it to the matrix.

a <- 1
while (a <379){
  beseda1 <- razdeljena1[a,1]
  beseda2 <- razdeljena1[a,2]
  relat <- razdeljena1[a,3]

  m[beseda1, beseda2] <- relat
  m[beseda2, beseda1] <- relat

  a <- a+1
}

The loop works well for the first 9 iterations (and writes into the matrix correctly) and then stops with this error: Error in `[<-`(`*tmp*`, beseda1, beseda2, value = relat) : subscript out of bounds. I have looked into the error, and the explanation is that I'm trying to access a column or a row that does not exist. However, when I access the same cell outside of the loop (with identically defined coordinates), it returns the correct cell.

Example: the error occurs when beseda1 = "PERO" and beseda2 = "POSTEĽ", but when I assign to that cell outside of the loop it works just fine:

beseda1 <- "PERO"
beseda2 <- "POSTEĽ"

m[beseda1, beseda2] <- 1
m[beseda2, beseda1] <- 1

I have also checked whether this is the only pair that causes problems (by starting the while loop at a number greater than 9) and got the same error after some iterations.

  • I can't get your example to run, the line `for (i in p %>% select(Pair))` gives an error `object 'p' not found`. – Gregor Thomas Feb 01 '20 at 19:14
  • Also note that your loop is looping over numbers, not strings, so if 9 iterations work, presumably it breaks when `a` is `10`. Does `razdeljena1[10,1]` work? What about `razdeljena1[10,2]` and `razdeljena1[10,3]`? What is `dim(razdeljena)`? – Gregor Thomas Feb 01 '20 at 19:17
  • Object p is a table with data that cannot be shared. I take only one column ("Pair"), which has a format like "PERO - POSTEĽ ", and split it so I get "PERO" "POSTEL" in separate columns. The same goes for rel_mat, only there I do not need to change the format; I just add it to the dataframe "razdeljena" and get "razdeljena1". – Oskar Dragan Feb 01 '20 at 19:24
  • All of the examples you suggested work and give the correct response. dim(razdeljena) = 378 3 – Oskar Dragan Feb 01 '20 at 19:26
  • I might also add that the error is reported for the line in which I try to change the matrix: Error in `[<-`(`*tmp*`, beseda1, beseda2, value = relat) : subscript out of bounds – Oskar Dragan Feb 01 '20 at 19:29
  • Well, could you add enough of `p` so that we can run some code? Put `dput(p[1:12, "Pair", drop = FALSE])` in the question. – Gregor Thomas Feb 01 '20 at 19:32
  • But are you just reshaping data from long to wide, in a loop? If you're interested in easier solutions, post a few rows of `razdeljena1`, all 3 columns, and your whole loop is a one-liner in `tidyr` or `reshape2`. Maybe have a look at the FAQ on [reshaping data from long to wide](https://stackoverflow.com/q/5890584/903061) – Gregor Thomas Feb 01 '20 at 19:34

1 Answer

R supports assignment-indexing by a two-column character matrix whose entries are dimnames (row name in the first column, column name in the second). No loop is needed. Simply do this:

 m[ razdeljena1[ , 1:2] ] <- razdeljena1[ , 3]

Demonstration that it succeeds for the small example data you provided:

> m [ razdeljena1[ , 1:2] ] <- razdeljena1[,3]
> m
         PES MAčKA VTÁK HORA STROM RIEKA SLNKO MÄSO SYR VODA CHLIEB KLADIVO METLA PERO NÔŽ
PES      NA  NA    NA   NA   NA    NA    NA    NA   NA  NA   NA     NA      NA    NA   NA 
MAčKA    NA  NA    NA   NA   NA    NA    NA    NA   NA  NA   NA     NA      NA    NA   NA 
VTÁK     NA  NA    NA   NA   NA    NA    NA    NA   NA  NA   NA     NA      NA    NA   NA 
HORA     NA  NA    NA   NA   NA    NA    NA    NA   NA  "1"  NA     NA      NA    NA   NA 
STROM    NA  NA    NA   NA   NA    NA    NA    NA   NA  "1"  NA     NA      NA    NA   NA 
#snipped remaining rows.
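
Note that the single assignment above fills only `m[word1, word2]`, while the loop in the question also wrote the mirrored cell. Below is a minimal sketch of filling both directions, assuming a cut-down stand-in for the real data (a six-word matrix and three of the posted rows); the name `idx` is only illustrative.

```r
# Cut-down stand-in for the question's objects (six words, three rows).
words <- c("HORA", "STROM", "VODA", "PERO", "SYR", "KLADIVO")
m <- matrix(NA_character_, nrow = length(words), ncol = length(words),
            dimnames = list(words, words))

razdeljena1 <- cbind(c("SYR", "STROM", "HORA"),
                     c("KLADIVO", "VODA", "VODA"),
                     rating.response = c("1", "1", "1"))

idx <- razdeljena1[, 1:2]           # two-column character index matrix
m[idx] <- razdeljena1[, 3]          # fills m[word1, word2]
m[idx[, 2:1]] <- razdeljena1[, 3]   # swapped columns: fills m[word2, word1]

print(m)
```

Swapping the two index columns with `idx[, 2:1]` reproduces the second assignment in the original loop without iterating over rows.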
IRTFM
  • This yields an error: Error in `[<-`(`*tmp*`, razdeljena1[, 1:2], value = c("1", "1", "1", "1", "1", : subscript out of bounds – Oskar Dragan Feb 01 '20 at 20:58
  • You have a value in `razdeljena1[ , 1:2]` that is not present in `dimnames(m)[[1]]`. You need to sanitize your data. Run `unique(razdeljena1[ , 1:2])[which(!unique(razdeljena1[ , 1:2]) %in% dimnames(m)[[1]])]` to find the misspelled dimname. – IRTFM Feb 01 '20 at 21:00
  • The problem really was in the dimnames(m). It was due to a different encoding, but I managed to fix it. Thank you a lot! – Oskar Dragan Feb 02 '20 at 20:11
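
The thread does not say exactly how the encodings differed, so the following is only an illustrative sketch of one way two names can look identical on screen and still fail to match: the same letter stored as a single precomposed character in one place and as a base letter plus a combining mark in the other. The vectors `dim_names` and `pair_names` are hypothetical stand-ins, not the asker's data.

```r
# Hypothetical stand-ins: the same word, stored two different ways.
dim_names  <- c("PERO", "POSTE\u013d")    # one precomposed character (U+013D)
pair_names <- c("PERO", "POSTEL\u030c")   # "L" plus a combining caron (U+030C)

# Names used in the pairs that have no exact match among the dimnames.
bad <- setdiff(pair_names, dim_names)
print(bad)

# The two spellings differ at the byte level.
print(charToRaw(enc2utf8(dim_names[2])))
print(charToRaw(enc2utf8(bad)))

# One way to rule out such mismatches: derive the dimnames from the data.
words <- sort(unique(pair_names))
m <- matrix(NA_character_, length(words), length(words),
            dimnames = list(words, words))
stopifnot(all(pair_names %in% rownames(m)))
```

With the real objects, the same check would be `setdiff(razdeljena1[, 1:2], rownames(m))`, and the names could be taken from `unique(as.vector(razdeljena1[, 1:2]))` instead of being typed by hand.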