
I am trying to combine two data frames using rbind, but it gives me a warning I can't understand. Some code will explain. First, here are the two data frames I am merging:

```
> first.m
   label valueA valueB valueC measureA measureB measureC measureD
2      2    158   1020     10     94.0       20        8        0
4      4    101   1016     10     11.0        5        7        0
9      9    439   1003     10     12.0        7        7        0
11    11    434    985     10     25.5        6       12        0
12    12    839    984     10     39.5       18        8        0
14    14    339    979     10     43.5       13       13        0
23    23    127    926     10     16.5        6       10        0
26    25    748    916     10     57.0       13       14        0
34    33    352    904     10     43.5       15       20        0
35    34    254    904     10    239.5       29       14        0
> second.m
   label valueA valueB valueC measureA measureB measureC measureD
1      5    832   1019     20     15.0        9        6        0
2      7    158   1020     20    102.5       24        8        0
3      8    139   1020     20     60.0       14        7        0
4     17    321   1018     20     77.0       14       10        0
5     21    815   1014     20    132.0       17       17        0
6     25    719   1009     20    158.0       21       14        0
7     28    496   1002     20      7.0        5        5        0
8     39    493    992     20     36.0        7       13        0
9     45    840    984     20     47.0       19        9        0
10    53    339    978     20     53.5       12       11        0
```

Next, some information about the data types, which seem to be involved in the issue:

```
> typeof(first.m$label); typeof(second.m$label)
[1] "integer"
[1] "integer"
> typeof(first.m$label[1]); typeof(second.m$label[1])
[1] "integer"
[1] "integer"
```

And finally, the actual problem:

```
> this.work <- rbind(first.m, second.m)
> this.doesnt <- rbind(second.m, first.m)
Warning message:
In `[<-.factor`(`*tmp*`, ri, value = c(2L, 4L, 9L, 11L, 12L, 14L,  :
  invalid factor level, NA generated
```

Why does rbind work in one order but not in the other?

EDIT: I forgot to mention that this question seems similar to "rbind() function in R produces NA's in the merged dataframe", but the answers there do not solve my problem.

EDIT 2:

As suggested in the comment by @Rob, here is the result of str() applied to my two data frames:

```
> str(first.m)
'data.frame':   10 obs. of  8 variables:
 $ nodeName: int  2 3 7 8 9 10 12 17 20 21
 $ x       : int  158 139 496 493 840 339 296 292 129 1008
 $ y       : int  1020 1020 1002 992 984 978 973 937 925 919
 $ z       : int  20 20 20 20 20 20 20 20 20 20
 $ area    : num  102 60 7 36 47 ...
 $ width   : int  24 14 5 7 19 12 7 14 10 7
 $ height  : int  8 7 5 13 9 11 13 7 15 6
 $ zetaMean: num  0 13 0 7 0 0 0 0 0 0
> str(second.m)
'data.frame':   10 obs. of  8 variables:
 $ nodeName: Factor w/ 275 levels "1003","1018",..: 152 210 235 70 80 87 94 125 139 160
 $ x       : int  832 158 139 321 815 719 496 493 840 339
 $ y       : int  1019 1020 1020 1018 1014 1009 1002 992 984 978
 $ z       : int  20 20 20 20 20 20 20 20 20 20
 $ area    : num  15 102 60 77 132 ...
 $ width   : int  9 24 14 14 17 21 5 7 19 12
 $ height  : int  6 8 7 10 17 14 5 13 9 11
 $ zetaMean: num  0 0 0 0 0 0 0 0 0 0
```
gabt
  • How about making your problem reproducible using dput()? – s_baldur Oct 16 '18 at 08:10
  • Because I didn't know about it! I'll look into it. – gabt Oct 16 '18 at 08:11
  • Probably because one of your variables is a factor. When merging the two, some values from one data frame fall outside the set of levels of that factor in the other data frame. – Robert Oct 16 '18 at 08:15
  • This I don't understand. Assuming this is the reason, will `stringsAsFactors = F` solve the issue? Can I be sure? – gabt Oct 16 '18 at 08:17
  • Use class instead of typeof; it usually gives more relevant information. – Roland Oct 16 '18 at 08:19
  • I suspect your "label" variable is a factor in one data frame. By merging the two data frames you introduce values (2, 4, 9, 11, ...) that are not levels of that factor. It works the other way around because it is probably a numeric variable in your other data frame. – Robert Oct 16 '18 at 08:21
  • I tried importing the data with `stringsAsFactors = F` and it seems to work both ways. But I know that each of these two data frames contains different labels. Are you suggesting that rbind can't find correspondences between the factor levels, hence the warning? Or that, while importing, one of the label columns is read as a factor and this somehow disrupts the whole rbind? – gabt Oct 16 '18 at 08:26
  • That is indeed my guess, as I do not know what your data looks like under the hood (e.g. by using str() or dplyr::glimpse()). – Robert Oct 16 '18 at 08:42
  • That's OK! I believe it's an issue in the importing phase. Unfortunately I don't know what you mean by _str() or glimpse()_, otherwise I would have attached my data tables. – gabt Oct 16 '18 at 08:53
  • Import your data, run str() on each data frame you use for the merge, and post the results. – Robert Oct 16 '18 at 08:58
  • As shown in my edit, the _label_ column of _second.m_ is a factor. So, as you said, this is the issue. Am I correct? – gabt Oct 16 '18 at 10:36
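The diagnosis in the comments can be reproduced with a small sketch. It uses hypothetical miniature data frames (not the original data) in which `label` is an integer in one and a factor in the other. It shows that typeof() cannot tell the two apart while class() can, and that the outcome of rbind() depends on which argument comes first, because rbind() takes the column classes from its first data frame.

```r
# Hypothetical miniature versions of first.m and second.m: `label` is an
# integer column in one and a factor in the other, as str() revealed.
first.m  <- data.frame(label = c(2L, 4L, 9L), valueA = c(158L, 101L, 439L))
second.m <- data.frame(label = factor(c("5", "7", "8")),
                       valueA = c(832L, 158L, 139L))

# typeof() reports the storage mode: a factor is stored as integer codes
# plus a "levels" attribute, so both columns report "integer".
typeof(first.m$label)
typeof(second.m$label)

# class() shows the real difference: "integer" versus "factor".
class(first.m$label)
class(second.m$label)

# With second.m first, label stays a factor; the integers from first.m are
# not among its levels, so they become NA and a warning is raised.
warn_msg <- NULL
broken <- withCallingHandlers(
  rbind(second.m, first.m),
  warning = function(w) {
    warn_msg <<- conditionMessage(w)  # keep the message for inspection
    invokeRestart("muffleWarning")
  }
)
print(warn_msg)
print(broken$label)  # the three labels from first.m are NA

# With first.m first, label starts out integer; the factor values are
# inserted as strings, so the column silently becomes character.
works <- rbind(first.m, second.m)
class(works$label)
```

So the "working" order does not warn, but it still changes the type of `label`, which is easy to miss.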

1 Answer


Basically, the issue was that I was importing a data table without specifying whether stringsAsFactors should be TRUE or FALSE.

Without that argument, R (which defaulted to stringsAsFactors = TRUE before version 4.0.0) imported the label column of second.m as a factor, hence the warning, as explained in the comments by @Rob.
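A minimal sketch of one way to fix the data before binding, again with hypothetical miniature data frames: convert the factor column back to integer so that both `label` columns have the same class. Importing with `stringsAsFactors = FALSE` (the default since R 4.0.0) avoids the factor, but a column that R could not parse as numbers will then arrive as character, so an explicit conversion like this may still be needed.

```r
# Hypothetical miniature data: second.m's label column was read in as a
# factor, while first.m's is an integer column.
first.m  <- data.frame(label = c(2L, 4L, 9L), valueA = c(158L, 101L, 439L))
second.m <- data.frame(label = factor(c("5", "7", "8")),
                       valueA = c(832L, 158L, 139L))

# Convert factor -> character -> integer. Calling as.integer() directly on a
# factor would return its internal level codes (1, 2, 3), not the labels.
second.m$label <- as.integer(as.character(second.m$label))

# With matching column classes, rbind() keeps label as integer and raises
# no warning, whichever data frame comes first.
ab <- rbind(first.m, second.m)
ba <- rbind(second.m, first.m)
str(ba)
```

Any entry that is not a valid integer becomes NA in as.integer() (with an "NAs introduced by coercion" warning), which can help locate the values that made R treat the column as text.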

gabt