
I run R on SolusOS, a Linux distro (4.0, R 3.6.1), and on Windows (Windows 10, R 3.5.2).

My code:

library(datasets)
fit2 <- lm(Sepal.Length~Sepal.Width+Species, data=iris)
summary(fit2)

on Windows:

                   Estimate Std. Error   t value     Pr(>|t|)
(Intercept)       2.2513932  0.3697543  6.088890 9.568102e-09
Sepal.Width       0.8035609  0.1063390  7.556598 4.187340e-12
Speciesversicolor 1.4587431  0.1121079 13.011954 3.478232e-26
Speciesvirginica  1.9468166  0.1000150 19.465255 2.094475e-42

and on SolusOS Linux

                    Estimate Std. Error    t value     Pr(>|t|)
(Intercept)       -1.1562296  2.5541337 -0.4526895 6.514443e-01
Sepal.Width       -0.3158123  0.5572782 -0.5667049 5.717849e-01
Speciesversicolor 11.5719475  1.7693108  6.5403701 9.670731e-10
Speciesvirginica  11.6048354  1.7750914  6.5375987 9.810282e-10

AFAIK the results on Windows are correct. I checked the data, and it's identical; I checked the documentation for changes in the defaults of the lm() function and found none. .Machine (as mentioned somewhere) shows one difference: $sizeof.long = 8 (Linux) vs. 4 (Windows), but I don't think that should matter. I googled for an hour but couldn't find anything related to this.

Any ideas?

edit: I'm using RStudio on both. The Linux version reports 99.9.9 (odd, though the software center gives 1.2.1335; Windows has 1.2.5001), so I also ran the code in the R terminal and still got the same results.

deepseefan
Voltti
  • Can you verify they both return the same result for `summary(iris)`? – MrFlick Nov 26 '19 at 20:54
  • ... and make sure there is no mix up in `fit` and `fit2` – user20650 Nov 26 '19 at 20:55
  • `summary(iris)` is identical. I tested with the Windows iris data that I transferred to Linux, as well as the R script. Some other data (datasets::USArrests) gave wrong results on Linux as well. – Voltti Nov 26 '19 at 21:04
  • Are there any other objects getting loaded when you start R on the Linux machine? Can you run `ls()` upon R restart please? – user20650 Nov 26 '19 at 21:06
  • I cleaned all the variables, but that didn't change results. – Voltti Nov 26 '19 at 21:13
  • "fit" is a typo in my description, not in the actual code. Sorry about that. So no effect. – Voltti Nov 26 '19 at 21:15
  • @user20650 I mean, I cleaned all the variables so that ls() gives 'character(0)'. – Voltti Nov 26 '19 at 21:20
  • This has got to be a naming conflict of some sort. Indeed the Windows results are correct. Can you make *absolutely* sure that (1) you start from a fresh R terminal, (2) there are no variables/objects in your global environment from e.g. loading `.Rprofile`, and (3) you are *not* resuming a previous R session. – Maurits Evers Nov 26 '19 at 21:25
  • Maybe also call `stats::lm` explicitly. Did you compile the `SolusOS` version yourself? Under `sessionInfo()` what does each list under "Matrix products"? – MrFlick Nov 26 '19 at 21:29
  • I use `Ubuntu`, for what it's worth, and the results are similar to what he produced on `Windows`. – deepseefan Nov 26 '19 at 21:31
  • I edited the question as I believe the question is not a general `Linux` problem. – deepseefan Nov 26 '19 at 21:49
  • @MauritsEvers Well, this is quite a fresh install; I have installed devtools so I could install TestMyCode from Github (I've done the same thing on Windows as well). I had a peek in the only Rprofile I found, but it doesn't make much sense to me; nothing related to my code though, as far as I could tell. – Voltti Nov 26 '19 at 22:07
  • @MrFlick I tried calling stats::lm, same wrong results. I didn't compile R or Rstudio, they are available in SolusOS's own repository. Matrix products states "default". – Voltti Nov 26 '19 at 22:18

2 Answers


I posted today on the SolusOS forum and was pointed to this thread. The same issue might affect the aov function too, and it might be OS related (someone reported having had the issue with Ubuntu as well).

Anyways, thanks for help and effort! (I will post a solution if and when it is available)

Update 8th Jan 20

(somewhat copypasted from my dev.getsol.us forum post)

The issue seems to be caused by the OpenBLAS library libopenblas_haswellp-r0.3.2.so. I decided to remove a symbolic link pointing to that library (= /usr/lib64/haswell/libopenblas.so.0), and R reverted to using /usr/lib64/libopenblas_core2p-r0.3.2.so. Now I get a correct result from my reference calculations.

Of course I have no idea why using libopenblas_haswellp-r0.3.2.so produces the incorrect results, but it seems to be the culprit on my system.
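For anyone wanting to check which BLAS/LAPACK library their own R session has loaded, here is a minimal sketch. `La_version()` and `La_library()` are the functions mentioned in the comments of this thread; `extSoftVersion()["BLAS"]` is, as far as I know, only reported by reasonably recent R versions, so treat that part as an assumption. The variable names are just illustrative.

```r
# LAPACK version and the shared library R loaded it from.
lapack_version <- La_version()
lapack_library <- La_library()

# BLAS library in use; this entry may be NA or empty on older R versions
# or on builds that cannot determine it.
blas_library <- unname(extSoftVersion()["BLAS"])

cat("LAPACK version:", lapack_version, "\n")
cat("LAPACK library:", lapack_library, "\n")
cat("BLAS library:  ", blas_library, "\n")
```

On the affected system described above, the library path would be expected to point at the haswell build of OpenBLAS; `sessionInfo()` prints similar information next to "Matrix products".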

Update 25th Feb 20

Solus has updated the OpenBLAS package, and the library is now /usr/lib64/haswell/libopenblas_haswellp-r0.3.7.so; it gives the correct results in my reference calculations.
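A reference calculation of this kind could look roughly like the sketch below. It is not the exact check I ran, just an illustration: it compares the `lm` coefficients for the iris model from the question against the values reported as correct in this thread. The tolerance and the variable names are assumptions chosen for the example.

```r
# Coefficients reported as correct in this thread (rounded to 7 decimals).
expected <- c(
  "(Intercept)"     = 2.2513932,
  Sepal.Width       = 0.8035609,
  Speciesversicolor = 1.4587431,
  Speciesvirginica  = 1.9468166
)

observed <- coef(lm(Sepal.Length ~ Sepal.Width + Species, data = iris))

# all.equal() returns TRUE or a description of the differences,
# so wrap it in isTRUE() before branching on it.
reference_ok <- isTRUE(all.equal(observed, expected, tolerance = 1e-6))

if (reference_ok) {
  message("lm() reproduces the reference coefficients.")
} else {
  warning("lm() does NOT reproduce the reference coefficients; check the BLAS library.")
}
```

Running something like this after every BLAS or R update is a cheap way to notice if the problem comes back.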

Voltti
  • thanks for following up. somewhat alarming. seems best to stay clear of that OS for mathematical analysis (at least with R) – user20650 Nov 27 '19 at 18:47
  • Yes, for the time being at least. Other than this issue I really like the distro a lot! – Voltti Nov 27 '19 at 19:19
  • @Voltti That's very odd indeed! It seems that people on the Solus thread are pretty clueless as well (at least at the moment). I find it hard to believe that this is an issue with the Linux kernel. I have just installed Solus (Budgie) 4.0 Fortitude in a VM and `lm` gives the correct result. Can you include details concerning the Solus version, kernel version (`uname -r`), etc.? – Maurits Evers Nov 28 '19 at 01:06
  • @MauritsEvers [Here you Go](https://pastebin.com/XKkHjfUG). I'll try to do some testing next week if I can find clues to this issue...I have a bit of schedule mayhem for the rest of this week...and I don't really have testing setup...yet :) – Voltti Nov 28 '19 at 11:53
  • @Voltti For what it's worth, I've updated my post to include the matching `inxi` output. One of the differences in the kernel version 5.2.x (your Solus OS) vs. 5.3.x (my setup). Perhaps you could try updating the kernel and comparing outputs after the update. – Maurits Evers Nov 28 '19 at 23:41
  • @MauritsEvers Ok, I finally updated the kernel to 5.3.18-140.current (from .2.13-126.current) but that didn't change the results. [(current inxi -info)](https://pastebin.com/mi9ancEc). – Voltti Jan 02 '20 at 16:39

The comments are getting a bit unwieldy, so here's a summary and some further suggestions.

To re-iterate, can you please make sure that

  1. you are starting from a fresh R terminal,
  2. there are no objects in your global environment (from e.g. loading your local .Rprofile); to debug this case, ideally .Rprofile should be empty; and
  3. you are not resuming a previous R session.

Provided you did the above, ls() should return character(0), and functions like lm should refer to the base R functions.
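A minimal sketch of how those checks might look in practice (the variable names are only illustrative, and this is one way of doing it, not the only one):

```r
# Namespace that the visible `lm` belongs to; "stats" on a clean session.
lm_home <- environmentName(environment(lm))

# Every attached place that defines an object called "lm".
lm_locations <- find("lm")

# Objects in the global environment, including hidden ones
# (names starting with a dot).
global_objects <- ls(globalenv(), all.names = TRUE)

# Is the visible `lm` the very same object as stats::lm?
lm_is_stats <- identical(lm, stats::lm)

print(lm_home)
print(lm_locations)
print(global_objects)
print(lm_is_stats)
```

If `lm_locations` lists anything before "package:stats", or `lm_is_stats` is FALSE, something is masking the base function.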

If you still get different results, perhaps try calculating the OLS estimates manually

X <- model.matrix(Sepal.Length ~ Sepal.Width + as.factor(Species), data = iris)
y <- with(iris, Sepal.Length)
R <- t(X) %*% X
solve(R) %*% t(X) %*% y
#                                  [,1]
#(Intercept)                  2.2513932
#Sepal.Width                  0.8035609
#as.factor(Species)versicolor 1.4587431
#as.factor(Species)virginica  1.9468166

Compare with the lm estimates

coef(lm(Sepal.Length ~ Sepal.Width + Species, data = iris))
#(Intercept)       Sepal.Width Speciesversicolor  Speciesvirginica
#  2.2513932         0.8035609         1.4587431         1.9468166

If the results are different, I'd suggest stepping through the manual OLS estimate calculation and comparing e.g. the X and R objects on both machines.
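To narrow things down further, it might help to compute the same estimates through more than one numerical route and compare them in a single session. The sketch below does this under the assumption that `Species` is used directly as a factor (as in the question); `lm` fits via a QR decomposition, so an explicit `qr()` route is included next to the normal equations. The `b_*` names are made up for this example.

```r
# Design matrix and response for the model from the question.
X <- model.matrix(Sepal.Length ~ Sepal.Width + Species, data = iris)
y <- iris$Sepal.Length

# 1) Normal equations: solve (X'X) b = X'y.
b_normal <- drop(solve(crossprod(X), crossprod(X, y)))

# 2) Least squares via an explicit QR decomposition.
b_qr <- qr.coef(qr(X), y)

# 3) The coefficients reported by lm() itself.
b_lm <- coef(lm(Sepal.Length ~ Sepal.Width + Species, data = iris))

# On a healthy installation all routes agree up to floating-point tolerance.
agree_qr <- isTRUE(all.equal(unname(b_normal), unname(b_qr)))
agree_lm <- isTRUE(all.equal(unname(b_normal), unname(b_lm)))
print(c(normal_vs_qr = agree_qr, normal_vs_lm = agree_lm))
```

If only some of the routes disagree, that hints at which underlying numerical routine is misbehaving rather than at the data or the model formula.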


Update

I have installed Solus (Budgie) 4.0 Fortitude in a VM, and lm gives the correct results

coef(lm(Sepal.Length ~ Sepal.Width + Species, data = iris))
#(Intercept)       Sepal.Width Speciesversicolor  Speciesvirginica
#  2.2513932         0.8035609         1.4587431         1.9468166

Details involving the OS

uname -r
#5.3.10-134.current
gcc --version | head -n 1
#gcc (Solus) 9.2.0
inxi -Fz
#System:    Host: solus Kernel: 5.3.10-134.current x86_64 bits: 64 Desktop: Budgie 10.5.1 Distro: Solus 4.0 
#Machine:   Type: Virtualbox System: innotek product: VirtualBox v: 1.2 serial: <filter> 
#           Mobo: Oracle model: VirtualBox v: 1.2 serial: <filter> BIOS: innotek v: VirtualBox date: 12/01/2006 
#CPU:       Topology: Single Core model: Intel Core i5-6600 bits: 64 type: MCP L2 cache: 6144 KiB 
#           Speed: 3312 MHz min/max: N/A Core speed (MHz): 1: 3312 
#Graphics:  Device-1: VMware SVGA II Adapter driver: vmwgfx v: 2.15.0.0 
#           Display: x11 server: X.Org 1.20.5 driver: vmware unloaded: fbdev,modesetting,vesa resolution: 2560x1440~60Hz 
#           OpenGL: renderer: llvmpipe (LLVM 9.0 256 bits) v: 3.3 Mesa 19.2.5 
#Audio:     Device-1: Intel 82801AA AC97 Audio driver: snd_intel8x0 
#           Sound Server: ALSA v: k5.3.10-134.current 
#Network:   Device-1: Intel 82540EM Gigabit Ethernet driver: e1000 
#           IF: enp0s3 state: up speed: 1000 Mbps duplex: full mac: <filter> 
#           Device-2: Intel 82371AB/EB/MB PIIX4 ACPI type: network bridge driver: piix4_smbus 
#Drives:    Local Storage: total: 40.00 GiB used: 7.33 GiB (18.3%) 
#           ID-1: /dev/sda vendor: VirtualBox model: VBOX HARDDISK size: 40.00 GiB 
#Partition: ID-1: / size: 18.36 GiB used: 7.25 GiB (39.5%) fs: ext4 dev: /dev/dm-1 
#           ID-2: /boot size: 269.0 MiB used: 83.7 MiB (31.1%) fs: ext4 dev: /dev/sda1 
#           ID-3: swap-1 size: 956.0 MiB used: 0 KiB (0.0%) fs: swap dev: /dev/dm-0 
#Sensors:   Message: No sensors data was found. Is sensors configured? 
#Info:      Processes: 159 Uptime: 21h 57m Memory: 3.84 GiB used: 579.1 MiB (14.7%) #Shell: bash inxi: 3.0.36
Maurits Evers
  • 1) I started R with the '--vanilla' flag. 2) The only Rprofile file I found is in /usr/lib64/R/library/R/base. If I wipe that file clean, R will not start. 3) AFAIK no previous session is present. ls() gives character(0). I compared the results of the two pieces of code and yes, they are different; the latter gives the wrong results. The lm function points to "environment: namespace:stats", as it should, I believe, and gives out [this (pastebin link)](https://pastebin.com/jvZBs2TT). I will try to compare that code output to the Windows one today. – Voltti Nov 27 '19 at 07:39
  • @Voltti Hmm. I'm quite stumped. Based on details from your comment, everything seems to be OK. So just to be clear. On your Linux machine, the `lm` and "manual" OLS results *don't* agree? Whereas they agree on your Windows machine? So in other words, the "manual" OLS estimates on your Linux machine agree with the Windows results? – Maurits Evers Nov 27 '19 at 11:11
  • So the only wrong result is from the `lm` function on Linux/SolusOS. The manual OLS method on SolusOS and Windows, as well as `lm` on Windows, give the exact same, correct result. Yeah, I'm scratching my head hard over this; I spent quite a bit of time yesterday trying to figure out what was wrong with my calculations until I managed to convince myself that there's no issue with the code. Though I'm still waiting for that _I'm such an idiot_ type of mistake to pop up somewhere. – Voltti Nov 27 '19 at 14:04
  • I compared the `lm` function's code on both OSes: [link](https://www.diffchecker.com/A5BK5rF4), but the only real difference is the bytecode, which is to be expected, right...? – Voltti Nov 27 '19 at 14:22
  • Maurits Evers, Do you still have your VM setup up and running? I was wondering if you could run in R the La_version() and La_library() functions? I get "3.8.0" and "/usr/lib64/haswell/libopenblas_haswellp-r0.3.2.so", respectively. – Voltti Jan 03 '20 at 09:17