
I am trying to build an R package that uses mlpack. As suggested in this link, I am using the following C++ function:

#include <Rcpp/Rcpp>
#include <mlpack.h>

// Two include directories adjusted for my use of mlpack 3.4.2 on Ubuntu
#include <mlpack/core.hpp>
#include <mlpack/methods/kmeans/kmeans.hpp>
#include <mlpack/methods/kmeans/random_partition.hpp>
#include <mlpack/methods/neighbor_search/neighbor_search.hpp>

// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::depends(mlpack)]]

// This is 'borrowed' from mlpack's own src/mlpack/tests/kmeans_test.cpp.
// We borrow the data set, and the code from the first test function.
// Passing data from R is easy thanks to RcppArmadillo, 'and left as an
// exercise'.

// Generate dataset; written transposed because it's easier to read.
arma::mat kMeansData("  0.0   0.0;" // Class 1.
                     "  0.3   0.4;"
                     "  0.1   0.0;"
                     "  0.1   0.3;"
                     " -0.2  -0.2;"
                     " -0.1   0.3;"
                     " -0.4   0.1;"
                     "  0.2  -0.1;"
                     "  0.3   0.0;"
                     " -0.3  -0.3;"
                     "  0.1  -0.1;"
                     "  0.2  -0.3;"
                     " -0.3   0.2;"
                     " 10.0  10.0;" // Class 2.
                     " 10.1   9.9;"
                     "  9.9  10.0;"
                     " 10.2   9.7;"
                     " 10.2   9.8;"
                     "  9.7  10.3;"
                     "  9.9  10.1;"
                     "-10.0   5.0;" // Class 3.
                     " -9.8   5.1;"
                     " -9.9   4.9;"
                     "-10.0   4.9;"
                     "-10.2   5.2;"
                     "-10.1   5.1;"
                     "-10.3   5.3;"
                     "-10.0   4.8;"
                     " -9.6   5.0;"
                     " -9.8   5.1;");


// [[Rcpp::export]]
arma::Row<size_t> kmeansDemo() {

    mlpack::kmeans::KMeans<mlpack::metric::EuclideanDistance, 
                           mlpack::kmeans::RandomPartition> kmeans;

    arma::Row<size_t> assignments;
    kmeans.Cluster((arma::mat) trans(kMeansData), 3, assignments);

    return assignments;
}

If I sourceCpp the above on Ubuntu Linux after setting Sys.setenv("PKG_LIBS"="-lmlpack"), it compiles successfully. However, I am unable to use it on macOS with the Apple M2 architecture, where I get the following error:

/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/mlpack/include/mlpack.h:52:10: fatal error: mlpack/core.hpp: No such file or directory
   52 | #include <mlpack/core.hpp>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated. 

I have installed the mlpack R package as well as the system mlpack using brew. It seems to me that R cannot find the mlpack files located in /opt/homebrew/include/ on my system. Is there a way to point R to them? I have tried brew link mlpack, which reports that linking succeeded, but I still get the same compilation error. Additionally, I tried the following in R before calling sourceCpp, but no luck!

Sys.setenv("LDFLAGS"="-L/opt/homebrew/lib")
Sys.setenv("CPPFLAGS"="-I/opt/homebrew/include")
Sys.setenv("PKG_LIBS"="-lmlpack")

Please let me know if there is any way out for this in macOS.

P.S. Both R and RStudio are installed on my system using brew.

  • Tagged R and C++. Question title says Rcpp. Which of the three languages is this, really? Just tag that one – user4581301 May 25 '23 at 23:25
  • @user4581301 R and C++, via Rcpp. It's ok. We have about 3000 questions in the `[rcpp]` tag. – Dirk Eddelbuettel May 25 '23 at 23:48
  • Good question. What I would do at this point is check the viability of the compiler and linker R uses together with the mlpack library, i.e. take the little `kmeans` test function, wrap a `main()` around it, and then do the equivalent of `gcc -o kmeanstest kmeanstest.cpp -L/opt/homebrew/lib -lmlpack [plus whatever else you need]`. We can generally move from a minimally viable example to one involving R. But I am not on macOS, so I can never remember whether the `brew`-installed tools mesh with what R uses. I do know Simon Urbanek recommends the toolchain from CRAN, not brew. – Dirk Eddelbuettel May 25 '23 at 23:51
  • I tried [this minimally viable example](https://github.com/mlpack/mlpack/blob/master/doc/quickstart/cpp.md) and was able to successfully compile it using `g++ -O3 -std=c++14 -o cpp_quickstart_1 cpp_quickstart_1.cpp -L/opt/homebrew/lib/ -larmadillo`. However, it does not require the link `-lmlpack`. – noirritchandra May 26 '23 at 00:09
  • Perfect! That is what I meant to imply in my last answer too: mlpack 4.* is header-only, the one I had on my box was 3.4.2 so _I needed `-lmlpack`_. So here you can likely do without `-lmlpack`. When used from R we also do not need `-larmadillo` (as LAPACK etc come from R). So try a similar minimal example and check the compiler flags given out by R (use `sourceCpp()` in verbose mode). – Dirk Eddelbuettel May 26 '23 at 00:16
  • I have mlpack 4.1.0 installed on my Ubuntu system, where I was able to `sourceCpp` the simple example you provided **with** `Sys.setenv("PKG_LIBS"="-lmlpack")`. But **without specifying** `PKG_LIBS` it does not compile and I get `Error in dyn.load("/tmp/RtmphLvm2V/sourceCpp-x86_64-pc-linux-gnu-1.0.10/sourcecpp_1364e4dfa4d09/sourceCpp_2.so") :`. On Ubuntu I am able to create an `R` package by adding `-lmlpack` to `PKG_LIBS`. But I think this is just a lucky hack and not a principled solution. Nothing works on macOS though, where the `mlpack` version is 4.1.0 as well. – noirritchandra May 26 '23 at 00:53
  • Ok so my hope was wrong and maybe we still do need linking. In any event, depending on an _external_ library that may or may not be present is not an easy problem, and there is nothing `Rcpp` can do about it. But among the over 2600 packages at CRAN many do this, including e.g. my packages `RQuantLib` or `RProtoBuf`. – Dirk Eddelbuettel May 26 '23 at 12:16
  • I see. If the issue you have mentioned in the other discussion thread can be implemented in future, then I think the scope of the `mlpack` library can be extended a lot more. – noirritchandra May 26 '23 at 16:17
  • We are working on this now 'upstream' in (source package) `mlpack` so that the (R package) `mlpack` will install the headers going forward. You pointed out an actual issue here, and we'll address it. – Dirk Eddelbuettel May 31 '23 at 11:37
  • `mlpack` 4.2.0 is now on CRAN and should help you. Thanks again for the heads-up! – Dirk Eddelbuettel Jun 25 '23 at 16:30

1 Answer

mlpack 4.2.0 is now on CRAN and ships exported headers we can use! A minimally modified version of your example follows.

Code

#include <Rcpp/Rcpp>
#include <mlpack.h>

#include <mlpack/methods/kmeans.hpp>

// -- use C++17
// [[Rcpp::plugins(cpp17)]]
// -- use Armadillo, Ensmallen and mlpack headers
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::depends(RcppEnsmallen)]]
// [[Rcpp::depends(mlpack)]]

// This is 'borrowed' from mlpack's own src/mlpack/tests/kmeans_test.cpp

// Generate dataset; written transposed because it's easier to read.
arma::mat kMeansData("  0.0   0.0;" // Class 1.
                     "  0.3   0.4;"
                     "  0.1   0.0;"
                     "  0.1   0.3;"
                     " -0.2  -0.2;"
                     " -0.1   0.3;"
                     " -0.4   0.1;"
                     "  0.2  -0.1;"
                     "  0.3   0.0;"
                     " -0.3  -0.3;"
                     "  0.1  -0.1;"
                     "  0.2  -0.3;"
                     " -0.3   0.2;"
                     " 10.0  10.0;" // Class 2.
                     " 10.1   9.9;"
                     "  9.9  10.0;"
                     " 10.2   9.7;"
                     " 10.2   9.8;"
                     "  9.7  10.3;"
                     "  9.9  10.1;"
                     "-10.0   5.0;" // Class 3.
                     " -9.8   5.1;"
                     " -9.9   4.9;"
                     "-10.0   4.9;"
                     "-10.2   5.2;"
                     "-10.1   5.1;"
                     "-10.3   5.3;"
                     "-10.0   4.8;"
                     " -9.6   5.0;"
                     " -9.8   5.1;");


// [[Rcpp::export]]
arma::Row<size_t> kmeansDemo() {

    // Originally written to use RandomPartition, and is left that
    // way because RandomPartition gives better initializations here.
    mlpack::KMeans<mlpack::EuclideanDistance, mlpack::RandomPartition> kmeans;

    // mlpack::KMeans<> kmeans;    // default arguments as an alternative

    arma::Row<size_t> assignments;
    kmeans.Cluster((arma::mat) trans(kMeansData), 3, assignments);

    return assignments;
}

/*** R
kmeansDemo()
*/

Output

> Rcpp::sourceCpp("~/git/stackoverflow/76336745/answer.cpp")

> kmeansDemo()
[INFO ] KMeans::Cluster(): iteration 1, residual 13.7285.
[INFO ] KMeans::Cluster(): iteration 2, residual 2.51215e-15.
[INFO ] KMeans::Cluster(): converged after 2 iterations.
[INFO ] 186 distance calculations.
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17]
[1,]    2    2    2    2    2    2    2    2    2     2     2     2     2     0     0     0     0
     [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30]
[1,]     0     0     0     1     1     1     1     1     1     1     1     1     1
> 
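
For readers wondering what the iteration log above corresponds to, here is a rough, standard-library-only sketch of Lloyd-style k-means iterations. It is not mlpack's implementation: the function name `lloydSketch`, the `Point` alias, and the deterministic `i % k` starting partition (a stand-in for a random partition, chosen only so the result is reproducible) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

using Point = std::pair<double, double>;  // (x, y); illustrative alias

// Hypothetical helper, NOT part of mlpack: plain Lloyd iterations on 2-D points.
// Assumption: the starting labels are i % k, a deterministic stand-in for a
// random partition.
std::vector<std::size_t> lloydSketch(const std::vector<Point>& pts,
                                     std::size_t k,
                                     std::size_t maxIter = 100) {
    assert(k > 0 && pts.size() >= k);

    std::vector<std::size_t> label(pts.size());
    for (std::size_t i = 0; i < pts.size(); ++i) label[i] = i % k;

    for (std::size_t iter = 0; iter < maxIter; ++iter) {
        // Update step: each centroid becomes the mean of its current members.
        std::vector<Point> centroid(k, Point{0.0, 0.0});
        std::vector<std::size_t> count(k, 0);
        for (std::size_t i = 0; i < pts.size(); ++i) {
            centroid[label[i]].first += pts[i].first;
            centroid[label[i]].second += pts[i].second;
            ++count[label[i]];
        }
        for (std::size_t c = 0; c < k; ++c) {
            if (count[c] > 0) {
                centroid[c].first /= static_cast<double>(count[c]);
                centroid[c].second /= static_cast<double>(count[c]);
            }
        }

        // Assignment step: move each point to its nearest non-empty centroid
        // (squared Euclidean distance; empty clusters are simply skipped).
        bool changed = false;
        for (std::size_t i = 0; i < pts.size(); ++i) {
            std::size_t best = label[i];
            double bestDist = std::numeric_limits<double>::infinity();
            for (std::size_t c = 0; c < k; ++c) {
                if (count[c] == 0) continue;
                const double dx = pts[i].first - centroid[c].first;
                const double dy = pts[i].second - centroid[c].second;
                const double d = dx * dx + dy * dy;
                if (d < bestDist) {
                    bestDist = d;
                    best = c;
                }
            }
            if (best != label[i]) {
                label[i] = best;
                changed = true;
            }
        }
        if (!changed) break;  // no label moved, so the partition is stable
    }
    return label;
}
```

Calling `lloydSketch(points, 3)` on the thirty rows of `kMeansData` puts the first 13, the next 7, and the last 10 points into three separate groups, the same grouping as in the output above. The label numbers themselves are arbitrary, so they need not match the ones mlpack prints.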

Packages

> sapply(c("RcppArmadillo", "RcppEnsmallen", "mlpack"), \(x) format(packageVersion(x)))
RcppArmadillo RcppEnsmallen        mlpack 
 "0.12.4.1.0"  "0.2.19.0.1"       "4.2.0" 
> 
Dirk Eddelbuettel
  • I have verified that these are now working on both macOS and Ubuntu environments. Thank you very much Prof. Eddelbuettel for resolving this matter promptly. I really appreciate it. – noirritchandra Jun 26 '23 at 20:28
  • Oh I am mostly just the middle man and messenger here. The kudos goes to the (often Google Summer of Code students) `mlpack` contributors updating the R bindings, to the whole `mlpack` team for the 4.* releases ... and to *you* for reminding us that the R package failed to install these headers! So we all did this together, and it is actually rather awesome that it all works now. It was a looooooong time coming! – Dirk Eddelbuettel Jun 26 '23 at 20:35