Enabling Multi-threading for XGBoost within Conda Environments on macOS

Published in

Geek Culture

Jun 15, 2021

In this post I illustrate a somewhat subtle issue: enabling multi-threading for XGBoost within Conda environments on macOS. Since the influential paper by Chen and Guestrin (2016), XGBoost has drawn steady attention and is increasingly popular in academic research, industrial applications, and major competitions such as Kaggle. I highly recommend reading through its home page for the basic ideas and for its advantages over other ensemble learning algorithms. Broadly speaking, a traditional random forest averages the scores of independent trees, each grown on a random subset of samples and a random subset of features, whereas XGBoost refines its predictions by growing trees sequentially. During training, one can control overfitting by regularizing the number of iterations (trees), through the early stopping criterion and the learning rate, and the complexity of each tree (depth, number of leaves, weights, etc.).

Model training usually requires dozens to hundreds of repeated trials, mainly to select the best set of hyperparameters and the best set of features, so speed always matters. Luckily, XGBoost is a highly optimized algorithm that uses multiple threads automatically (GPU support is also available, of course). However, I found that the XGBoost R package installed on macOS seems NOT to use multiple threads, only a single one. I also raised the issue here: https://github.com/dmlc/xgboost/issues/7017

Official Solution for System-wide R

Actually, the developers recognized this issue early on and provided a fairly simple solution. The key point is to make sure libomp is installed on macOS beforehand; see https://xgboost.readthedocs.io/en/latest/build.html#installing-the-development-version-linux-mac-osx. So first install it with brew:

brew install libomp cmake

and then follow the instructions to install the XGBoost R package from source. For example, navigate to any temporary directory and enter the following in the terminal:

git clone --recursive https://github.com/dmlc/xgboost
cd xgboost 
git submodule init 
git submodule update 
mkdir build 
cd build 
cmake .. -DR_LIB=ON 
make 
make install

Now one can run the following experiment in R and see that XGBoost supports multi-threading:

# test number of threads
require(xgboost)
x <- matrix(rnorm(100 * 10000), 10000, 100)
y <- x %*% rnorm(100) + rnorm(1000)

system.time({
  bst <- xgboost(data = x, label = y, nthread = 1, nround = 100, verbose = F)
})
#   user  system elapsed
# 19.257   0.111  17.062

system.time({
  bst <- xgboost(data = x, label = y, nthread = 4, nround = 100, verbose = F)
})
#   user  system elapsed
# 17.632   0.056   4.450
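To read these numbers more easily, the elapsed times can be turned into a speedup and a per-thread efficiency. Here is a small sketch, written in Python purely for convenience; the helper name `speedup` is my own illustrative choice, and the inputs are simply the elapsed values reported above.

```python
def speedup(elapsed_single, elapsed_multi, n_threads):
    """Return (speedup, efficiency) for a multi-threaded run.

    speedup    = single-thread wall time / multi-thread wall time
    efficiency = speedup / number of threads (1.0 would be ideal scaling)
    """
    if elapsed_single <= 0 or elapsed_multi <= 0 or n_threads < 1:
        raise ValueError("timings must be positive and n_threads >= 1")
    s = elapsed_single / elapsed_multi
    return s, s / n_threads


# Elapsed seconds reported above for nthread = 1 and nthread = 4.
s, eff = speedup(17.062, 4.450, 4)
print(f"speedup: {s:.2f}x, efficiency: {eff:.0%}")
```

With four threads the run is roughly 3.8 times faster in wall-clock time, which is close to ideal scaling and a good sign that OpenMP is really in use.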

Solution within Conda environments

The solution above only covers installing the XGBoost R package for system-wide R on macOS (Big Sur). Here I mainly discuss installing the XGBoost R package within a conda environment. Conda environments are most often used with Python, but R can also be installed and set up within them. In fact, using R within a conda environment brings several benefits.

Isolation: Within a conda environment, problems can always be tested without affecting the system, since the whole environment can be safely removed. In addition, the R/Python packages are stored within the environment, without potential dependencies on packages outside it.
MKL Acceleration: R uses its default BLAS for matrix computation, whose speed is not satisfactory (see https://csantill.github.io/RPerformanceWBLAS/), while the Intel MKL library provides optimized BLAS/LAPACK routines. However, linking MKL is not that straightforward with the system-wide R installed from the R website. A conda environment, by contrast, can set up MKL for R through the dependencies in a yml file, for example:

name: R_4.0_mkl
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.8
  - conda-forge::r-base=4.1.0
  - conda-forge::libblas=3.9.0=9_mkl

Reproducibility: This one cuts both ways. If all R packages are installed using conda, they can be exported as a yml file for coworkers to use. However, installation with the traditional install.packages() is often preferred, especially when packages need to be compiled, and packages installed that way are not included in the exported yml.

The question is whether the XGBoost R package installed within a conda environment can use multiple threads. One can also refer to my reports in https://github.com/dmlc/xgboost/issues/7017. In general, there are two ways to install the XGBoost R package within a conda environment. One is to start R and run install.packages("xgboost"); the other is to run conda install -c conda-forge r-xgboost in the terminal after activating the environment. However, both methods install an XGBoost R package with only a single thread available, even when libomp is already installed. Googling helped little with this issue, so I went to check the make files used for compilation. With the R environment activated, enter the following in the R console:

file.path(R.home("etc"), "Makeconf")

This prints the path of the make configuration file within the conda environment. Open that file in your favorite editor and find

SHLIB_OPENMP_CFLAGS = -fopenmp
SHLIB_OPENMP_CXXFLAGS = -fopenmp
SHLIB_OPENMP_FFLAGS = -fopenmp

which are as expected, but the following are empty

SHLIB_CFLAGS = 
SHLIB_CXXFLAGS = 
SHLIB_FFLAGS =

From my trials and experiments, the SHLIB_OPENMP_* flags are NOT picked up as effective compilation flags for the XGBoost R package. It is also NOT certain whether other packages that require compilation use them properly. Since libomp is installed system-wide and llvm-openmp is also installed automatically within a conda environment that contains R, always adding the -fopenmp flag should not be harmful. Therefore, revise the file by adding the flag to these three empty lines:

SHLIB_CFLAGS = -fopenmp
SHLIB_CXXFLAGS = -fopenmp
SHLIB_FFLAGS = -fopenmp
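For those who prefer not to edit the file by hand, the same change can be scripted. Below is a minimal Python sketch under a few assumptions: the function names `add_openmp_flag` and `patch_makeconf` are made up for illustration, the three lines are assumed to be empty exactly as shown above, and the path passed to `patch_makeconf` must be whatever `file.path(R.home("etc"), "Makeconf")` printed in your own environment.

```python
import re
import shutil
from pathlib import Path

# Matches only the three plain SHLIB_* flag lines when their value is empty;
# the SHLIB_OPENMP_* lines have a different name and are left untouched.
EMPTY_SHLIB_FLAG = re.compile(
    r"^(SHLIB_(?:C|CXX|F)FLAGS)[ \t]*=[ \t]*$", re.MULTILINE
)


def add_openmp_flag(makeconf_text, flag="-fopenmp"):
    """Return Makeconf text with `flag` filled into the empty SHLIB_*FLAGS lines."""
    return EMPTY_SHLIB_FLAG.sub(lambda m: f"{m.group(1)} = {flag}", makeconf_text)


def patch_makeconf(path):
    """Patch the file in place, keeping a backup copy next to it (Makeconf.bak)."""
    path = Path(path)
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(add_openmp_flag(path.read_text()))


# Demonstration on a small excerpt rather than on a real Makeconf.
sample = (
    "SHLIB_OPENMP_CFLAGS = -fopenmp\n"
    "SHLIB_CFLAGS = \n"
    "SHLIB_CXXFLAGS = \n"
    "SHLIB_FFLAGS =\n"
)
print(add_openmp_flag(sample))
```

Lines that already carry a value are not matched, so running the script twice leaves the file unchanged. The backup makes it easy to restore the original configuration if the flag causes trouble for another package.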

Now try install.packages("xgboost") in R within the conda environment. Note that the following message still appears during compilation:

checking whether OpenMP will work in a package... no
*****************************************************************************************
         OpenMP is unavailable on this Mac OSX system. Training speed may be suboptimal.
         To use all CPU cores for training jobs, you should install OpenMP by running
             brew install libomp
*****************************************************************************************

However, -fopenmp also appears among the flags during installation. After the installation, the following experiment indicates that OpenMP is in use:

r$> require(xgboost)
    x <- matrix(rnorm(100 * 10000), 10000, 100)
    y <- x %*% rnorm(100) + rnorm(1000)

    system.time({
      bst <- xgboost(data = x, label = y, nthread = 1, nround = 100, verbose = F)
    })
Loading required package: xgboost
   user  system elapsed
 19.429   0.130  17.317

r$> system.time({
      bst <- xgboost(data = x, label = y, nthread = 4, nround = 100, verbose = F)
    })
   user  system elapsed
 17.949   0.063   4.538

r$> system.time({
      bst <- xgboost(data = x, label = y, nthread = 8, nround = 100, verbose = F)
    })
   user  system elapsed
 27.401   0.094   3.457
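Another way to read this output is to compare CPU time with wall-clock time: when (user + system) is several times larger than elapsed, several cores were busy at once. The Python sketch below parses the printed timings and computes that ratio. The function names `parse_system_time` and `busy_cores` are hypothetical, and the parser assumes each result is printed as a header line followed by a line with three numbers, as in the session above.

```python
def parse_system_time(output):
    """Extract (user, system, elapsed) tuples from printed R system.time() results.

    Each result is assumed to be a header line 'user system elapsed'
    followed by a line holding the three numbers.
    """
    lines = output.splitlines()
    results = []
    for header, values in zip(lines, lines[1:]):
        if header.split() == ["user", "system", "elapsed"]:
            user, system, elapsed = (float(v) for v in values.split())
            results.append((user, system, elapsed))
    return results


def busy_cores(user, system, elapsed):
    """CPU seconds consumed per wall-clock second, i.e. average cores kept busy."""
    return (user + system) / elapsed


# Timings copied from the session above, for nthread = 1, 4 and 8.
session = """\
   user  system elapsed
 19.429   0.130  17.317
   user  system elapsed
 17.949   0.063   4.538
   user  system elapsed
 27.401   0.094   3.457
"""

for nthread, timing in zip((1, 4, 8), parse_system_time(session)):
    print(f"nthread={nthread}: about {busy_cores(*timing):.1f} cores busy")
```

For these three runs the ratio comes out close to 1, 4 and 8 respectively, so the number of busy cores tracks the requested nthread, which is what one would expect when OpenMP is working.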

Other Issues

What about XGBoost Python Package in Conda environment under MacOS

Interestingly, the XGBoost Python package in a conda environment on macOS seems to be compiled correctly, with multi-threading available. The XGBoost Python package can be installed via

conda install -c conda-forge xgboost

Here is the test for the XGBoost Python package. Within a conda environment for Python, numpy and xgboost are installed:

conda install -c conda-forge numpy libblas=3.9.0=9_mkl
conda install -c conda-forge xgboost

Then in ipython:

In [1]: import numpy as np
   ...: import xgboost as xgb
   ...: import timeit
   ...:
   ...: data = np.random.rand(10000, 100)
   ...: label = np.random.randint(2, size=10000)
   ...: dtrain = xgb.DMatrix(data, label=label)
   ...:
   ...: param_1 = {'objective': 'binary:logistic', 'nthread': 1, 'eval_metric': 'auc'}
   ...:
   ...: param_4 = {'objective': 'binary:logistic', 'nthread': 4, 'eval_metric': 'auc'}
   ...:
   ...: param_8 = {'objective': 'binary:logistic', 'nthread': 8, 'eval_metric': 'auc'}
   ...:
   ...: num_round = 100

In [2]: start = timeit.default_timer()
   ...:
   ...: xgb.train(param_1, dtrain, num_round)
   ...:
   ...: stop = timeit.default_timer()
   ...:
   ...: print('Time: ', stop - start)
Time:  16.160123399

In [3]: start = timeit.default_timer()
   ...:
   ...: xgb.train(param_4, dtrain, num_round)
   ...:
   ...: stop = timeit.default_timer()
   ...:
   ...: print('Time: ', stop - start)
Time:  4.242956155000002

In [4]: start = timeit.default_timer()
   ...:
   ...: xgb.train(param_8, dtrain, num_round)
   ...:
   ...: stop = timeit.default_timer()
   ...:
   ...: print('Time: ', stop - start)
Time:  3.200284463999999
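The three timed cells above repeat the same pattern, so they could be folded into a small helper. This is only a sketch: `time_by_threads` and `report` are names I made up, and the workload in the demo is a stand-in so that the code runs even without xgboost installed.

```python
import timeit


def time_by_threads(train, thread_counts):
    """Time train(nthread) once per thread count; return {nthread: seconds}."""
    timings = {}
    for nthread in thread_counts:
        start = timeit.default_timer()
        train(nthread)
        timings[nthread] = timeit.default_timer() - start
    return timings


def report(timings):
    """Format elapsed time and speedup relative to the smallest thread count."""
    baseline = timings[min(timings)]
    return [
        f"nthread={n}: {t:.3f}s ({baseline / t:.2f}x)"
        for n, t in sorted(timings.items())
    ]


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without xgboost installed.
    # With xgboost, `train` could instead be, for the objects defined above:
    #   lambda n: xgb.train({'objective': 'binary:logistic', 'nthread': n,
    #                        'eval_metric': 'auc'}, dtrain, num_round)
    def train(nthread):
        sum(i * i for i in range(200_000))

    print("\n".join(report(time_by_threads(train, [1, 4, 8]))))
```

Passing the thread count as the only argument keeps the helper independent of any particular library, so the same function could time other multi-threaded code as well. The stand-in workload ignores nthread, so its printed speedups stay near 1x; only a real multi-threaded `train` will show scaling.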

What about XGBoost Python/R Package in Linux System

Luckily, the XGBoost Python/R package installed within a conda environment on Linux is compiled properly and multi-threading works fine, based on my tests on Garuda Linux. For reference, here is the basic information for the R conda environment on macOS:

r$> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Big Sur 11.3.1

Matrix products: default
BLAS/LAPACK: /Users/mm22204/opt/miniconda3/envs/R_4.0_mkl/lib/libmkl_rt.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] xgboost_1.4.1.1

loaded via a namespace (and not attached):
[1] compiler_4.1.0    magrittr_2.0.1    Matrix_1.3-4      grid_4.1.0
[5] data.table_1.14.0 jsonlite_1.7.2    lattice_0.20-44

Summary

In this post I shared a compilation issue that affects the availability of multi-threading for XGBoost on macOS. It is a subtle problem ONLY for the XGBoost R package on macOS; it is NOT a problem for the XGBoost Python package or on Linux. In recent years I have mainly used macOS as a balanced choice that offers the merits of a Unix-like system along with access to software for work and daily life, such as Microsoft Office. However, compilation problems do sometimes seem to be specific to macOS, and revising the corresponding make files becomes inevitable. Nowadays it is common to use separate environments for distinct projects to ensure independence and reproducibility, and conda is the usual choice for data science since it supports both Python and R. Still, we need to check whether packages are compiled properly to make full use of the available computing resources. I hope this serves as a useful example for your reference.

References

Chen, T., & Guestrin, C. (2016, August). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794).