Benchmarking Eigen, OpenBLAS, MKL, and NumPy

This post compares the runtime of Eigen on its own against Eigen linked with OpenBLAS or MKL, with Python's NumPy as an additional reference point. Matrix inversion is the test case.

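Main conclusions:

  • Without an external high-performance linear algebra library, Eigen on its own runs slowly in this test (as built with the compile commands below, which pass no optimization flags), and using more cores gives little speedup.
  • Linking OpenBLAS or MKL makes Eigen significantly faster.
  • MKL is faster than OpenBLAS.
  • Even with OpenBLAS or MKL, Eigen's matrix inversion is still slower than Python's np.linalg.inv(). The likely reason is that NumPy here also calls MKL and does more optimization at the interface level.

Speed ranking at the larger matrix sizes, slowest to fastest: Eigen < Eigen (OpenBLAS) < Eigen (MKL) < NumPy (MKL).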
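To put rough numbers on this ranking, the short sketch below takes the single-core 5000x5000 timings reported in the results later in this post and computes each configuration's speedup over plain Eigen. The dictionary name and labels are illustrative and not part of the original test scripts.

```python
# Single-core average time per 5000x5000 inversion, in seconds,
# copied from the results reported in this post.
single_core_5000 = {
    "Eigen": 2298.678,
    "Eigen (OpenBLAS)": 15.269,
    "Eigen (MKL)": 7.359,
    "NumPy (MKL)": 5.558,
}

baseline = single_core_5000["Eigen"]

# Speedup of each configuration relative to plain Eigen.
speedups = {name: baseline / t for name, t in single_core_5000.items()}

# Sort from slowest to fastest to reproduce the ranking.
ranking = sorted(single_core_5000, key=single_core_5000.get, reverse=True)

for name in ranking:
    print(f"{name:18s} {single_core_5000[name]:10.3f} s  {speedups[name]:7.1f}x")
```

At this size the speedups over plain Eigen come out at roughly 150x for OpenBLAS, 310x for MKL, and 410x for NumPy.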

C++ test code, a.cpp:

#define EIGEN_USE_BLAS  // comment out or uncomment to switch between the tests
// #define EIGEN_USE_MKL_ALL  // when using MKL, prefer EIGEN_USE_MKL_ALL
#include <iostream>
#include <chrono>
#include <vector>
#include <iomanip> 
#include <Eigen/Dense>

int main() {
    std::vector<int> sizes = {100, 200, 300, 500, 1000, 2000, 3000, 5000};  // matrix sizes to test
    const int trials = 3;  // number of inversions timed per size

    for (int size : sizes) {
        std::cout << "Testing size: " << size << "x" << size << std::endl;
        
        Eigen::MatrixXd A = Eigen::MatrixXd::Random(size, size);
        A = A.transpose() * A + Eigen::MatrixXd::Identity(size, size);  // make sure the matrix is invertible

        auto start = std::chrono::high_resolution_clock::now();
        
        for (int i = 0; i < trials; ++i) {
            Eigen::MatrixXd A_inv = A.inverse();
        }
        
        auto end = std::chrono::high_resolution_clock::now();
        auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
        
        std::cout << "Average time per inversion: " 
                  << std::fixed << std::setprecision(3) 
                  << (static_cast<double>(duration.count()) / 1000 / trials) 
                  << " s" << std::endl;
        std::cout << "----------------------------------" << std::endl;
    }

    return 0;
}

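Compile command without an external high-performance linear algebra library (with the #define EIGEN_USE_BLAS line commented out):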

g++ -std=c++14 a.cpp

Compile command linking OpenBLAS:

g++ -std=c++14 a.cpp -lopenblas

Compile command linking MKL:

g++ -std=c++14 a.cpp -lmkl_rt

Python test of matrix inversion with np.linalg.inv(), a.py, used for comparison:

import numpy as np
import time

sizes = [100, 200, 300, 500, 1000, 2000, 3000, 5000]  # matrix sizes to test
trials = 3  # number of inversions timed per size

for size in sizes:
    print(f"Testing size: {size}x{size}")
    
    A = np.random.rand(size, size)
    A = A.T @ A + np.eye(size)  # make sure the matrix is invertible
    
    start = time.perf_counter()  # monotonic, high-resolution timer
    
    for _ in range(trials):
        A_inv = np.linalg.inv(A)
    
    end = time.perf_counter()
    duration = end - start
    
    print(f"Average time per inversion: {duration/trials:.3f} s")
    print("----------------------------------")
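In some of the results that follow, the first size (100x100) is slower than the next one, for example 0.077 s versus 0.005 s for Eigen with MKL on a single core. One possible explanation, which is an assumption here and not something this post verifies, is a one-off startup cost in the first library call being averaged into only three trials. The sketch below shows a timing helper with untimed warm-up calls; `time_call` is a hypothetical helper name, and the workload is a stand-in so that the block runs with the standard library alone.

```python
import time
from statistics import mean


def time_call(fn, trials=3, warmup=1):
    """Time fn() over several trials after a few untimed warm-up calls."""
    for _ in range(warmup):
        fn()  # untimed: absorbs one-off costs such as library initialization

    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)

    return {"best": min(samples), "mean": mean(samples), "samples": samples}


# Stand-in workload; in the benchmark above this would be something like
# lambda: np.linalg.inv(A)
result = time_call(lambda: sum(i * i for i in range(100_000)))
print(f"best {result['best']:.6f} s, mean {result['mean']:.6f} s")
```

Reporting the best time next to the mean makes it easier to spot a single slow outlier among the trials.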

The single-core results follow.
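The post does not say how the number of cores was restricted for each run. One common approach, shown here only as an assumption, is to set the thread-count environment variables that OpenMP, OpenBLAS, and MKL read at startup before launching the benchmark. The helper names `thread_limited_env` and `run_with_threads` are hypothetical.

```python
import os
import subprocess

# Environment variables commonly read by OpenMP, OpenBLAS, and MKL
# to decide how many threads to use.
THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def thread_limited_env(num_threads, base_env=None):
    """Return a copy of the environment with the thread-count variables set."""
    env = dict(os.environ if base_env is None else base_env)
    for var in THREAD_VARS:
        env[var] = str(num_threads)
    return env


def run_with_threads(command, num_threads):
    """Run a benchmark command (a list of arguments) with a fixed thread count."""
    return subprocess.run(command, env=thread_limited_env(num_threads), check=True)


# Show the variables that a single-threaded run would receive.
single = thread_limited_env(1, base_env={})
for var in THREAD_VARS:
    print(f"{var}={single[var]}")
```

With this sketch, `run_with_threads(["./a.out"], 1)` would launch the compiled C++ test with one thread, and `run_with_threads(["python", "a.py"], 8)` would launch the Python test with eight.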

Eigen without an external high-performance linear algebra library (single core):

Testing size: 100x100
Average time per inversion: 0.022 s
----------------------------------
Testing size: 200x200
Average time per inversion: 0.152 s
----------------------------------
Testing size: 300x300
Average time per inversion: 0.483 s
----------------------------------
Testing size: 500x500
Average time per inversion: 2.456 s
----------------------------------
Testing size: 1000x1000
Average time per inversion: 18.990 s
----------------------------------
Testing size: 2000x2000
Average time per inversion: 149.247 s
----------------------------------
Testing size: 3000x3000
Average time per inversion: 499.806 s
----------------------------------
Testing size: 5000x5000
Average time per inversion: 2298.678 s
----------------------------------

Eigen linked with OpenBLAS (single core):

Testing size: 100x100
Average time per inversion: 0.003 s
----------------------------------
Testing size: 200x200
Average time per inversion: 0.016 s
----------------------------------
Testing size: 300x300
Average time per inversion: 0.023 s
----------------------------------
Testing size: 500x500
Average time per inversion: 0.068 s
----------------------------------
Testing size: 1000x1000
Average time per inversion: 0.297 s
----------------------------------
Testing size: 2000x2000
Average time per inversion: 1.673 s
----------------------------------
Testing size: 3000x3000
Average time per inversion: 4.302 s
----------------------------------
Testing size: 5000x5000
Average time per inversion: 15.269 s
----------------------------------

Eigen linked with MKL (single core):

Testing size: 100x100
Average time per inversion: 0.077 s
----------------------------------
Testing size: 200x200
Average time per inversion: 0.005 s
----------------------------------
Testing size: 300x300
Average time per inversion: 0.009 s
----------------------------------
Testing size: 500x500
Average time per inversion: 0.027 s
----------------------------------
Testing size: 1000x1000
Average time per inversion: 0.140 s
----------------------------------
Testing size: 2000x2000
Average time per inversion: 0.820 s
----------------------------------
Testing size: 3000x3000
Average time per inversion: 2.069 s
----------------------------------
Testing size: 5000x5000
Average time per inversion: 7.359 s
----------------------------------

np.linalg.inv() in Python (single core):

Testing size: 100x100
Average time per inversion: 0.002 s
----------------------------------
Testing size: 200x200
Average time per inversion: 0.002 s
----------------------------------
Testing size: 300x300
Average time per inversion: 0.003 s
----------------------------------
Testing size: 500x500
Average time per inversion: 0.010 s
----------------------------------
Testing size: 1000x1000
Average time per inversion: 0.064 s
----------------------------------
Testing size: 2000x2000
Average time per inversion: 0.491 s
----------------------------------
Testing size: 3000x3000
Average time per inversion: 1.415 s
----------------------------------
Testing size: 5000x5000
Average time per inversion: 5.558 s
----------------------------------
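Dense matrix inversion is expected to cost on the order of n^3 operations, so doubling n should multiply the time by about eight. As a rough consistency check, the sketch below estimates the exponent p in t ~ n^p from the single-core timings above at n = 1000 and n = 5000. Fitting from just two points is a simplification chosen for illustration.

```python
import math

# Single-core average times in seconds at n = 1000 and n = 5000,
# copied from the results above.
times = {
    "Eigen": (18.990, 2298.678),
    "Eigen (OpenBLAS)": (0.297, 15.269),
    "Eigen (MKL)": (0.140, 7.359),
    "NumPy (MKL)": (0.064, 5.558),
}

n_small, n_large = 1000, 5000


def scaling_exponent(t_small, t_large):
    """Estimate p in t ~ n**p from two (n, t) measurements."""
    return math.log(t_large / t_small) / math.log(n_large / n_small)


exponents = {name: scaling_exponent(*t) for name, t in times.items()}

for name, p in exponents.items():
    print(f"{name:18s} p = {p:.2f}")
```

Plain Eigen comes out very close to 3, which matches cubic scaling. The library-backed runs come out somewhat below 3, which may reflect fixed overheads weighing more heavily on the smaller size.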

The eight-core results follow.

Eigen without an external high-performance linear algebra library (eight cores):

Testing size: 100x100
Average time per inversion: 0.027 s
----------------------------------
Testing size: 200x200
Average time per inversion: 0.175 s
----------------------------------
Testing size: 300x300
Average time per inversion: 0.565 s
----------------------------------
Testing size: 500x500
Average time per inversion: 2.224 s
----------------------------------
Testing size: 1000x1000
Average time per inversion: 17.355 s
----------------------------------
Testing size: 2000x2000
Average time per inversion: 135.106 s
----------------------------------
Testing size: 3000x3000
Average time per inversion: 454.176 s
----------------------------------
Testing size: 5000x5000
Average time per inversion: 2093.090 s
----------------------------------

Eigen linked with OpenBLAS (eight cores):

Testing size: 100x100
Average time per inversion: 0.003 s
----------------------------------
Testing size: 200x200
Average time per inversion: 0.013 s
----------------------------------
Testing size: 300x300
Average time per inversion: 0.022 s
----------------------------------
Testing size: 500x500
Average time per inversion: 0.060 s
----------------------------------
Testing size: 1000x1000
Average time per inversion: 0.229 s
----------------------------------
Testing size: 2000x2000
Average time per inversion: 1.135 s
----------------------------------
Testing size: 3000x3000
Average time per inversion: 2.610 s
----------------------------------
Testing size: 5000x5000
Average time per inversion: 7.531 s
----------------------------------

Eigen linked with MKL (eight cores):

Testing size: 100x100
Average time per inversion: 0.062 s
----------------------------------
Testing size: 200x200
Average time per inversion: 0.020 s
----------------------------------
Testing size: 300x300
Average time per inversion: 0.010 s
----------------------------------
Testing size: 500x500
Average time per inversion: 0.038 s
----------------------------------
Testing size: 1000x1000
Average time per inversion: 0.132 s
----------------------------------
Testing size: 2000x2000
Average time per inversion: 0.529 s
----------------------------------
Testing size: 3000x3000
Average time per inversion: 1.215 s
----------------------------------
Testing size: 5000x5000
Average time per inversion: 3.517 s
----------------------------------

np.linalg.inv() in Python (eight cores):

Testing size: 100x100
Average time per inversion: 0.025 s
----------------------------------
Testing size: 200x200
Average time per inversion: 0.021 s
----------------------------------
Testing size: 300x300
Average time per inversion: 0.003 s
----------------------------------
Testing size: 500x500
Average time per inversion: 0.006 s
----------------------------------
Testing size: 1000x1000
Average time per inversion: 0.021 s
----------------------------------
Testing size: 2000x2000
Average time per inversion: 0.129 s
----------------------------------
Testing size: 3000x3000
Average time per inversion: 0.298 s
----------------------------------
Testing size: 5000x5000
Average time per inversion: 1.008 s
----------------------------------
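To quantify how much each configuration gains from the extra cores, the sketch below divides the single-core time by the eight-core time at 5000x5000, using the numbers reported above. The parallel efficiency figure (speedup divided by core count) is an added illustration and not something the original scripts compute.

```python
# Average time per 5000x5000 inversion in seconds: (single core, eight cores),
# copied from the results above.
times_5000 = {
    "Eigen": (2298.678, 2093.090),
    "Eigen (OpenBLAS)": (15.269, 7.531),
    "Eigen (MKL)": (7.359, 3.517),
    "NumPy (MKL)": (5.558, 1.008),
}

cores = 8

# Speedup from one core to eight cores for each configuration.
multicore_speedup = {name: t1 / t8 for name, (t1, t8) in times_5000.items()}

for name, speedup in multicore_speedup.items():
    efficiency = speedup / cores  # 1.0 would be perfect scaling
    print(f"{name:18s} speedup {speedup:5.2f}x  efficiency {efficiency:6.1%}")
```

Plain Eigen gains only about 1.1x from eight cores, Eigen with OpenBLAS or MKL gains about 2x, and NumPy gains about 5.5x.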