Matrix Algorithm

Operating system: Android
Cloud service/platform: not specified
Skill level: Intermediate
Focus areas: Low Power, Gaming, Embedded

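Objective

When running certain algorithms, we need to measure the performance of the hardware processors. We may be able to run an algorithm on the processor that suits it best, or split the algorithm and run the parts on different processors, which may give better performance. This project aims to provide a solution for optimizing application and device performance.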
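As a rough illustration of splitting work across processors, the sketch below divides an index range into contiguous slices, one per processor. The helper name `split_range` is hypothetical and is not part of the HetCompute SDK or of this project's source; it only shows one simple way the partitioning could be done.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not from the project source): divides the index range
// [0, total) into `parts` contiguous half-open [begin, end) slices whose
// sizes differ by at most one element.
std::vector<std::pair<std::size_t, std::size_t>> split_range(std::size_t total,
                                                             std::size_t parts)
{
    std::vector<std::pair<std::size_t, std::size_t>> slices;
    if (parts == 0) {
        return slices; // nothing to split across
    }
    const std::size_t base = total / parts;  // minimum slice length
    const std::size_t extra = total % parts; // first `extra` slices get one more
    std::size_t begin = 0;
    for (std::size_t p = 0; p < parts; ++p) {
        const std::size_t len = base + (p < extra ? 1 : 0);
        slices.emplace_back(begin, begin + len);
        begin += len;
    }
    return slices;
}
```

For a 1000-element array and three processors, `split_range(1000, 3)` yields the slices [0, 334), [334, 667) and [667, 1000).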

Required materials / parts list / tools

  • Snapdragon Heterogeneous Compute SDK v1.0.0 - Linux

  • android-ndk-r14b-linux-x86_64

Source code / examples / executable application

  • Source Code

Additional resources

  • Video Links (network drive password: bzsf)

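Build / assembly instructions

The following parts are used in this project.

1. An Android device based on the SDM845 platform, with the HetCompute SDK, the Hexagon SDK, and the application set up.

2. Ubuntu 18.04 LTS

3. USB Type-C data cable.

4. All development work is based on this HetComputeSDK_demo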

 

Deploying the project

1. Download the Snapdragon Heterogeneous Compute SDK from https://developer.qualcomm.com/download/snapdragon-heterogeneous-compute-sdk-1.0.0.deb?referrer=node/35864 and install it on the PC.

2. Download android-ndk-r14b-linux-x86_64 from https://dl.google.com/android/repository/android-ndk-r14b-linux-x86_64.zip and install it on the PC.

3. Download the Hexagon DSP SDK from https://developer.qualcomm.com/download/hexagon/hexagon-sdk-v3-3-3-linux.zip?referrer=node/6116 and install it on the PC.

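4. Build the project, then copy the application and libraries to the device. See the README.md file for build instructions.

5. Push the application to /data/local/tmp and run it:

    ./hetcompute_sample_buffe_kernel_matrix_CPU_GPU_DSP 1000 100 1

    ./hetcompute_sample_buffe_kernel_matrix_CPU_GPU_DSP 1000 100 2

The first argument (1000) creates an array of 1000 elements, the second (100) is the number of calculation loops, and the third (1 or 2) selects the work mode.

6. If everything works, you can upload your code to GitHub.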
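The sample in step 5 takes three positional arguments: array size, loop count, and work mode. The project's own argument handling is not shown here, so the following is only a minimal sketch of how such arguments might be read and validated. The `RunOptions` struct and `parse_options` function are hypothetical names, and the sketch does not assume what work modes 1 and 2 actually do.

```cpp
#include <cstdlib>

// Hypothetical container for the three positional arguments of the sample.
struct RunOptions {
    int array_size; // number of elements in the test array
    int loop_count; // number of calculation loops
    int work_mode;  // 1 or 2, as listed in the run instructions
    bool valid;     // true only if all three arguments were acceptable
};

// Hypothetical parser: expects argv = {program, array_size, loop_count, work_mode}.
RunOptions parse_options(int argc, const char* const argv[])
{
    RunOptions opts{0, 0, 0, false};
    if (argc != 4) {
        return opts; // wrong number of arguments
    }
    opts.array_size = std::atoi(argv[1]);
    opts.loop_count = std::atoi(argv[2]);
    opts.work_mode = std::atoi(argv[3]);
    opts.valid = opts.array_size > 0 && opts.loop_count > 0 &&
                 (opts.work_mode == 1 || opts.work_mode == 2);
    return opts;
}
```

A `main` function could call `parse_options(argc, argv)` and print a usage message when `valid` is false.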

 

Workflow

I. What should we do first?

First, we need to create a kernel pipe for each processor so that we can control the hardware.

/1.0.0/samples/buffe_kernel_matrix_CPU_GPU_DSP.cc

 

Create CPU kernel pipe

void run_CPU(hetcompute::buffer_ptr<const int> matrixA,
             hetcompute::buffer_ptr<int> matrixB)
{
    unsigned long begin_process_time = 0;
    unsigned long end_process_time = 0;
    const size_t arrayLen = matrixA.size();

    // new int[n] already allocates n ints, so no sizeof(int) factor is needed.
    // Results go to a temporary host array; matrixB is not written here.
    int* input_data = new int[arrayLen];
    int* output_data = new int[arrayLen];

    // Copy the input buffer into host memory under read-only access
    matrixA.acquire_ro();
    for (size_t index = 0; index < arrayLen; index++) {
        input_data[index] = matrixA[index];
    }
    matrixA.release();

    // Group used to wait for all CPU tasks together
    auto cg = hetcompute::create_group("Calculate array value");

    for (size_t x = 0; x < loop_number; x++) {
        // Launch one CPU task per loop iteration through the HetCompute SDK.
        // The pointers are captured by value so each task holds its own copy.
        cg->launch([arrayLen, input_data, output_data] {
            for (size_t y = 0; y < arrayLen; y++) {
                // matrix addition
                output_data[y] = input_data[y] + input_data[y];
                // matrix multiplication (overwrites the addition result)
                output_data[y] = input_data[y] * input_data[y];
            }
        });
    }

    // Timing starts after the tasks are launched, so it covers only the wait
    begin_process_time = getCurrentTimeMsec();
    cg->wait_for();
    end_process_time = getCurrentTimeMsec();
    process_calc_time_cpu += (end_process_time - begin_process_time);

    delete [] input_data;
    delete [] output_data;
}
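The timing code relies on a helper called getCurrentTimeMsec(), whose definition is not included in this excerpt. A minimal sketch of such a helper is given below, assuming it only needs to return a monotonically increasing millisecond count. The use of `std::chrono::steady_clock` is an assumption; the project's real implementation may differ.

```cpp
#include <chrono>

// Assumed implementation (the original is not shown): returns the current
// value of a monotonic clock in milliseconds. Only differences between two
// calls are meaningful, which is how the run_* functions use it.
unsigned long getCurrentTimeMsec()
{
    using namespace std::chrono;
    const auto now = steady_clock::now().time_since_epoch();
    return static_cast<unsigned long>(duration_cast<milliseconds>(now).count());
}
```

A monotonic clock is chosen here because wall-clock time can jump when the system time is adjusted, which would distort the measured intervals.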

Create GPU kernel pipe

void run_GPU(hetcompute::buffer_ptr<const int> matrixA,
             hetcompute::buffer_ptr<int> matrixB)
{
    unsigned long begin_process_time = 0;
    unsigned long end_process_time = 0;

    // One entry per work item, each holding the length of the input array
    auto arrayLen = hetcompute::create_buffer<int>(loop_number);
    arrayLen.acquire_wi();
    for (size_t x = 0; x < loop_number; x++) {
        arrayLen[x] = static_cast<int>(matrixA.size());
    }
    arrayLen.release();

    // Create the GPU kernel from the kernel source string and its entry-point name
    auto gk = hetcompute::create_gpu_kernel<hetcompute::buffer_ptr<const int>,
                                            hetcompute::buffer_ptr<int>,
                                            hetcompute::buffer_ptr<const int>>
                                            (matrix_kernel_string, "user_function_gpu");

    // Launch the GPU kernel over a 1D range of loop_number work items
    auto t = hetcompute::launch(gk, hetcompute::range<1>(loop_number), matrixA, matrixB, arrayLen);

    // Timing starts after the launch, so it covers only the wait
    begin_process_time = getCurrentTimeMsec();
    t->wait_for();
    end_process_time = getCurrentTimeMsec();

    process_calc_time_gpu += (end_process_time - begin_process_time);
}
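run_GPU refers to matrix_kernel_string and the entry point "user_function_gpu", but the kernel source itself does not appear in this excerpt. The sketch below shows what such a string might look like, written as OpenCL C inside a C++ raw string. The kernel body is an assumption modelled on the CPU loop above, with the three parameters following the template arguments passed to create_gpu_kernel; it is not the project's actual kernel.

```cpp
#include <string>

// Hypothetical kernel source (the project's real kernel is not shown).
// Parameter order follows the create_gpu_kernel template arguments:
// read-only input, writable output, read-only per-work-item lengths.
const std::string matrix_kernel_string = R"CLC(
__kernel void user_function_gpu(__global const int* matrixA,
                                __global int* matrixB,
                                __global const int* arrayLen)
{
    // One work item per loop iteration, matching range<1>(loop_number)
    size_t x = get_global_id(0);
    int len = arrayLen[x];
    for (int y = 0; y < len; y++) {
        matrixB[y] = matrixA[y] + matrixA[y]; // matrix addition
        matrixB[y] = matrixA[y] * matrixA[y]; // matrix multiplication
    }
}
)CLC";
```

In this sketch every work item writes the same values to matrixB, which mirrors the repeated CPU tasks. A kernel written for throughput would more likely give each work item its own slice of the array.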

 

Create DSP kernel pipe

void run_DSP(hetcompute::buffer_ptr<const int> matrixA,
             hetcompute::buffer_ptr<int> matrixB)
{
    unsigned long begin_process_time = 0;
    unsigned long end_process_time = 0;

    // Group used to wait for all DSP tasks together
    auto dg = hetcompute::create_group();
    // Wrap the Hexagon DSP function as a HetCompute DSP kernel
    auto dk = hetcompute::create_dsp_kernel<>(hetcompute_dsp_matrix_buffer);

    for (size_t x = 0; x < loop_number; x++) {
        // Launch one DSP task per loop iteration
        dg->launch(dk, matrixA, matrixB);
    }

    // Timing starts after the tasks are launched, so it covers only the wait
    begin_process_time = getCurrentTimeMsec();
    dg->wait_for();
    end_process_time = getCurrentTimeMsec();

    process_calc_time_dsp += (end_process_time - begin_process_time);
}
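The DSP function hetcompute_dsp_matrix_buffer is built with the Hexagon SDK and is not listed here. To check its output on the host, a plain C++ reference of the same calculation can help. The function below is a hypothetical sketch: its name, its pointer-plus-length signature, and its return convention (0 on success) are assumptions, not the project's actual DSP interface.

```cpp
// Hypothetical host-side reference (not the real DSP function): applies the
// same addition and multiplication as the CPU loop to each input element.
// Returns 0 on success and -1 if the arguments are unusable.
int dsp_matrix_buffer_reference(const int* input, int inputLen,
                                int* output, int outputLen)
{
    if (input == nullptr || output == nullptr ||
        inputLen < 0 || outputLen < inputLen) {
        return -1;
    }
    for (int i = 0; i < inputLen; ++i) {
        output[i] = input[i] + input[i]; // matrix addition
        output[i] = input[i] * input[i]; // matrix multiplication (final value)
    }
    return 0;
}
```

Comparing this reference output with the contents of matrixB after run_DSP completes is one way to confirm that the DSP path computes the same values as the CPU path.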

Contributors

Name (email), Company

Shen Tao (shentao1012@thundersoft.com), Thundersoft

Yang Rong (yangrong0925@thundersoft.com), Thundersoft

Kou Zhiwu (kouzw0723@thundersoft.com), Thundersoft
