PAC 2022
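The Generalized Plasmon-Pole (GPP) model is a general-purpose model for GW estimates in first-principles calculations, with applications in quantum computing, energy storage and conversion, photovoltaics, nanoelectronics, and related fields. Its core algorithm, shown in the figure, computes a correlation between complex matrices and vectors; each element is a tensor-like accumulation. The algorithm is implemented as a four-level nested loop, so it is computationally expensive and well suited to GPUs.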
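The four-level loop structure described above can be sketched in plain Python. This is only an illustration of the loop nest and the complex accumulation, not the actual PAC 2022 kernel: the function name `gpp_like_accumulate`, the array names, their shapes, and the per-element formula are all assumptions made for the example.

```python
# Illustrative four-level complex accumulation.
# NOT the actual contest kernel: every name, shape, and the formula
# inside the innermost loop are hypothetical.

def gpp_like_accumulate(a_left, a_right, weight, freqs):
    """Return one complex sum per frequency.

    a_left[n][igp]  : complex matrix elements (hypothetical)
    a_right[n][ig]  : complex matrix elements (hypothetical)
    weight[ig][igp] : complex coupling term (hypothetical)
    freqs[iw]       : real evaluation frequencies (hypothetical)
    """
    n_bands = len(a_left)
    n_igp = len(a_left[0])
    n_ig = len(a_right[0])
    acc = [0j] * len(freqs)

    for n in range(n_bands):                    # loop 1: bands
        for igp in range(n_igp):                # loop 2: first index
            left = a_left[n][igp].conjugate()
            for ig in range(n_ig):              # loop 3: second index
                pair = left * a_right[n][ig] * weight[ig][igp]
                for iw, w in enumerate(freqs):  # loop 4: frequencies
                    acc[iw] += pair * w         # complex accumulation
    return acc


if __name__ == "__main__":
    # Tiny example: one band, one (igp, ig) pair, two frequencies.
    print(gpp_like_accumulate([[1 + 1j]], [[2 + 0j]], [[1j]], [1.0, 2.0]))
```

In a nest of this shape the three outer loops are independent apart from the final sum, so the work amounts to a large reduction, which is the kind of pattern that typically maps well onto a GPU.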
source /opt/intel/oneapi/setvars.sh
-------->
:: initializing oneAPI environment ...
-bash: BASH_VERSION = 5.0.17(1)-release
args: Using "$@" for setvars.sh arguments:
:: advisor -- latest
:: ccl -- latest
:: clck -- latest
:: compiler -- latest
:: dal -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: dnnl -- latest
:: dpcpp-ct -- latest
:: dpl -- latest
:: inspector -- latest
:: intelpython -- latest
:: ipp -- latest
:: ippcp -- latest
:: ipp -- latest
:: itac -- latest
:: mkl -- latest
:: mpi -- latest
:: tbb -- latest
:: vpl -- latest
:: vtune -- latest
:: oneAPI environment initialized ::
clinfo -l
-------->
Platform #0: Intel(R) FPGA Emulation Platform for OpenCL(TM)
`-- Device #0: Intel(R) FPGA Emulation Device
Platform #1: Intel(R) OpenCL
`-- Device #0: Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz
Platform #2: Intel(R) OpenCL HD Graphics
+-- Device #0: Intel(R) Graphics [0x020a]
`-- Device #1: Intel(R) Graphics [0x020a]
or
sycl-ls
-------->
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2022.13.3.0.16_160000]
[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz 3.0 [2022.13.3.0.16_160000]
[opencl:gpu:2] Intel(R) OpenCL HD Graphics, Intel(R) Graphics [0x020a] 3.0 [22.18.023111]
[opencl:gpu:3] Intel(R) OpenCL HD Graphics, Intel(R) Graphics [0x020a] 3.0 [22.18.023111]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Graphics [0x020a] 1.3 [1.3.23111]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Graphics [0x020a] 1.3 [1.3.23111]
[host:host:0] SYCL host platform, SYCL host device 1.2 [1.2]
sudo /usr/bin/intel_gpu_top -d pci:card=1
sudo /usr/bin/intel_gpu_top -d pci:card=2
-------->
intel-gpu-top - 0/ 0 MHz; 0% RC6; 0 irqs/s
IMC reads: ------ (null)/s
IMC writes: ------ (null)/s
ENGINE BUSY MI_SEMA MI_WAIT
Blitter/0 0.00% | | 0% 0%
Blitter/1 0.00% | | 0% 0%
Video/0 0.00% | | 0% 0%
Video/1 0.00% | | 0% 0%
Video/2 0.00% | | 0% 0%
Video/3 0.00% | | 0% 0%
Video/4 0.00% | | 0% 0%
Video/5 0.00% | | 0% 0%
Video/6 0.00% | | 0% 0%
Video/7 0.00% | | 0% 0%
Video/8 0.00% | | 0% 0%
Video/9 0.00% | | 0% 0%
Video/10 0.00% | | 0% 0%
Video/11 0.00% | | 0% 0%
Video/12 0.00% | | 0% 0%
Video/13 0.00% | | 0% 0%
VideoEnhance/0 0.00% | | 0% 0%
VideoEnhance/1 0.00% | | 0% 0%
VideoEnhance/2 0.00% | | 0% 0%
VideoEnhance/3 0.00% | | 0% 0%
VideoEnhance/4 0.00% | | 0% 0%
VideoEnhance/5 0.00% | | 0% 0%
VideoEnhance/6 0.00% | | 0% 0%
VideoEnhance/7 0.00% | | 0% 0%
Compute/0 0.00% | | 0% 0%
Compute/1 0.00% | | 0% 0%
Compute/2 0.00% | | 0% 0%
Compute/3 0.00% | | 0% 0%
Compute/4 0.00% | | 0% 0%
Compute/5 0.00% | | 0% 0%
Compute/6 0.00% | | 0% 0%
Compute/7 0.00% | | 0% 0%
PID NAME
clinfo | grep -E 'units|frequency|memory size'
-------->
Max compute units 64
Max clock frequency 2900MHz
Global memory size 270139891712 (251.6GiB)
Local memory size 262144 (256KiB)
Max compute units 64
Max clock frequency 2900MHz
Global memory size 270139891712 (251.6GiB)
Local memory size 32768 (32KiB)
Max compute units 960
Max clock frequency 1400MHz
Global memory size 32482365440 (30.25GiB)
Local memory size 65536 (64KiB)
Max compute units 960
Max clock frequency 1400MHz
Global memory size 32482365440 (30.25GiB)
Local memory size 65536 (64KiB)