Future Topics in OpenMP


Last updated 2 years ago


Nested parallelism:

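  • Enabled with the OMP_NESTED environment variable or the omp_set_nested(1) routine (both deprecated since OpenMP 5.0 in favour of OMP_MAX_ACTIVE_LEVELS and omp_set_max_active_levels).

  • If a PARALLEL directive is encountered within another PARALLEL directive, a new team of threads will be created.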

  • The new team will contain only one thread unless nested parallelism is enabled.

NUM_THREADS clause:

  • One way to control the number of threads used at each level is with the num_threads(integer) clause:

    #pragma omp parallel for num_threads(4)
    for (int i = 0; i < N; i++) {
        #pragma omp parallel for num_threads(total_threads/4)
        for (int j = 0; j < M; j++) {
            // do computations
        }
    }
  • The value set in the clause supersedes the value of the OMP_NUM_THREADS environment variable and any value set by omp_set_num_threads(integer).

Orphaned directives:

  • Directives are active in the dynamic scope of a parallel region, not just its lexical scope. For example:

    void foo() {
        #pragma omp for  // orphaned: binds to the parallel region of the caller
        for (int i = 0; i < N; i++) {
            // do computations
        }
    }
    #pragma omp parallel
    {
        foo();
    }
  • Useful, as it allows a modular programming style.

  • But it can also be confusing if the call tree is complicated.

  • Extra rules about data scope attributes:

    • Variables in the argument list inherit their data scope attribute from the calling routine.

    • Global variables in C/C++ and COMMON blocks or module variables in Fortran are shared, unless declared THREADPRIVATE.

    • static local variables in C/C++ and SAVE variables in Fortran are shared.

    • All other local variables are private.

  • Binding rules:

    • DO/FOR, SECTIONS, SINGLE, MASTER and BARRIER directives always bind to the nearest enclosing PARALLEL directive.

  • Thread private global variables:

    • It can be convenient for each thread to have its own copy of variables with global scope.

    • Outside parallel regions and in MASTER directives, accesses to these variables refer to the master thread’s copy.

    • #pragma omp threadprivate (list)

    • This directive must appear at file or namespace scope (or, for a static block-scope variable, in the scope of that variable), after the declarations of all variables in list and before any reference to them.

    • Differences from PRIVATE (see the StackOverflow thread under Reference):

| | PRIVATE | THREADPRIVATE |
| --- | --- | --- |
| List variable | Any | Global variables declared but not yet referenced, or static variables |
| Local to | Region | Thread |
| Placed | Stack | Heap or TLS |
| Lifetime | The construct carrying the data-scoping clause | Persists across regions; released along with the thread (possibly only at the end of the program) |
| Storage association | Each thread has a private copy. | The master thread uses the original; every other thread has a private copy. |

  • Here is an example of THREADPRIVATE. It shows that each thread's value persists between iterations and that the master thread's copy is the original program variable:

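#include <stdio.h>
#include <omp.h>

static int k;
#pragma omp threadprivate(k)  // a directive after the declaration, not a clause

int main(void) {
    #pragma omp parallel for num_threads(4)
    for (int i = 0; i < 16; i++) {
        printf("This is thread %2d with k %2d entered i %2d\n", omp_get_thread_num(), k, i);
        k = i;  // update this thread's own copy of k
    }
    printf("Original k: %d\n", k);  // reads the master thread's copy
    return 0;
}
/* output (line order varies between runs; static schedule assumed):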
  This is thread  1 with k  0 entered i  4
  This is thread  1 with k  4 entered i  5
  This is thread  1 with k  5 entered i  6
  This is thread  1 with k  6 entered i  7
  This is thread  2 with k  0 entered i  8
  This is thread  2 with k  8 entered i  9
  This is thread  2 with k  9 entered i 10
  This is thread  2 with k 10 entered i 11
  This is thread  0 with k  0 entered i  0
  This is thread  0 with k  0 entered i  1
  This is thread  0 with k  1 entered i  2
  This is thread  0 with k  2 entered i  3
  This is thread  3 with k  0 entered i 12
  This is thread  3 with k 12 entered i 13
  This is thread  3 with k 13 entered i 14
  This is thread  3 with k 14 entered i 15
  Original k: 3   (exactly the value last held by thread 0)
*/

COPYIN clause:

  • Allows the values of the master thread’s THREADPRIVATE data to be copied to all other threads at the start of a parallel region.

Timing routines:

  • Return current wall clock time (relative to arbitrary origin) with: double omp_get_wtime(void)

  • Return clock precision with: double omp_get_wtick(void)

  • Timers are local to a thread, so both calls must be made on the same thread to measure a duration.

  • NO guarantee about resolution!

Reference

  • Difference between OpenMP threadprivate and private | StackOverflow