Future Topics in OpenMP

Nested parallelism:

  • enabled with OMP_NESTED environment variable or the omp_set_nested(1) routine.

  • If a PARALLEL directive is encountered within another PARALLEL directive, a new term of threads will be created.

  • The new team will contain only one thread unless nested parallelism is enabled.

NUM_THREADS clause:

  • One way to control the number of threads used at each level is with the num_threads(integer) clause:

    #pragma omp parallel for num_threads(4)
    for (int i = 0; i < N; i++) {
        #pragma omp parallel for num_threads(total_threads/4)
        for (int j = 0; j < M; j++) {
            // do computations
        }
    }
  • The value set in the clause supersedes the value in the environment variable OMP_NUM_THREADS (or that set by omp_set_num_threads(integer))

Orphaned directives:

  • Directives are active in the dynamic scope of a parallel region, not just its lexical scope. e.g.

    void foo() {
        // do computations
    }
    #pragma omp parallel
    {
        foo()
    }
  • Useful as it allows a modular programming style

  • But also can be confusing if the call tree is complicated.

  • Extra rules about data scope attributes:

    • Variables in the argument list inherit their data scope attribute from the calling routine.

    • Global variables in C++ and COMMON blocks or module variables in Fortran are shared, unless declared THREADPRIVATE.

    • static local variables in C/C++ and SAVE variables in Fortran are shared.

    • All other local variables are private.

  • Binding rules:

    • DO/FOR, SECTIONS, SINGLE, MASTER and BARRIER directives always bind to the nearest enclosing PARALLEL directive.

  • Thread private global variables:

    • It can be convenient for each thread to have its own copy of variables with global scope.

    • Outside parallel regions and in MASTER directives, accesses to these variables refer to the master thread’s copy.

    • #pragma omp threadprivate (list)

    • This directive must be at file or namespace scope, after all declarations of variables in list and before any references to variables in list or static variables.

    • Difference between PRIVATE: Reference to StackOverflow

  • Here is an example of THREADPRIVATE, which illustrates the lifetime of value and the storage association between master thread and program:

static int k;
#pragma omp parallel for num_threads(4) threadprivate(k)
for (int i = 0; i < 16; i++) {
    printf("This is thread %2d with k %2d entered i %2d\n", omp_get_thread_num(), k, i);
}
printf("Original k: %d\n", k);
/* output: 
  This is thread  1 with k  0 entered i  4
  This is thread  1 with k  4 entered i  5
  This is thread  1 with k  5 entered i  6
  This is thread  1 with k  6 entered i  7
  This is thread  2 with k  0 entered i  8
  This is thread  2 with k  8 entered i  9
  This is thread  2 with k  9 entered i 10
  This is thread  2 with k 10 entered i 11
  This is thread  0 with k  0 entered i  0
  This is thread  0 with k  0 entered i  1
  This is thread  0 with k  1 entered i  2
  This is thread  0 with k  2 entered i  3
  This is thread  3 with k  0 entered i 12
  This is thread  3 with k 12 entered i 13
  This is thread  3 with k 13 entered i 14
  This is thread  3 with k 14 entered i 15
  Original k: 3 # exact the same as thread 0
*/

COPYIN clause:

  • Allows the values of the master thread’s THREADPRIVATE data to be copied to all other threads at the start of a parallel region.

Timing routines:

  • Return current wall clock time (relative to arbitrary origin) with: double omp_get_wtime(void)

  • Return clock precision with: double omp_get_wtick(void)

  • Timers are local to a thread, so must make both calls on the same thread to get the duration.

  • NO guarantee about resolution!

Reference

Last updated