OpenMP Overview

OpenMP is not magic!

Shared Variables

Parallel Programming using Threads


Most commonly an application has a single thread per process, but a single process can contain multiple threads. Each thread is like a child process contained within the parent process.

  • Threads can see all data in the parent process (see the sketch after this list).

  • Threads can run on different cores.

  • Threads have potential for parallel speedup.
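As a minimal sketch of these points (OpenMP C; compile with `-fopenmp`), every thread in a parallel region reads the same variable from the parent process's memory:

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    int shared_counter = 42;   /* lives in the parent process's memory */

    /* Every thread in the team sees the same shared_counter,
       and the threads may run on different cores. */
    #pragma omp parallel
    printf("thread %d of %d sees shared_counter = %d\n",
           omp_get_thread_num(), omp_get_num_threads(), shared_counter);

    return 0;
}
```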

Analogy:

  • Huge whiteboard: shared memory

  • Different people: threads

  • Do not write in the same place: not interfering with each other

  • Have a place inaccessible to the others: private data

Each thread has its own PC (Program Counter, which holds the address of the next instruction to be fetched from memory) and private data, as well as data shared with all other threads.

Synchronisation:

  • Crucial for shared variables approach.

  • Most commonly a global barrier synchronisation is used (coarse-grained); locks (fine-grained) and even CAS (Compare-and-Swap, an atomic instruction guaranteed by the hardware) can also be used; see the CAS sketch below.

  • Writing parallel code is relatively straightforward: access shared data as and when it is needed.

  • Getting correct code can be difficult.
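As a hedged illustration of the CAS option, a lock-free update of a shared double can be written as a retry loop with C11 atomics (the helper name `atomic_add_double` is ours, not a standard library call; C11 has no `atomic_fetch_add` for floating-point types, which is exactly why the loop is needed):

```c
#include <stdatomic.h>

/* Fine-grained synchronisation: retry the compare-and-swap until
   our update wins the race. */
void atomic_add_double(_Atomic double *target, double val) {
    double old = atomic_load(target);
    /* On failure, `old` is refreshed to the current value,
       so we recompute old + val and try again. */
    while (!atomic_compare_exchange_weak(target, &old, old + val))
        ;
}
```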

Example (an OpenMP sketch follows the list):

  • Computing $asum = a_0 + a_1 + \dots + a_7$

    • Shared:

      • main array: a[8]

      • result: asum

    • Private:

      • loop counter: i

      • loop limits: istart, istop

      • local sum: mysum

    • Synchronisation:

      • thread0: asum += mysum

      • barrier

      • thread1: asum += mysum
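Under these assumptions, a minimal OpenMP C sketch of the decomposition might look as follows (the fixed two-thread split is illustrative, and a `critical` section stands in for the serialised thread0/barrier/thread1 updates above):

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    double a[8] = {1, 2, 3, 4, 5, 6, 7, 8};  /* shared: main array */
    double asum = 0.0;                       /* shared: result */

    #pragma omp parallel num_threads(2) shared(a, asum)
    {
        int nthreads = omp_get_num_threads();
        int chunk    = 8 / nthreads;
        int istart   = omp_get_thread_num() * chunk;  /* private: loop limits */
        int istop    = istart + chunk;
        double mysum = 0.0;                           /* private: local sum */

        for (int i = istart; i < istop; i++)          /* private: loop counter */
            mysum += a[i];

        #pragma omp critical   /* synchronisation: one update at a time */
        asum += mysum;
    }

    printf("asum = %f\n", asum);  /* 36.0 */
    return 0;
}
```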

Threads in HPC

  • Threads existed before parallel computers

    • designed for concurrency

    • many more threads running than physical cores

      • scheduled / de-scheduled when needed

      • schedule policy: FIFO, LRU, LFU, CLOCK

  • For parallel computing

    • typically run a single thread per core (for affinity, and to avoid oversubscribing resources)

    • want them all to run all the time (avoids context-switch overhead)

  • OS optimisations

    • place threads on selected cores (taskset in Linux, KMP_AFFINITY in OpenMP, numactl on NUMA architectures); see the sketch after this list

    • stop them from migrating (mitigates cache misses and context switches)
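To observe placement, a Linux-specific sketch can report which core each thread lands on (`sched_getcpu` is a glibc call; `OMP_PROC_BIND`/`OMP_PLACES` are standard OpenMP binding variables, suggested here as one way to pin threads):

```c
#define _GNU_SOURCE   /* for sched_getcpu() on glibc */
#include <sched.h>
#include <stdio.h>
#include <omp.h>

int main(void) {
    /* Run with e.g. OMP_PROC_BIND=close OMP_PLACES=cores to pin threads;
       without binding, the reported CPU may differ from run to run. */
    #pragma omp parallel
    printf("thread %d running on cpu %d\n",
           omp_get_thread_num(), sched_getcpu());
    return 0;
}
```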
