Shared

Functions

| Item    | PyTorch                     | TensorFlow       | Others              |
| ------- | --------------------------- | ---------------- | ------------------- |
| clip    | torch.clamp                 | tf.clip_by_value |                     |
| one_hot | torch.nn.functional.one_hot | tf.one_hot       |                     |
| Linear  | Linear                      | Dense            |                     |
| einsum  | einsum                      | einsum           | Introduction, Usage |
| where   | where                       | where            |                     |

Comprehension

Softmax

Softmax rescales the input Tensor along a given dim so that the elements of that dim lie in the range [0, 1] and sum to 1. The Softmax function is defined as: $\mathrm{Softmax}(x_i) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$

Softmax is typically used for classification, where the last dim holds the scores of n labels/tags; softmax turns them into a probability distribution and magnifies the differences between them. For example:

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BertClassification(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.bert = BertModel(config)
        self.dense = torch.nn.Linear(config.hidden_size, config.num_labels)  # FC layer
        self.pred = torch.nn.Softmax(dim=-1)  # magnify by softmax
```
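A quick numeric check of the rescaling behaviour (toy tensor, values are arbitrary):

```python
import torch

logits = torch.tensor([[1.0, 2.0, 3.0]])
probs = torch.nn.Softmax(dim=-1)(logits)
print(probs)          # tensor([[0.0900, 0.2447, 0.6652]]) approximately
print(probs.sum(-1))  # tensor([1.]): elements lie in [0, 1] and sum to 1
```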

Loss

What it really does

Quote from albanD in the PyTorch forum thread "Loss.backward() raises error 'grad can be implicitly created only for scalar outputs'":

When you do `loss.backward()`, it is a shortcut for `loss.backward(torch.Tensor([1]))`. This is only valid if `loss` is a tensor containing a single element. `DataParallel` returns to you the partial loss that was computed on each GPU, so you usually want to do `loss.backward(torch.Tensor([1, 1]))` or `loss.sum().backward()`. Both will have the exact same behaviour.
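A minimal sketch of the two equivalent ways to backpropagate a non-scalar loss (toy tensors; DataParallel itself is omitted here):

```python
import torch

x = torch.randn(4, requires_grad=True)
loss = x ** 2                           # non-scalar "loss": one element per sample
# loss.backward() would raise: grad can be implicitly created only for scalar outputs
loss.backward(torch.ones_like(loss))    # pass the upstream gradient explicitly (all ones = summing)
print(x.grad)                           # 2 * x

x.grad = None
(x ** 2).sum().backward()               # or reduce to a scalar first, then backward()
print(x.grad)                           # identical gradients
```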

NLL & CrossEntropy

`sparse` means the targets are integer-encoded class indices; without `sparse`, the targets are expected to be one-hot encoded.

```
  tf.nn.sparse_softmax_cross_entropy_with_logits(labels=ground_truth, logits=prediction)
= tf.keras.losses.sparse_categorical_crossentropy(ground_truth, prediction, from_logits=True)
= nn.NLLLoss(reduction="none")(nn.LogSoftmax(dim=-1)(prediction), ground_truth)
= torch.nn.CrossEntropyLoss(reduction="none")(prediction, ground_truth)
```
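A quick PyTorch-only check of the last two lines of this equivalence (toy tensors, names are illustrative):

```python
import torch
import torch.nn as nn

prediction = torch.randn(3, 5)             # (batch, num_classes) logits
ground_truth = torch.randint(0, 5, (3,))   # integer-encoded ("sparse") targets

a = nn.NLLLoss(reduction="none")(nn.LogSoftmax(dim=-1)(prediction), ground_truth)
b = nn.CrossEntropyLoss(reduction="none")(prediction, ground_truth)
print(torch.allclose(a, b))  # True: CrossEntropyLoss = LogSoftmax + NLLLoss
```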

The Keras version defaults to `from_logits=False`. Note that `from_logits=False` only means the input has already gone through a softmax layer (over axis=1 for 2-D inputs, axis=2 for 3-D inputs); the log is still taken inside `sparse_categorical_crossentropy`:

```
  epsilon = 1e-7

  tf.keras.losses.sparse_categorical_crossentropy(ground_truth, prediction, from_logits=False)
= torch.nn.CrossEntropyLoss(reduction="none")(torch.log(torch.clamp(prediction, epsilon, 1 - epsilon)), ground_truth)
```

From the TF source code it can be seen that when `from_logits==False`, a layer of `tf.math.log` is applied, so adding the same log on the PyTorch side makes the two align.

Since `CrossEntropyLoss` is hard-coded to treat dim=1 as the class dimension, when computing a batch loss you need to permute the input so that dim=1 holds the categories (Ref: torch.nn.CrossEntropyLoss over Multiple Batches):

```python
predictions  # (batch_size, feature, category)
targets      # (batch_size, feature)
loss_fn = torch.nn.CrossEntropyLoss(reduction='none')
loss = loss_fn(predictions.permute(0, 2, 1), targets).mean(dim=1)
```
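A toy run of the permute trick with concrete shapes (2 samples, 3 positions, 4 categories, random values):

```python
import torch

predictions = torch.randn(2, 3, 4)       # (batch_size, feature, category)
targets = torch.randint(0, 4, (2, 3))    # (batch_size, feature)

loss_fn = torch.nn.CrossEntropyLoss(reduction='none')
# permute so that dim=1 holds the categories, as CrossEntropyLoss expects
loss = loss_fn(predictions.permute(0, 2, 1), targets)   # (batch_size, feature)
print(loss.mean(dim=1))                                  # one loss value per sample
```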

Accuracy

Categorical Accuracy

The fraction of predictions that match the labels, averaged over the entire dataset:

```python
sum(torch.eq(torch.argmax(prediction, dim=-1), labels).view(-1)) / labels.view(-1).size()[0]
```
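A toy run of this formula (names and values are illustrative):

```python
import torch

prediction = torch.tensor([[2.0, 0.5, 0.1],
                           [0.2, 0.9, 1.5]])   # (batch, num_classes) scores
labels = torch.tensor([0, 1])                  # ground-truth class indices

acc = sum(torch.eq(torch.argmax(prediction, dim=-1), labels).view(-1)) / labels.view(-1).size()[0]
print(acc)  # tensor(0.5000): sample 0 predicts class 0 (correct), sample 1 predicts class 2 (wrong)
```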

Alignment

Calculate similarity using cosine distance:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity, paired_distances

x = np.array([[0.26304135, 0.91725843, 0.61099966, 0.40816231, 0.93606288, 0.52462691]])
print(x)
y = np.array([[0.03756129, 0.50223667, 0.66529424, 0.57392135, 0.20479857, 0.27286363]])
print(y)
# cosine similarity
simi = cosine_similarity(x, y)
print('cosine similarity:', simi)
# cosine distance = 1 - cosine similarity
dist = paired_distances(x, y, metric='cosine')
print('cosine distance:', dist)
```

References: python批量计算cosine distance; Cosine Similarity
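For intuition, a by-hand version of the same metric with small vectors (values chosen for readability):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([3.0, 2.0, 1.0])

cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print('cosine similarity:', cos)    # 10 / 14 ≈ 0.7143
print('cosine distance:', 1 - cos)  # ≈ 0.2857
```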

Model

Reference: convert_bert_original_tf_checkpoint_to_pytorch. The main function used is load_tf_weights_in_bert; the steps are as follows (see the sketch after this list):

  1. create a fake PyTorch model, whose data will be filled/replaced by the TF checkpoint;

  2. collect all saved parameter names and values;

  3. use `getattr` to index into the submodules of the PyTorch model;

  4. fill the data with `pointer.data = torch.from_numpy(array)`.
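A condensed sketch of that flow, loosely following load_tf_weights_in_bert (the toy model, the fake checkpoint dict, and the name_map below are illustrative placeholders, not the real API):

```python
import numpy as np
import torch
import torch.nn as nn

# 1. a "fake" PyTorch model whose parameters will be overwritten
model = nn.Sequential()
model.add_module("dense", nn.Linear(4, 4))

# 2. pretend these names/values came from tf.train.list_variables / tf.train.load_variable
tf_weights = {
    "dense/kernel": np.random.rand(4, 4).astype(np.float32),
    "dense/bias": np.random.rand(4).astype(np.float32),
}
name_map = {"kernel": "weight", "bias": "bias"}  # TF variable name -> PyTorch attribute name

for name, array in tf_weights.items():
    scope, var = name.split("/")
    # 3. use getattr to walk down to the right submodule / parameter
    pointer = getattr(getattr(model, scope), name_map[var])
    if var == "kernel":
        array = array.T  # TF stores Linear kernels transposed w.r.t. PyTorch
    # 4. overwrite the parameter data in place
    pointer.data = torch.from_numpy(array)
```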

