Comprehension
Softmax
Softmax rescales the input Tensor along a given dim so that the elements along that dim lie in the range [0, 1] and sum to 1. For an element x_i along that dim, the softmax function is defined as:
$$\mathrm{Softmax}(x_i) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$
Softmax is usually used in classification, where the last dim holds scores for n labels/tags; because of the exponential, softmax magnifies the differences between those scores. For example:
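The original example did not survive on this page, so the following is a minimal sketch of the idea, assuming PyTorch. The logits are made-up values for illustration only.

```python
import torch

# Hypothetical logits: a batch of 2 samples with 3 labels on the last dim.
logits = torch.tensor([[1.0, 2.0, 3.0],
                       [1.0, 1.0, 1.0]])

# Rescale along the last dim so each row lies in [0, 1] and sums to 1.
probs = torch.softmax(logits, dim=-1)

print(probs)              # first row is roughly [0.090, 0.245, 0.665]
print(probs.sum(dim=-1))  # each row sums to 1
```

In the first row the logits 1 and 3 differ by a factor of 3, while their probabilities differ by a factor of about e^2 (roughly 7.4), which is the magnifying effect described above. Equal logits, as in the second row, give a uniform distribution.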
Loss
What it really does
Quote from albanD:
when you do loss.backward(), it is a shortcut for loss.backward(torch.Tensor([1])). This is only valid if loss is a tensor containing a single element. DataParallel returns to you the partial loss that was computed on each gpu, so you usually want to do loss.backward(torch.Tensor([1, 1])) or loss.sum().backward(). Both will have the exact same behaviour.
Ref: Loss.backward() raises error ‘grad can be implicitly created only for scalar outputs’
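The equivalence described in the quote can be checked with a small sketch. It does not use DataParallel; a two-element tensor stands in for the per-GPU partial losses, and the names `w` and `partial_loss` are hypothetical.

```python
import torch

# Hypothetical stand-in for the per-GPU partial losses DataParallel returns.
w = torch.tensor([1.0, 2.0], requires_grad=True)
partial_loss = w * 3.0  # shape (2,), not a scalar

# Calling partial_loss.backward() with no argument would raise:
# "grad can be implicitly created only for scalar outputs"

# Option 1: pass an explicit gradient of ones, one entry per element.
partial_loss.backward(torch.ones_like(partial_loss), retain_graph=True)
grad_explicit = w.grad.clone()

# Option 2: reduce to a scalar first, then call backward() as usual.
w.grad = None  # clear the gradient accumulated by option 1
partial_loss.sum().backward()
grad_sum = w.grad.clone()

print(grad_explicit, grad_sum)  # both are tensor([3., 3.])
```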
NLL & CrossEntropy
sparse means that the targets are integer-encoded (class indices); without sparse, the targets are one_hot encoded.
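The Keras version defaults to from_logits=False. Note that from_logits=False here only indicates that the predictions have already passed through a softmax layer (over axis=1 for 2-D input, axis=2 for 3-D input); the log is still taken inside the sparse_categorical_crossentropy function.
The source code shows that when from_logits==False, the input goes through a tf.math.log, so once that log is added, the results align.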
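The PyTorch side of this alignment can be sketched as follows. This is an illustration only: it does not call Keras, and the logits and targets are made-up values. It shows that applying the log to softmax output before the NLL loss reproduces the cross-entropy computed from raw logits, for both integer-encoded and one-hot targets.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)            # hypothetical batch: 4 samples, 3 classes
targets = torch.tensor([0, 2, 1, 2])  # integer-encoded ("sparse") targets

# cross_entropy takes raw logits and applies log_softmax internally.
ce = F.cross_entropy(logits, targets)

# nll_loss expects log-probabilities, so the log must be applied beforehand.
probs = torch.softmax(logits, dim=1)
nll = F.nll_loss(torch.log(probs), targets)

# One-hot targets give the same value when the loss is written out by hand.
one_hot = F.one_hot(targets, num_classes=3).float()
manual = -(one_hot * torch.log(probs)).sum(dim=1).mean()

print(ce.item(), nll.item(), manual.item())
```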
Because CrossEntropy is hard-coded to treat dim=1 as the class dimension, the categories need to be placed on dim=1 when computing a batch loss:
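A minimal sketch of what this looks like for sequence-shaped logits, assuming the model outputs a tensor of shape (batch, seq_len, num_classes). The sizes are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, seq_len, num_classes = 2, 5, 7   # hypothetical sizes
logits = torch.randn(batch, seq_len, num_classes)
targets = torch.randint(0, num_classes, (batch, seq_len))

criterion = nn.CrossEntropyLoss()

# Move the class dimension to dim=1: (batch, num_classes, seq_len).
loss_permuted = criterion(logits.permute(0, 2, 1), targets)

# Equivalent alternative: flatten so every position is its own sample.
loss_flat = criterion(logits.reshape(-1, num_classes), targets.reshape(-1))

print(loss_permuted.item(), loss_flat.item())
```

With the default mean reduction, both forms average over every position in the batch, so they give the same value.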
Ref: torch.nn.CrossEntropyLoss over Multiple Batches
Accuracy
Categorical Accuracy
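The fraction of samples in the entire dataset whose predicted class matches the true class, where N is the number of samples, y_i the true class and \hat{y}_i the predicted scores of sample i:
$$\mathrm{Accuracy} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left[\arg\max_c \hat{y}_{i,c} = y_i\right]$$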
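As a rough illustration, categorical accuracy can be computed in PyTorch like this. The scores and labels are made-up values, and the variable names are hypothetical.

```python
import torch

# Hypothetical scores for 4 samples over 3 classes, with their true labels.
scores = torch.tensor([[0.1, 0.7, 0.2],
                       [0.8, 0.1, 0.1],
                       [0.2, 0.2, 0.6],
                       [0.3, 0.4, 0.3]])
labels = torch.tensor([1, 0, 2, 0])

preds = scores.argmax(dim=-1)         # predicted class per sample
correct = (preds == labels).float()   # 1.0 where the prediction is right
accuracy = correct.mean().item()      # fraction of correct predictions

print(accuracy)  # 0.75: three of the four predictions match
```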
Alignment
Model
convert_bert_original_tf_checkpoint_to_pytorch
The main function used is load_tf_weights_in_bert. The steps are as follows:
1. create a fake pytorch model, whose data will be filled/replaced by the tf checkpoint;
2. collect the names and values of all saved parameters;
3. use getattr to index into the submodules of the pytorch model;
4. fill in the data with pointer.data = torch.from_numpy(array).
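The steps above can be sketched in a simplified form. This is not the actual load_tf_weights_in_bert code: the `TinyModel` and `Encoder` classes and the `checkpoint` dictionary are hypothetical stand-ins for the real BERT model and the TF checkpoint, and the real function may do additional name handling that is not shown here.

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical stand-ins for the "fake" PyTorch model to be filled in.
class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.dense = nn.Linear(3, 2)

class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = Encoder()

model = TinyModel()

# Hypothetical checkpoint contents: slash-separated names mapped to arrays.
checkpoint = {
    "encoder/dense/weight": np.ones((2, 3), dtype=np.float32),
    "encoder/dense/bias": np.zeros(2, dtype=np.float32),
}

for name, array in checkpoint.items():
    pointer = model
    for part in name.split("/"):
        pointer = getattr(pointer, part)  # descend one submodule/parameter
    assert pointer.shape == array.shape   # guard against mismatched shapes
    pointer.data = torch.from_numpy(array)

print(model.encoder.dense.weight)
```

Walking the name with getattr means the loop needs no knowledge of the model's structure beyond the parameter names themselves.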