# Shared

## Functions

|   Item   |                                                   PyTorch                                                   |                                     TensorFlow                                    |                                                     Others                                                     |
| :------: | :---------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------: |
|   clip   |                   [`torch.clamp`](https://pytorch.org/docs/stable/torch.html#torch.clamp)                   | [`tf.clip_by_value`](https://www.tensorflow.org/api_docs/python/tf/clip_by_value) |                                                                                                                |
| one\_hot | [`torch.nn.functional.one_hot`](https://pytorch.org/docs/stable/generated/torch.nn.functional.one_hot.html) |       [`tf.one_hot`](https://www.tensorflow.org/api_docs/python/tf/one_hot)       |                                                                                                                |
|  Linear  |                  [`Linear`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html)                 |  [`Dense`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense) |                   TF Lattice [`Linear`](https://www.tensorflow.org/lattice/api_docs/python/tfl/layers/Linear)                  |
|  einsum  |                   [`einsum`](https://pytorch.org/docs/stable/generated/torch.einsum.html)                   |          [`einsum`](https://www.tensorflow.org/api_docs/python/tf/einsum)         | [Introduction](https://www.youtube.com/watch?v=pkVwUVEHmfI) [Usage](https://theaisummer.com/einsum-attention/) |
|   where  |                    [`where`](https://pytorch.org/docs/stable/generated/torch.where.html)                    |           [`where`](https://www.tensorflow.org/api_docs/python/tf/where)          |                                                                                                                |
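
For a quick side-by-side feel of the rows above, here is a minimal sketch (assuming both frameworks are installed; results noted in comments):

```python
import tensorflow as tf
import torch

# clip: same semantics, different names
torch.clamp(torch.tensor([-2.0, 0.5, 3.0]), min=0.0, max=1.0)     # tensor([0.0, 0.5, 1.0])
tf.clip_by_value(tf.constant([-2.0, 0.5, 3.0]), 0.0, 1.0)         # [0.0, 0.5, 1.0]

# one_hot: integer class indices -> one-hot vectors
torch.nn.functional.one_hot(torch.tensor([0, 2]), num_classes=3)  # [[1, 0, 0], [0, 0, 1]]
tf.one_hot([0, 2], depth=3)                                       # same rows, as float32
```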

### Comprehension

#### Softmax

Softmax rescales the input tensor along a given dim so that the elements along that dim lie in the range [0, 1] and sum to 1. The softmax function is defined as: $$\mathrm{Softmax}(x_i)=\frac{\exp(x_i)}{\sum_j \exp(x_j)}$$
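
As a quick numeric check (a minimal sketch comparing the formula against PyTorch's built-in):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
manual = torch.exp(x) / torch.exp(x).sum()  # tensor([0.0900, 0.2447, 0.6652]), sums to 1
builtin = torch.softmax(x, dim=-1)
assert torch.allclose(manual, builtin)
```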

Softmax is typically used in classification, where the last dim holds one score per label/class; softmax turns those scores into a probability distribution, magnifying the differences between them. For example:

```python
import torch
from torch import nn
from transformers import BertModel

class BertClassification(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.bert = BertModel(config)
        self.dense = nn.Linear(config.hidden_size, config.num_labels)  # FC layer projecting to label scores
        self.pred = nn.Softmax(dim=-1)  # rescale label scores into probabilities
```

## Loss

### What it really does

Quote from [albanD](https://discuss.pytorch.org/u/albanD):

> when you do loss.backward(), it is a shortcut for **`loss.backward(torch.Tensor([1]))`**. This is only valid if loss is a tensor containing a single element. DataParallel returns to you the partial loss that was computed on each gpu, so you usually want to do loss.backward(torch.Tensor([1, 1])) or loss.sum().backward(). Both will have the exact same behaviour.

Ref: [Loss.backward() raises error ‘grad can be implicitly created only for scalar outputs’](https://discuss.pytorch.org/t/loss-backward-raises-error-grad-can-be-implicitly-created-only-for-scalar-outputs/12152)
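
A minimal sketch of the two equivalent options from the quote (hypothetical tensors, just to contrast scalar and non-scalar backward):

```python
import torch

x = torch.ones(2, requires_grad=True)
scalar_loss = (3 * x).sum()    # single-element tensor
scalar_loss.backward()         # shortcut for scalar_loss.backward(torch.tensor(1.0))
print(x.grad)                  # tensor([3., 3.])

y = torch.ones(2, requires_grad=True)
partial_losses = 3 * y         # non-scalar, e.g. one partial loss per GPU
partial_losses.backward(torch.ones_like(partial_losses))  # same as partial_losses.sum().backward()
print(y.grad)                  # tensor([3., 3.])
```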

### NLL & CrossEntropy

`sparse` means the `targets` are integer class indices, while the variants without `sparse` expect `one_hot`-encoded targets.

```python
  tf.nn.sparse_softmax_cross_entropy_with_logits(labels=ground_truth, logits=prediction)
= tf.keras.losses.sparse_categorical_crossentropy(ground_truth, prediction, from_logits=True)
= nn.NLLLoss(reduction="none")(nn.LogSoftmax(dim=-1)(prediction), ground_truth)
= torch.nn.CrossEntropyLoss(reduction="none")(prediction, ground_truth)
```
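
As a quick numeric check of the chain above (a minimal sketch with hypothetical logits; both frameworks must be installed):

```python
import numpy as np
import tensorflow as tf
import torch
from torch import nn

logits = np.array([[2.0, 0.5], [0.3, 1.7]])
labels = np.array([0, 1])

tf_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits).numpy()
torch_loss = nn.CrossEntropyLoss(reduction="none")(torch.tensor(logits), torch.tensor(labels)).numpy()
# both are approximately [0.2014, 0.2204]
```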

The Keras version defaults to `from_logits=False`. Note that `from_logits=False` only means the input has already been through a softmax layer (taken over `axis=1` for 2-D inputs and `axis=2` for 3-D inputs); the `log` is still applied inside `sparse_categorical_crossentropy`.

```python
  epsilon = 1e-7

  tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred, from_logits=False)
= torch.nn.CrossEntropyLoss(reduction="none")(torch.log(torch.clamp(torch.tensor(y_pred), epsilon, 1 - epsilon)), torch.tensor(y_true))
```

From the [source code](https://github.com/keras-team/keras/blob/d8fcb9d4d4dad45080ecfdd575483653028f8eda/keras/backend.py#L5167) we can see that when `from_logits==False`, Keras applies a `tf.math.log` (after clipping), so adding the same `log` on the PyTorch side aligns the two losses.
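
A minimal numeric check of this alignment (hypothetical probabilities that have already been through softmax):

```python
import numpy as np
import tensorflow as tf
import torch

y_true = np.array([1, 0])
y_pred = np.array([[0.1, 0.9], [0.8, 0.2]])  # rows already sum to 1
epsilon = 1e-7

keras_loss = tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred, from_logits=False).numpy()
torch_loss = torch.nn.CrossEntropyLoss(reduction="none")(
    torch.log(torch.clamp(torch.tensor(y_pred), epsilon, 1 - epsilon)),
    torch.tensor(y_true),
).numpy()
# both are approximately [0.1054, 0.2231], i.e. -log(0.9) and -log(0.8)
```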

Because `CrossEntropyLoss` hard-codes `dim=1` as the class dimension, when computing a batch loss you need to permute the input so that `dim=1` holds the categories:

```python
import torch

predictions = torch.randn(4, 10, 5)     # (batch_size, feature, category)
targets = torch.randint(0, 5, (4, 10))  # (batch_size, feature)
loss_fn = torch.nn.CrossEntropyLoss(reduction='none')
loss = loss_fn(predictions.permute(0, 2, 1), targets).mean(dim=1)  # per-sample loss, shape (batch_size,)
```

Ref: [torch.nn.CrossEntropyLoss over Multiple Batches](https://stackoverflow.com/questions/70483124/torch-nn-crossentropyloss-over-multiple-batches)

## Accuracy

### Categorical Accuracy

The fraction of predictions whose `argmax` matches the label, averaged over the entire dataset:

```python
torch.eq(torch.argmax(prediction, dim=-1), labels).view(-1).float().mean()
```
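
For instance, with a hypothetical batch where the second prediction is wrong:

```python
import torch

prediction = torch.tensor([[0.1, 0.2, 0.7],   # argmax = 2, label = 2 -> correct
                           [0.8, 0.1, 0.1]])  # argmax = 0, label = 1 -> wrong
labels = torch.tensor([2, 1])
accuracy = torch.eq(prediction.argmax(dim=-1), labels).float().mean()
print(accuracy)  # tensor(0.5000)
```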

## Alignment

* Calculate similarity using cosine distance ![Cosine Similarity](https://github.com/CookieLau/my_gitbook/blob/master/01-Framework/01-Tensorflow/assets/cosine.png)

  ```python
  import numpy as np
  from sklearn.metrics.pairwise import cosine_similarity, paired_distances

  x = np.array([[0.26304135, 0.91725843, 0.61099966, 0.40816231, 0.93606288, 0.52462691]])
  print(x)
  y = np.array([[0.03756129, 0.50223667, 0.66529424, 0.57392135, 0.20479857, 0.27286363]])
  print(y)
  # cosine similarity
  simi = cosine_similarity(x, y)
  print('cosine similarity:', simi)
  # cosine distance = 1 - cosine similarity
  dist = paired_distances(x, y, metric='cosine')
  print('cosine distance:', dist)
  ```

Reference: [Batch computation of cosine distance in Python](https://blog.csdn.net/tszupup/article/details/107942261)

## Model

[convert\_bert\_original\_tf\_checkpoint\_to\_pytorch](https://huggingface.co/docs/transformers/converting_tensorflow_models)

The main function used is [`load_tf_weights_in_bert`](https://github.com/huggingface/transformers/blob/fa322474060beb3673cf5a3e39ccd3c8ad57ecd3/src/transformers/models/bert/modeling_bert.py#L109); the steps are as follows (a condensed sketch follows the list):

1. create a freshly initialized PyTorch model whose parameters will be filled/replaced from the TF checkpoint;
2. collect all saved parameter names and values from the checkpoint;
3. use `getattr` to index into the corresponding submodule of the PyTorch model;
4. fill the data with `pointer.data = torch.from_numpy(array)`.
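
The real `load_tf_weights_in_bert` also remaps TF variable names onto the PyTorch module tree (e.g. `kernel` becomes `weight`, with a transpose) and skips optimizer state. The sketch below keeps only the skeleton of steps 2–4 and assumes, hypothetically, that checkpoint variable names already match the module attribute path:

```python
import numpy as np
import tensorflow as tf
import torch

def load_tf_weights_sketch(model: torch.nn.Module, tf_checkpoint_path: str):
    # Step 2: collect all saved parameter names (and shapes) from the checkpoint.
    for name, _shape in tf.train.list_variables(tf_checkpoint_path):
        array = tf.train.load_variable(tf_checkpoint_path, name)
        # Step 3: use getattr to index into the matching submodule/parameter.
        pointer = model
        for part in name.split("/"):
            pointer = getattr(pointer, part)
        # Step 4: fill the parameter data in place.
        pointer.data = torch.from_numpy(np.asarray(array))
    return model
```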
