# Shared

## Functions

|   Item   |                                                   PyTorch                                                   |                                     TensorFlow                                    |                                                     Others                                                     |
| :------: | :---------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------: |
|   clip   |                   [`torch.clamp`](https://pytorch.org/docs/stable/torch.html#torch.clamp)                   | [`tf.clip_by_value`](https://www.tensorflow.org/api_docs/python/tf/clip_by_value) |                                                                                                                |
| one\_hot | [`torch.nn.functional.one_hot`](https://pytorch.org/docs/stable/generated/torch.nn.functional.one_hot.html) |       [`tf.one_hot`](https://www.tensorflow.org/api_docs/python/tf/one_hot)       |                                                                                                                |
|  Linear  |                  [`Linear`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html)                 |  [`Linear`](https://www.tensorflow.org/lattice/api_docs/python/tfl/layers/Linear) |                   [`Dense`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense)                  |
|  einsum  |                   [`einsum`](https://pytorch.org/docs/stable/generated/torch.einsum.html)                   |          [`einsum`](https://www.tensorflow.org/api_docs/python/tf/einsum)         | [Introduction](https://www.youtube.com/watch?v=pkVwUVEHmfI) [Usage](https://theaisummer.com/einsum-attention/) |
|   where  |                    [`where`](https://pytorch.org/docs/stable/generated/torch.where.html)                    |           [`where`](https://www.tensorflow.org/api_docs/python/tf/where)          |                                                                                                                |

### Comprehension

#### Softmax

Softmax rescales the input Tensor along a given dim so that the elements along that dim lie in the range \[0,1] and sum to 1. The Softmax function is defined as: $$Softmax(x\_i)=\frac{\exp{x\_i}}{\sum\_j{\exp{x\_j}}}$$

Softmax is typically used for classification, where the last dim holds one score for each of the n labels/tags. Applying softmax turns those scores into a probability distribution and magnifies the differences between them. For example:

```python
import torch
from torch import nn
from transformers import BertModel  # assumption: BertModel comes from Hugging Face transformers


class BertClassification(nn.Module):
    def __init__(self, config):
        super().__init__()

        self.config = config
        self.bert = BertModel(config)
        self.dense = torch.nn.Linear(config.hidden_size, config.num_labels)  # FC layer
        self.pred = torch.nn.Softmax(dim=-1)  # magnify by softmax
```
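
As a quick check of the definition, here is a minimal sketch (the logit values are arbitrary and chosen only for illustration) that applies `torch.softmax` along the last dim and compares it with the formula computed by hand:

```python
import torch

# Arbitrary logits for 2 samples and 3 labels (illustrative values only).
logits = torch.tensor([[1.0, 2.0, 3.0],
                       [1.0, 1.0, 1.0]])

probs = torch.softmax(logits, dim=-1)

# Same result computed directly from exp(x_i) / sum_j exp(x_j).
manual = torch.exp(logits) / torch.exp(logits).sum(dim=-1, keepdim=True)

row_sums = probs.sum(dim=-1)  # every row sums to 1
```

Equal logits (second row) map to a uniform distribution, while the largest logit in the first row receives the largest share of the probability mass.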

## Loss

### What it really does

Quote from [albanD](https://discuss.pytorch.org/u/albanD):

> when you do loss.backward(), it is a shortcut for **`loss.backward(torch.Tensor([1]))`**. This in only valid if loss is a tensor containing a single element. DataParallel returns to you the partial loss that was computed on each gpu, so you usually want to do loss.backward(torch.Tensor(\[1, 1])) or loss.sum().backward(). Both will have the exact same behaviour.

Ref: [Loss.backward() raises error ‘grad can be implicitly created only for scalar outputs’](https://discuss.pytorch.org/t/loss-backward-raises-error-grad-can-be-implicitly-created-only-for-scalar-outputs/12152)
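
The equivalence described in the quote can be checked on a toy tensor. This is only a sketch: the two-element `loss` stands in for the per-GPU partial losses, and the values are made up.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# A non-scalar "loss" with two elements, like one partial loss per GPU.
loss = w * torch.tensor([3.0, 4.0])
loss.backward(torch.ones_like(loss))  # pass an explicit gradient of ones
grad_explicit = w.grad.clone()

w.grad = None  # reset the accumulated gradient before the second run
loss = w * torch.tensor([3.0, 4.0])
loss.sum().backward()  # reduce to a scalar first, then call backward()
grad_sum = w.grad.clone()
```

Both calls leave the same gradient in `w.grad`, whereas calling `loss.backward()` directly on the two-element tensor raises the error named in the reference.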

### NLL & CrossEntropy

`sparse` means that `targets` are integer class indices; without `sparse`, the targets are `one_hot` encoded.

```python
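import tensorflow as tf
from torch import nn

# All four expressions give the same per-example loss. prediction holds logits,
# ground_truth holds integer class ids (TF tensors for tf.*, torch tensors for nn.*).
loss_tf = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=ground_truth, logits=prediction)
loss_keras = tf.keras.losses.sparse_categorical_crossentropy(ground_truth, prediction, from_logits=True)
loss_nll = nn.NLLLoss(reduction="none")(nn.LogSoftmax(dim=-1)(prediction), ground_truth)
loss_ce = nn.CrossEntropyLoss(reduction="none")(prediction, ground_truth)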
```
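
The two PyTorch expressions above can be compared numerically. The sketch below uses random logits and made-up labels, and also writes out the non-`sparse` (`one_hot`) form by hand. The TensorFlow calls are not exercised here.

```python
import torch
from torch import nn

torch.manual_seed(0)
prediction = torch.randn(4, 3)             # logits: (batch, num_classes)
ground_truth = torch.tensor([0, 2, 1, 2])  # integer class ids: (batch,)

loss_nll = nn.NLLLoss(reduction="none")(nn.LogSoftmax(dim=-1)(prediction), ground_truth)
loss_ce = nn.CrossEntropyLoss(reduction="none")(prediction, ground_truth)

# The non-sparse form: one_hot targets combined with log-probabilities by hand.
one_hot = nn.functional.one_hot(ground_truth, num_classes=3).float()
loss_one_hot = -(one_hot * torch.log_softmax(prediction, dim=-1)).sum(dim=-1)
```

All three tensors have shape `(4,)` and agree up to floating-point error.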

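The Keras version defaults to `from_logits=False`. Note that `from_logits=False` only means that the input has already gone through a softmax (over axis=1 for a 2-D input, axis=2 for a 3-D input); the log is still applied inside `sparse_categorical_crossentropy`.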

```python
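epsilon = 1e-7  # probabilities are clipped to [epsilon, 1 - epsilon] before the log

# prediction holds probabilities (softmax output); ground_truth holds integer class ids
loss_keras = tf.keras.losses.sparse_categorical_crossentropy(ground_truth, prediction, from_logits=False)
loss_torch = torch.nn.CrossEntropyLoss(reduction="none")(
    torch.log(torch.clamp(torch.tensor(prediction), epsilon, 1 - epsilon)), torch.tensor(ground_truth))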
```

The [source code](https://github.com/keras-team/keras/blob/d8fcb9d4d4dad45080ecfdd575483653028f8eda/keras/backend.py#L5167) shows that when `from_logits==False`, the input goes through a `tf.math.log`, so adding the same log on the PyTorch side aligns the two results.
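
Why the extra log is enough on the PyTorch side: `CrossEntropyLoss` applies a log-softmax to its input, and the log-softmax of `log(p)` is `log(p)` again whenever `p` already sums to 1. A minimal PyTorch-only sketch with random inputs (the Keras call itself is not run here):

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 3)
ground_truth = torch.tensor([0, 2, 1, 2])
probs = torch.softmax(logits, dim=-1)  # what a model ending in softmax would output

epsilon = 1e-7
log_probs = torch.log(torch.clamp(probs, epsilon, 1 - epsilon))

# Feeding log-probabilities gives the same loss as feeding the original logits.
loss_from_probs = nn.CrossEntropyLoss(reduction="none")(log_probs, ground_truth)
loss_from_logits = nn.CrossEntropyLoss(reduction="none")(logits, ground_truth)
```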

Because `CrossEntropyLoss` is hard-coded to treat `dim=1` as the class dimension, the categories must be moved to `dim=1` when computing a batch loss:

```python
predictions  # (batch_size, feature, category)
targets  # (batch_size, feature), integer class ids
loss_fn = torch.nn.CrossEntropyLoss(reduction='none')  # must be lowercase 'none'
loss = loss_fn(predictions.permute(0, 2, 1), targets).mean(dim=1)  # (batch_size,)
```

Ref: [torch.nn.CrossEntropyLoss over Multiple Batches](https://stackoverflow.com/questions/70483124/torch-nn-crossentropyloss-over-multiple-batches)
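
A runnable version of the snippet above, with hypothetical sizes and random data, cross-checked against the alternative of flattening the batch and feature dimensions:

```python
import torch

torch.manual_seed(0)
batch_size, feature, category = 2, 5, 3  # hypothetical sizes
predictions = torch.randn(batch_size, feature, category)
targets = torch.randint(0, category, (batch_size, feature))

loss_fn = torch.nn.CrossEntropyLoss(reduction="none")

# Move the class dimension to dim=1: (batch_size, category, feature).
loss = loss_fn(predictions.permute(0, 2, 1), targets).mean(dim=1)

# Cross-check: flatten to (batch_size * feature, category), then reshape back.
flat = loss_fn(predictions.reshape(-1, category), targets.reshape(-1))
loss_flat = flat.reshape(batch_size, feature).mean(dim=1)
```

Both routes produce one loss value per sample, so `loss` has shape `(batch_size,)`.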

## Accuracy

### Categorical Accuracy

The fraction of samples in the entire dataset whose predicted class (the argmax of the prediction) equals the label:

```python
sum(torch.eq(torch.argmax(prediction, dim=-1), labels).view(-1)) / labels.view(-1).size()[0]
```
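
A small worked example with made-up scores for 4 samples and 3 classes, where one sample is deliberately misclassified:

```python
import torch

# 4 samples, 3 classes (illustrative scores).
prediction = torch.tensor([[0.1, 0.7, 0.2],
                           [0.8, 0.1, 0.1],
                           [0.2, 0.2, 0.6],
                           [0.3, 0.4, 0.3]])
labels = torch.tensor([1, 0, 2, 0])  # the last sample is predicted as class 1

correct = torch.eq(torch.argmax(prediction, dim=-1), labels)
accuracy = correct.float().mean().item()
print(accuracy)  # 0.75
```

Averaging the boolean mask as floats is equivalent to dividing the number of matches by the number of labels.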

## Alignment

* Calculate similarity using cosine distance ![Cosine Similarity](/files/SoHpUDdtbdeQMSFBmA74)

  ```python
  import numpy as np
  from sklearn.metrics.pairwise import cosine_similarity, paired_distances

  x = np.array([[0.26304135, 0.91725843, 0.61099966, 0.40816231, 0.93606288, 0.52462691]])
  print(x)
  y = np.array([[0.03756129, 0.50223667, 0.66529424, 0.57392135, 0.20479857, 0.27286363]])
  print(y)
  # cosine similarity
  simi = cosine_similarity(x, y)
  print('cosine similarity:', simi)
  # cosine distance = 1 - cosine similarity
  dist = paired_distances(x, y, metric='cosine')
  print('cosine distance:', dist)
  ```

  Reference: [Batch computation of cosine distance in Python](https://blog.csdn.net/tszupup/article/details/107942261)
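
For tensors that already live in PyTorch, the same quantities can be computed without scikit-learn. The sketch below swaps in `torch.nn.functional.cosine_similarity` and checks it against the formula `x·y / (|x| |y|)`; the vectors are made up for illustration.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[1.0, 0.0], [1.0, 2.0]])
y = torch.tensor([[1.0, 1.0], [2.0, 4.0]])

# Row-wise cosine similarity between paired rows of x and y.
simi = F.cosine_similarity(x, y, dim=-1)

# The same value written out from the definition.
manual = (x * y).sum(dim=-1) / (x.norm(dim=-1) * y.norm(dim=-1))

# Cosine distance = 1 - cosine similarity.
dist = 1 - simi
```

The second pair points in the same direction, so its similarity is 1 and its distance is 0.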

## Model

[convert\_bert\_original\_tf\_checkpoint\_to\_pytorch](https://huggingface.co/docs/transformers/converting_tensorflow_models)

The main function used is [`load_tf_weights_in_bert`](https://github.com/huggingface/transformers/blob/fa322474060beb3673cf5a3e39ccd3c8ad57ecd3/src/transformers/models/bert/modeling_bert.py#L109). The steps are as follows:

1. create a fake PyTorch model, whose data will be filled/replaced by the TF checkpoint;
2. collect the names and values of all saved parameters;
3. use `getattr` to index into the submodules of the PyTorch model;
4. fill the data with `pointer.data = torch.from_numpy(array)`.
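
The four steps can be sketched on a toy model. This is not the actual `load_tf_weights_in_bert` code: `TinyModel` and the hard-coded `checkpoint` dict are hypothetical stand-ins, and any name translation between the two frameworks is omitted.

```python
import numpy as np
import torch
from torch import nn


class TinyModel(nn.Module):  # hypothetical stand-in for the PyTorch BERT model
    def __init__(self):
        super().__init__()
        self.encoder = nn.Module()
        self.encoder.dense = nn.Linear(3, 2)


# Step 1: a freshly initialised model whose data will be replaced.
model = TinyModel()

# Step 2: parameter names and values (hard-coded here; a real script
# would read them from the TF checkpoint).
checkpoint = {
    "encoder/dense/weight": np.ones((2, 3), dtype=np.float32),
    "encoder/dense/bias": np.zeros(2, dtype=np.float32),
}

for name, array in checkpoint.items():
    # Step 3: walk down the module tree one attribute at a time.
    pointer = model
    for part in name.split("/"):
        pointer = getattr(pointer, part)
    # Step 4: overwrite the parameter's data after a shape check.
    assert pointer.shape == array.shape
    pointer.data = torch.from_numpy(array)
```

Assigning to `.data` replaces the values while keeping the same `nn.Parameter` object, so the module still tracks it as a trainable parameter.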


