Building an encoder-decoder transformer architecture for sequence-to-sequence language tasks like text translation and summarization
- Encoder-Decoder Connection: The encoder connects to the decoder through cross-attention, allowing the decoder to use the encoder's final hidden states when generating the target sequence.
- Cross-Attention Mechanism: Cross-attention lets the decoder "look back" at the input sequence while generating each word of the target sequence. For example, when translating "I really like to travel" into Spanish, the source token "travel" receives the highest attention as its translation is produced.
- Decoder Layer: The forward() method of a decoder layer takes two masks: the causal mask for the first (masked self-attention) stage and the cross-attention mask for the second stage (see the sketch after this list).
- Training vs. Inference: During training, the decoder receives the actual target sequence as input (teacher forcing). During inference, it starts from an empty output and generates the target sequence token by token, feeding each prediction back in.
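To make the two attention stages concrete, here is a minimal sketch of a decoder layer's forward() method. It substitutes PyTorch's built-in nn.MultiheadAttention for the custom attention and feed-forward classes built earlier in the series, so the class and argument names (DecoderLayer, cross_mask, d_ff, and so on) are illustrative rather than the post's exact code.

import torch.nn as nn

class DecoderLayer(nn.Module):
    def __init__(self, d_model, num_heads, d_ff, dropout):
        super().__init__()
        # Stage 1: masked self-attention over the target sequence
        self.self_attn = nn.MultiheadAttention(d_model, num_heads, dropout=dropout, batch_first=True)
        # Stage 2: cross-attention - queries from the decoder, keys/values from the encoder output
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, causal_mask, encoder_output, cross_mask):
        # Causal self-attention: each position may only attend to earlier positions
        attn_out, _ = self.self_attn(x, x, x, attn_mask=causal_mask)
        x = self.norm1(x + self.dropout(attn_out))
        # Cross-attention: "look back" at the encoder's hidden states
        attn_out, _ = self.cross_attn(x, encoder_output, encoder_output, attn_mask=cross_mask)
        x = self.norm2(x + self.dropout(attn_out))
        # Position-wise feed-forward sub-layer
        x = self.norm3(x + self.dropout(self.ff(x)))
        return x

Note that nn.MultiheadAttention treats a boolean attn_mask as "True means this position may not be attended to", so masks like the upper-triangular causal mask in the snippet below would be passed as causal_mask.bool().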
import torch

# Assumes the hyperparameters (vocab_size, batch_size, sequence_length, d_model, num_layers,
# num_heads, d_ff, dropout) and the TransformerEncoder / TransformerDecoder classes are defined earlier.

# Create a batch of random input sequences (token IDs)
input_sequence = torch.randint(0, vocab_size, (batch_size, sequence_length))

# Dummy padding mask (random 0/1 values, for illustration) and an upper-triangular
# causal mask that blocks attention to future positions
padding_mask = torch.randint(0, 2, (sequence_length, sequence_length))
causal_mask = torch.triu(torch.ones(sequence_length, sequence_length), diagonal=1)

# Instantiate the two transformer bodies
encoder = TransformerEncoder(vocab_size, d_model, num_layers, num_heads, d_ff, dropout, max_sequence_length=sequence_length)
decoder = TransformerDecoder(vocab_size, d_model, num_layers, num_heads, d_ff, dropout, max_sequence_length=sequence_length)

# Pass the necessary masks as arguments to the encoder and the decoder
encoder_output = encoder(input_sequence, padding_mask)
decoder_output = decoder(input_sequence, causal_mask, encoder_output, padding_mask)
print("Batch's output shape: ", decoder_output.shape)