Building an encoder-decoder transformer architecture for sequence-to-sequence language tasks
Suyeon Cha · 2024. 11. 5. 14:51
This post walks through building an encoder-decoder transformer architecture for sequence-to-sequence language tasks such as text translation and summarization.
- Encoder-Decoder Connection: The encoder connects to the decoder through cross-attention, which lets the decoder use the encoder's final hidden states while generating the target sequence.
- Cross-Attention Mechanism: Cross-attention lets the decoder "look back" at the input sequence when generating each word of the target sequence. For example, when translating "I really like to travel" into Spanish, the decoder places the highest attention weight on "travel" while producing the corresponding Spanish word.
- Decoder Layer: The forward() method of a decoder layer takes two masks: a causal mask for the first (self-)attention stage and a cross-attention mask for the second stage (a minimal sketch of such a layer follows this list).
- Training vs. Inference: During training, the decoder receives the actual target sequence as input (teacher forcing). During inference, it starts from an essentially empty output containing only a start token and generates the target sequence one token at a time, feeding each prediction back in (see the greedy decoding sketch after the code example below).
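
To make the two attention stages concrete, here is a minimal sketch of a decoder layer built on PyTorch's nn.MultiheadAttention. The class and argument names are hypothetical, and the actual TransformerDecoder used below may be structured differently; the point is the order of the two stages and which mask each one consumes.

import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    # Hypothetical sketch of one decoder layer: masked self-attention, then cross-attention.
    def __init__(self, d_model, num_heads, d_ff, dropout):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, num_heads, dropout=dropout, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1, self.norm2, self.norm3 = nn.LayerNorm(d_model), nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x, encoder_output, causal_mask, src_padding_mask):
        # Stage 1: self-attention over the target, blocked from future positions by the causal mask
        attn_out, _ = self.self_attn(x, x, x, attn_mask=causal_mask)
        x = self.norm1(x + attn_out)
        # Stage 2: cross-attention, queries come from the decoder, keys/values from the encoder output
        attn_out, _ = self.cross_attn(x, encoder_output, encoder_output, key_padding_mask=src_padding_mask)
        x = self.norm2(x + attn_out)
        # Position-wise feed-forward network
        return self.norm3(x + self.ff(x))

# Toy usage: boolean masks, True = position is blocked / padded
layer = DecoderLayer(d_model=512, num_heads=8, d_ff=2048, dropout=0.1)
tgt = torch.randn(2, 10, 512)                                # (batch, tgt_len, d_model)
memory = torch.randn(2, 12, 512)                             # encoder output: (batch, src_len, d_model)
causal = torch.triu(torch.ones(10, 10), diagonal=1).bool()   # True above the diagonal = future blocked
src_pad = torch.zeros(2, 12, dtype=torch.bool)               # no padding in this toy example
out = layer(tgt, memory, causal, src_pad)                    # -> torch.Size([2, 10, 512])

In the full model the cross-attention mask is simply the encoder's padding mask, which is exactly what the snippet below passes to the decoder.
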
import torch

# Example hyperparameters (illustrative values only)
vocab_size, batch_size, sequence_length = 10000, 8, 64
d_model, num_heads, num_layers, d_ff, dropout = 512, 8, 6, 2048, 0.1

# Create a batch of random input sequences
input_sequence = torch.randint(0, vocab_size, (batch_size, sequence_length))
# Random 0/1 mask standing in for a real padding mask in this toy example
padding_mask = torch.randint(0, 2, (sequence_length, sequence_length))
# Upper-triangular causal mask: each position may only attend to earlier positions
causal_mask = torch.triu(torch.ones(sequence_length, sequence_length), diagonal=1)

# Instantiate the two transformer bodies (TransformerEncoder / TransformerDecoder are assumed to be defined earlier)
encoder = TransformerEncoder(vocab_size, d_model, num_layers, num_heads, d_ff, dropout, max_sequence_length=sequence_length)
decoder = TransformerDecoder(vocab_size, d_model, num_layers, num_heads, d_ff, dropout, max_sequence_length=sequence_length)

# Pass the necessary masks as arguments to the encoder and the decoder
encoder_output = encoder(input_sequence, padding_mask)
decoder_output = decoder(input_sequence, causal_mask, encoder_output, padding_mask)
print("Batch's output shape: ", decoder_output.shape)