Building an encoder-decoder transformer architecture for sequence-to-sequence language tasks like text translation and summarization

 

  • Encoder-Decoder Connection: The encoder connects to the decoder through cross-attention, allowing the decoder to use the encoder's final hidden states to generate the target sequence.
  • Cross-Attention Mechanism: This mechanism lets the decoder "look back" at the input sequence when generating the next word of the target sequence. For example, when translating "I really like to travel" into Spanish, "travel" receives the highest attention.
  • Decoder Layer: The forward() method of the decoder layer requires two masks: the causal mask for the first (self-attention) stage and the cross-attention mask for the second stage, as illustrated in the sketch after this list.
  • Training vs. Inference: During training, the decoder receives the actual target sequences as inputs (teacher forcing). During inference, it generates the target sequence token by token, starting with an empty output embedding; a greedy decoding sketch follows the code example below.
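
To make the two-mask forward() signature concrete, here is a minimal sketch of a single decoder layer. It is an assumption based on the standard transformer design rather than the exact code from this series: the class name DecoderLayer, the attribute names, and the use of PyTorch's built-in nn.MultiheadAttention (in place of a hand-written attention module) are illustrative. Masks follow the convention of the code below, where 1 marks a blocked position.

import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    """One decoder layer: masked self-attention, cross-attention, feed-forward (illustrative sketch)."""
    def __init__(self, d_model, num_heads, d_ff, dropout):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, num_heads, dropout=dropout, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, causal_mask, encoder_output, cross_mask):
        # Stage 1: causally masked self-attention over the (partial) target sequence
        self_out, _ = self.self_attn(x, x, x, attn_mask=causal_mask.bool())
        x = self.norm1(x + self.dropout(self_out))
        # Stage 2: cross-attention - queries come from the decoder, keys/values from the encoder's final hidden states
        cross_out, _ = self.cross_attn(x, encoder_output, encoder_output, attn_mask=cross_mask.bool())
        x = self.norm2(x + self.dropout(cross_out))
        # Stage 3: position-wise feed-forward network
        return self.norm3(x + self.dropout(self.ff(x)))

A full TransformerDecoder would stack num_layers of these layers on top of token embeddings and positional encodings, then project the final hidden states onto the vocabulary.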

 

import torch

# Example hyperparameters (illustrative values; adjust to your setup)
vocab_size, batch_size, sequence_length = 10000, 8, 64
d_model, num_layers, num_heads, d_ff, dropout = 512, 6, 8, 2048, 0.1

# Create a batch of random input sequences
input_sequence = torch.randint(0, vocab_size, (batch_size, sequence_length))

# Random padding mask (for demonstration) and upper-triangular causal mask (1 marks a blocked position)
padding_mask = torch.randint(0, 2, (sequence_length, sequence_length))
causal_mask = torch.triu(torch.ones(sequence_length, sequence_length), diagonal=1)

# Instantiate the two transformer bodies (classes defined earlier in this series)
encoder = TransformerEncoder(vocab_size, d_model, num_layers, num_heads, d_ff, dropout, max_sequence_length=sequence_length)
decoder = TransformerDecoder(vocab_size, d_model, num_layers, num_heads, d_ff, dropout, max_sequence_length=sequence_length)

# Pass the necessary masks as arguments to the encoder and the decoder
encoder_output = encoder(input_sequence, padding_mask)
decoder_output = decoder(input_sequence, causal_mask, encoder_output, padding_mask)
print("Batch's output shape: ", decoder_output.shape)