Home / Blog / Artificial Intelligence / Transformers – The New Breed of NLP

Transformers – The New Breed of NLP

July 12, 2023
44

Meet the Author : Mr. Bharani Kumar

Bharani Kumar Depuru is a well known IT personality from Hyderabad. He is the Founder and Director of Innodatatics Pvt Ltd and 360DigiTMG. Bharani Kumar is an IIT and ISB alumni with more than 17 years of experience, he held prominent positions in the IT elites like HSBC, ITC Infotech, Infosys, and Deloitte. He is a prevalent IT consultant specializing in Industrial Revolution 4.0 implementation, Data Analytics practice setup, Artificial Intelligence, Big Data Analytics, Industrial IoT, Business Intelligence and Business Management. Bharani Kumar is also the chief trainer at 360DigiTMG with more than Ten years of experience and has been making the IT transition journey easy for his students. 360DigiTMG is at the forefront of delivering quality education, thereby bridging the gap between academia and industry.

Learn the core concepts of Data Science Course video on YouTube:

The Transformer Library contains many deep learning architectures and pre-trained models allowing us to solve text-related tasks with great ease such as language translation, question answering, text summarization and many sequences to sequence tasks in 100+ languages, these pre-trained models are capable of managing long-range dependencies effectively. Sequence to sequence task can also be defined as a task where input is in sequence order and output is another sequence order that may or may not be the same length as the input. They are based on transfer learning and are pre-trained heavily on data-rich tasks like language modeling. The models can be fine-tuned to perform specific tasks related to our specific datasets. Supported by popular deep learning libraries, PyTorch and TensorFlow, Transformers provide APIs that enable seamless integration between libraries and sharing with the community easy for further research experiments. Click here to learn Artificial Intelligence in Hyderabad

A Transformer has an encoder-decoder architecture in which an encoder turns the input text into numerical tensors and a decoder turns the tensors back into text. To complete an underlying job, it creates or extracts meaningful text data from the input representation. The 'Attention' model depicts the interdependencies between the different input and output components. Text input is divided into many tokens by tokenization as part of an encapsulation process. Each token is converted into a useful representation before being delivered to a decoder, where the representation is retrieved or produced as the output.The encoder-decoder architecture of a Transformer is where the encoder converts the input text into numeric tensors and a decoder converts the tensors into output text. It generates or extracts meaningful text data from the input representation to solve an underlying task. The ‘Attention’ models the dependencies between the various parts of the input and parts of the output. The input goes through an encapsulation pipeline involving tokenization, where the text input is split into multiple tokens. Each token is mapped into a meaningful representation and passed on to a Decoder where the representation is extracted or generated into a final output. These pre-trained models can be applied to solve a variety of tasks such as:

Feature Extraction: Where a model returns a vector representation of the text.
Summarization: An enormous length of text data is summarized in brief.
Language Translation: Text is translated into many other languages. Question
Answering: Extract answers from input context for input questions.
Sentiment Analysis: Good or bad reviews, positive or negative response.
Text Generation: An intelligence to generate text of meaning and value.

The illustration below provides a high-level view of the Transformer’s architecture and its building blocks. This article briefly introduces key elements of the architecture to understand what makes Transformer network a revolutionary breakthrough in Natural Language processing.