
NLP Tool Kit

July 06, 2023

Meet the Author: Mr. Bharani Kumar

Bharani Kumar Depuru is a well-known IT personality from Hyderabad. He is the Founder and Director of Innodatatics Pvt Ltd and 360DigiTMG. An IIT and ISB alumnus with more than 17 years of experience, he has held prominent positions at leading firms such as HSBC, ITC Infotech, Infosys, and Deloitte. He is a prominent IT consultant specializing in Industrial Revolution 4.0 implementation, Data Analytics practice setup, Artificial Intelligence, Big Data Analytics, Industrial IoT, Business Intelligence, and Business Management. Bharani Kumar is also the chief trainer at 360DigiTMG, with more than ten years of experience, and has been making the IT transition journey easy for his students. 360DigiTMG is at the forefront of delivering quality education, thereby bridging the gap between academia and industry.

Applications of NLP

Information retrieval & Web Search
Correction of grammatical errors
Question answering
Text summarization
Machine Translation
Sentiment Analysis


How to install NLTK?

To install NLTK and use it in a Python program, follow these steps:

Install the library with the command pip install nltk
Import it in Python with import nltk
Download the data packages NLTK needs with the nltk.download() method
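The steps above can be sketched as follows. The resource names are assumptions chosen to match the topics covered in this article (tokenizers, WordNet, stop words); they are standard NLTK data identifiers, but the original does not list them. Newer NLTK releases look for punkt_tab for tokenization while older ones use punkt, so both are requested here.

```python
# Shell step, run once beforehand: pip install nltk
import nltk

# Assumed resource names (not listed in the original article).
RESOURCES = ["punkt", "punkt_tab", "wordnet", "stopwords"]

# download() reports True when a resource is installed or already up to date,
# and False when it cannot be fetched (for example, an unknown name or no network).
status = {name: nltk.download(name, quiet=True) for name in RESOURCES}
print(status)
```

Calling nltk.download() with no arguments opens an interactive downloader instead, which can be handy when exploring what is available.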

Text Processing using NLTK

The first step in processing text with NLTK is tokenization. Tokenization is the process of breaking text into smaller parts: paragraphs into sentences, and sentences into words. NLTK provides two types of tokenizers.

Sentence Tokenizer
Word Tokenizer

Sentence Tokenizer

>>> sampletext = "Artificial Intelligence is sometimes called Machine Intelligence. It is intelligence demonstrated by machines"
>>> from nltk.tokenize import sent_tokenize
>>> sent_tokenize(sampletext)

Output: ['Artificial Intelligence is sometimes called Machine Intelligence.', 'It is intelligence demonstrated by machines']

Word Tokenizer

>>> sampletext = "Artificial Intelligence is sometimes called Machine Intelligence"
>>> from nltk.tokenize import word_tokenize
>>> word_tokenize(sampletext)

Output: ['Artificial', 'Intelligence', 'is', 'sometimes', 'called', 'Machine', 'Intelligence']


Stemming and Lemmatization using NLTK

What is Stemming?

Stemming is the process of normalizing words by reducing them to a common root. A root word usually has several variants; the root word play, for instance, has variants such as plays, playing, and played. Stemming lets us recover the root from any of these variants.


NLTK includes the Porter stemming algorithm as the PorterStemmer class. Its stem() method reduces each word in a collection of tokenized words to its root.

Example:
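A minimal sketch of stemming with PorterStemmer. The input words are an assumption (four variants of "call"), chosen because each of them stems to "call".

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Assumed input words; the article does not list them.
words = ["call", "calls", "calling", "called"]

# stem() strips common suffixes such as -s, -ing, and -ed.
stems = [stemmer.stem(word) for word in words]

for stem in stems:
    print(stem)
```

PorterStemmer works purely from suffix rules, so it needs no downloaded data.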


Output:
call
call
call
call

What is Lemmatization?

Lemmatization is the computational process of determining a word's lemma (its dictionary form) based on its intended meaning. Stemming, by contrast, simply strips characters from the beginning or end of a word, usually a suffix. Lemmatization is regarded as the more intelligent of the two because it finds the correct base form by consulting a lexicon. It therefore tends to produce better features for machine learning.


Example to distinguish between Lemmatization and Stemming

Stemming Code
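A minimal sketch of the stemming half of the comparison, assuming PorterStemmer and the two words tries and cries. Porter stems are produced by suffix rules and need not be dictionary words, so the "ies" ending becomes "i" here.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Words taken from the comparison in this section.
words = ["tries", "cries"]

# Map each word to its Porter stem.
stems = {word: stemmer.stem(word) for word in words}

for word, stem in stems.items():
    print(f"Stemming for {word} is {stem}")
```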


Output:

Stemming for tries is tri
Stemming for cries is cri

Lemmatization code


Output:

Lemma for tries is try
Lemma for cries is cry


Find Synonyms From NLTK WordNet

WordNet is a lexical database of English, available in NLTK as a corpus, that groups words into sets of synonyms and also records antonyms and brief definitions.

Example:


Antonyms from NLTK WordNet

Stop Words Removal

Stop words are very common words, such as "is", "the", and "a", that carry little meaning on their own. Removing them is a standard pre-processing step that reduces noise in text data before further processing.