NLP Tool Kit
Vivi: Hey, Navi.
Navi: Yes, Vivi?
Vivi: When did your son start speaking in complete sentences?
Navi: When he was about 2 years old.
Vivi: Oh, is that so? My daughter started speaking in full sentences when she was about 1 year old.
Navi: Interesting! How did that happen, Vivi?
Vivi: My family and I used to talk to our daughter a lot, so she picked up the language faster.
The exchange above suggests that a child picks up new words, word groups, and phrases more quickly when they are used often in speech. Similarly, when we train a natural language processing (NLP) system, it learns a language more quickly and accurately the more words, word groups, and sentences we expose it to.
For text analysis and natural language processing, Python has a library called NLTK (Natural Language Toolkit).
What is Natural Language Processing?
Natural language processing (NLP) is a set of methods for training a computer to comprehend written or spoken language. Humans communicate in a shared language so that they can understand one another's viewpoints and respond appropriately. In an NLP system, a machine rather than a human does the interacting, comprehending, and responding.
Applications of NLP
- Information retrieval & Web Search
- Correction of grammatical errors
- Question answering
- Text summarization
- Machine Translation
- Sentiment Analysis
How to install NLTK?
To install NLTK and use it in Python programs, follow the steps below:
- Install the library with the command pip install nltk
- Import it in your program with import nltk
- Download data packages (corpora and models) with the nltk.download() function
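The steps above can be sketched as a short script. This is a minimal sketch, not a required setup: the helper name `download_resources` and the resource list are assumptions chosen to cover the examples in this article, and which tokenizer resource is needed (`punkt` or `punkt_tab`) depends on the installed NLTK version.

```python
# Run once in a shell first:  pip install nltk
import nltk


def download_resources(names):
    """Try to download each named NLTK resource; return {name: succeeded}."""
    status = {}
    for name in names:
        # quiet=True suppresses the progress log; download() returns True or False.
        status[name] = bool(nltk.download(name, quiet=True))
    return status


# Assumed resource list: tokenizer models, WordNet data, and the stop word lists.
resources = ["punkt", "punkt_tab", "wordnet", "omw-1.4", "stopwords"]
status = download_resources(resources)
print(status)
```

A resource that is unavailable for the installed version, or a missing network connection, shows up as False in the printed dictionary instead of stopping the script.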
Text Processing using NLTK
The first step in processing text with NLTK is tokenization. Tokenizing is the process of breaking text into smaller parts, i.e. paragraphs into sentences and sentences into words. There are two types of tokenizers.
- Sentence Tokenizer
- Word Tokenizer
Sentence Tokenizer
>>> sampletext = "Artificial Intelligence is sometimes called Machine Intelligence. It is intelligence demonstrated by machines"
>>> from nltk.tokenize import sent_tokenize
>>> sent_tokenize(sampletext)
Output: ['Artificial Intelligence is sometimes called Machine Intelligence.', 'It is intelligence demonstrated by machines']
Word Tokenizer
>>> sampletext = "Artificial Intelligence is sometimes called Machine Intelligence"
>>> from nltk.tokenize import word_tokenize
>>> word_tokenize(sampletext)
Output: ['Artificial', 'Intelligence', 'is', 'sometimes', 'called', 'Machine', 'Intelligence']
Stemming and Lemmatization using NLTK
What is Stemming?
Stemming is the process of normalizing words by reducing them to a common root form. A single root word usually appears in text in several inflected forms. The root word play, for instance, has variants such as plays, playing, and played. Stemming lets us map any of these variants back to the root.
NLTK includes an implementation of the Porter stemming algorithm in its PorterStemmer class. Its stem() method returns the stem of a single word, so it is applied to each word in a list of tokens.
Example:
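The snippet below is a minimal sketch of stemming with `PorterStemmer`. The input list (`call`, `calls`, `called`, `calling`) is an assumption, chosen because all four forms reduce to the same stem.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Assumed input: four inflected forms of the same root word.
words = ["call", "calls", "called", "calling"]
stems = [stemmer.stem(word) for word in words]

# Print one stem per line.
for stem in stems:
    print(stem)
```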
Output:
call
call
call
call
What is Lemmatization?
Lemmatization is the computational process of determining a word's lemma (its dictionary form) based on its meaning. Stemming, by contrast, simply strips characters from the beginning or end of a word without checking whether the result is a real word. Lemmatization is regarded as the more intelligent of the two because the correct form is determined by consulting a lexicon. It therefore tends to produce better features for machine learning.
Example to distinguish between Lemmatization and Stemming
Stemming Code
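A minimal sketch of the stemming side of the comparison, assuming `PorterStemmer` is the stemmer in use and that the input words are tries and cries:

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

words = ["tries", "cries"]
# Map each word to its Porter stem.
stems = {word: stemmer.stem(word) for word in words}

for word, stem in stems.items():
    print(f"Stemming for {word} is {stem}")
```

With NLTK's Porter stemmer these words come out as tri and cri: a stem is a truncated form and need not be a dictionary word.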
Output:
Stemming for tries is tri
Stemming for cries is cri
Lemmatization code
Output:
Lemma for tries is try
Lemma for cries is cry
Find Synonyms From NLTK WordNet
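WordNet is a lexical database of English that groups words into sets of synonyms (synsets) and records short definitions and relations such as antonyms. NLTK provides access to it through nltk.corpus.wordnet.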
Example:
Antonyms from NLTK WordNet
Stop Words Removal
Stop words are very common words, such as articles and prepositions, that carry little meaning on their own. Removing them before further processing reduces noise in the data, which makes stop word removal a standard pre-processing step in text processing.
Example:
Output: ['Find', 'frequency', 'word', 'text', 'file', '!']
Stop words such as 'of', 'from', and 'a' have been removed from the text data.