Text Mining Interview Questions and Answers
Table of Contents
- Text Mining is performed on which kind of data?
- Which of the following imports cannot work directly with unstructured data?
- Which of the following statements about pre-processing of unstructured data is true?
- Which of the following are popular open-source libraries for NLP?
- Which of the following techniques is used to measure document similarity in NLP?
- Which technique is used to normalize keywords in NLP?
- Which one of the following statements correctly describes Term Frequency (TF)?
- What does TF-IDF do?
- What is the output of the code shown below?
- What are the common NLP techniques?
- Which one of the following is not a pre-processing technique in NLP?
- Removing words like “and”, “is”, “a”, “an”, “the” from a sentence is called?
- Identifying locations, people, and organizations in a given sentence is called?
- In which of the following areas can NLP be useful?
- The process of deriving high quality information from text is referred to as ________.
- The various aspects of text mining are ____________.
- ________ fundamentally means converting unstructured data into structured data and then applying text mining.
- A structured and annotated text dataset that you can import directly into your program to apply text mining operations is referred to as _______.
- Bag of words refers to ________.
- With text mining we are able to perform _________ tasks.
- With text mining we are able to perform ________ tasks.
- Text mining is a(n) _________ method.
- Select the correct sequence of the text mining process from below.
- In the bag-of-words (BOW) approach, we look at the __________ of the words within the text, i.e., each word count is considered as a feature.
- The matrix (t × d), where t is the number of terms and d is the number of documents, which records the frequencies of selected important words and/or phrases occurring in each document, is called a ________.
- Machine learning algorithms cannot work with raw text directly; the text must be converted into numbers. Specifically, vectors of numbers. This is called _________.
- For a very large corpus, the length of the vector might be thousands or millions of positions, and each document may contain very few of the known words in the vocabulary. This results in a vector with many zero scores, called a ________.
- The approach of creating a vocabulary of grouped words changes the scope of the vocabulary and allows the bag-of-words model to capture a little more meaning from the document. Each word or token is then called a __________.
- Creating a vocabulary of two-word pairs is, in turn, called a _________ model.
-
Text Mining is performed on which kind of data?
- a) Labelled data.
- b) Unstructured data.
- c) Continuous data.
- d) Discrete data.
Answer - b) Unstructured data
-
Which of the following imports cannot work directly with unstructured data?
- a) nltk.
- b) requests, re, magrittr.
- c) from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer.
- d) from sklearn.naive_bayes import MultinomialNB as MB.
Answer - d) from sklearn.naive_bayes import MultinomialNB as MB
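For context, a minimal sketch (with a hypothetical toy corpus and labels) of why the answer is d): CountVectorizer and TfidfVectorizer accept raw text directly, whereas MultinomialNB only works on the numeric features they produce.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["good movie", "bad movie", "great film"]  # hypothetical unstructured text
labels = [1, 0, 1]                                # hypothetical sentiment labels
vec = CountVectorizer()
X = vec.fit_transform(docs)                       # raw text -> numeric count matrix
clf = MultinomialNB().fit(X, labels)              # the classifier needs numbers, not raw text
print(clf.predict(vec.transform(["good film"])))  # e.g. [1]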
-
Which of the following statements about pre-processing of unstructured data is true?
Statement 1: The Porter stemmer is one of the most common stemming algorithms; it is basically designed to remove and replace well-known suffixes of English words.
Statement 2: The lemmatization technique is like stemming, but the output after lemmatization is called a ‘lemma’, which is a root word rather than the root stem produced by stemming. After lemmatization, we get a valid word that means the same thing.
- a) Only statement 1 is true.
- b) Only statement 2 is true.
- c) Both statements are true.
- d) None of the above.
Answer - c) Both statements are true
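A minimal sketch contrasting the two techniques, assuming the NLTK WordNet data has been downloaded with nltk.download('wordnet'):
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
print(stemmer.stem("studies"))          # 'studi' -> a root stem, not a valid word
print(lemmatizer.lemmatize("studies"))  # 'study' -> a lemma, a valid dictionary word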
-
Which of the following are popular open-source libraries for NLP?
- a) NLTK, CoreNLP.
- b) Scikit-learn, TextBlob.
- c) spaCy, Gensim.
- d) All the above.
Answer - d) All the above
-
Which of the following techniques is used to measure document similarity in NLP?
- a) Lemmatization.
- b) Euclidean distance.
- c) Cosine Similarity.
- d) N-gram.
Answer - b) Euclidean distance, c) Cosine Similarity
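A minimal sketch (toy documents assumed) of measuring document closeness with both metrics on TF-IDF vectors:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances

docs = ["text mining finds patterns", "mining text for patterns", "the weather is sunny"]
X = TfidfVectorizer().fit_transform(docs)
print(cosine_similarity(X[0], X[1]))    # high similarity for the two related documents
print(euclidean_distances(X[0], X[2]))  # large distance for the unrelated document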
-
Which technique is used to normalize keywords in NLP?
- a) Lemmatization.
- b) Parts of speech.
- c) TF-IDF.
- d) N-Gram.
Answer - a) Lemmatization
-
Which one of the following statements correctly describes Term Frequency (TF)?
- a) The percentage of times a word appears in each document.
- b) How popular a feature is across all the reviews.
- c) Removing the effect of outlier concepts.
- d) None of the above.
Answer - a) The percentage of times a word appears in each document
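A minimal worked example of term frequency for a single toy document: each word's count divided by the total number of words.
from collections import Counter

doc = "data mining and text mining".split()
tf = {word: count / len(doc) for word, count in Counter(doc).items()}
print(tf)  # {'data': 0.2, 'mining': 0.4, 'and': 0.2, 'text': 0.2}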
-
What does TF-IDF do?
- a) Identifies the most important words in the document.
- b) Removes the effect of outlier concepts.
- c) Measures how well a probability distribution fits the data.
- d) Identifies the most frequently occurring word in the document.
Answer - a) Identifies the most important words in the document, b) Removes the effect of outlier concepts
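A minimal sketch (toy corpus assumed) of how TF-IDF up-weights words that are frequent in one document but rare across the corpus, while down-weighting words such as "the" that appear everywhere:
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat", "the dog sat", "the cat ran fast"]
vec = TfidfVectorizer()
X = vec.fit_transform(docs)
print(vec.get_feature_names_out())  # the learned vocabulary
print(X.toarray().round(2))         # 'the' receives a low weight; rarer words receive higher weights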
-
What is the output of the code shown below?
import nltk
from nltk.stem import PorterStemmer
word_stemmer = PorterStemmer()
print([word_stemmer.stem(w) for w in ['easily', 'runner', 'running']])
- a) easili, runner, run.
- b) easil, run, runn.
- c) easy, runne, running.
- d) None of the above.
Answer - a) easili, runner, run
-
What are the common NLP techniques?
- a) Named Entity Recognition.
- b) Sentiment Analysis.
- c) Text Modeling.
- d) All the above.
Answer - d) All the above
-
Which one of the following is not a pre-processing technique in NLP?
- a) Stemming and Lemmatization.
- b) Converting to lowercase.
- c) Removing punctuations.
- d) Text summarization.
Answer - d) Text summarization
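A minimal sketch of the pre-processing steps listed above (lowercasing, punctuation removal, stemming); text summarization, by contrast, is a downstream NLP task rather than a pre-processing step.
import string
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
text = "Text Mining, Running Easily!"
text = text.lower()                                               # converting to lowercase
text = text.translate(str.maketrans("", "", string.punctuation))  # removing punctuation
print([stemmer.stem(w) for w in text.split()])                    # stemming -> ['text', 'mine', 'run', 'easili']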
-
Removing words like “and”, “is”, “a”, “an”, “the” from a sentence is called?
- a) Stemming.
- b) Lemmatization.
- c) Stop word removal.
- d) Tokenization.
Answer - c) Stop word removal
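A minimal stop-word removal sketch with NLTK, assuming the stopword list has been downloaded with nltk.download('stopwords'):
from nltk.corpus import stopwords

stop_words = set(stopwords.words("english"))
tokens = "this is a sentence and an example".split()
print([w for w in tokens if w not in stop_words])  # ['sentence', 'example']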
-
Identifying locations, people, and organizations in a given sentence is called?
- a) Stemming.
- b) Lemmatization.
- c) Named entity recognition.
- d) Topic modeling.
Answer - c) Named entity recognition
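A minimal named entity recognition sketch with spaCy, assuming the small English model has been installed via python -m spacy download en_core_web_sm:
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Sundar Pichai works at Google in California.")
print([(ent.text, ent.label_) for ent in doc.ents])
# e.g. [('Sundar Pichai', 'PERSON'), ('Google', 'ORG'), ('California', 'GPE')]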
-
In which of the following areas can NLP be useful?
- a) Automatic text summarization.
- b) Automatic question answering systems.
- c) Information retrieval.
- d) All of the above.
Answer - d) All of the above
-
The process of deriving high quality information from text is referred to as ________.
- a) Image Mining.
- b) Database Mining.
- c) Multimedia Mining.
- d) Text Mining.
Answer - d) Text Mining
-
The various aspects of text mining are ____________.
I. The text and documents are gathered into a corpus and organized.
II. The corpus is analyzed for structure. The result is a matrix mapping important terms to source documents.
III. The structured data is then analyzed for word structures, sequences, and frequency.
- a) (I), (II) only.
- b) (II), (III) only.
- c) (I), (II) and (III).
- d) None of the above.
Answer - c) (I), (II) and (III)
-
________ fundamentally means converting unstructured data into structured data and then applying text mining.
- a) Schema design.
- b) Matrix design.
- c) Table design.
- d) None of the above.
Answer - a) Schema design
-
A structured and annotated text dataset that you can import directly into your program to apply text mining operations is referred to as _______.
- a) Document.
- b) Corpus.
- c) Files.
- d) None of the above.
Answer - b) Corpus
-
Bag of words refers to ________.
- a) The representation of text that describes the occurrence of words within a document.
- b) A set of unstructured data.
- c) The representation of text that describes the meaning of every word within a document.
- d) None of the above.
Answer - a) The representation of text that describes the occurrence of words within a document
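A minimal bag-of-words sketch (toy corpus assumed): the representation records only which words occur and how often, not their order or meaning.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat", "the dog sat"]
vec = CountVectorizer()
X = vec.fit_transform(docs)
print(vec.get_feature_names_out())  # the vocabulary
print(X.toarray())                  # per-document word-occurrence counts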
-
With text mining we are able to perform _________ tasks.
- a) Text categorization.
- b) Text clustering.
- c) Concept/entity extraction.
- d) All of the above.
Answer - d) All of the above
-
With text mining we are able to perform ________ tasks.
- a) Entity relation modeling (i.e., learning relations between named entities).
- b) Sentiment analysis.
- c) Document summarization.
- d) All of the above.
Answer - d) All of the above
-
Text mining is a(n) _________ method.
- a) Supervised learning.
- b) Unsupervised Learning.
- c) Automated learning.
- d) None of the above.
Answer - b) Unsupervised Learning
-
Select the correct sequence of the text mining process from below.
I. Establish the corpus of text: Gather documents, clean and prepare for analysis.
II. Structure with TDM matrix: Select bag of words, compute frequencies of occurrences.
III. Mine TDM for patterns: apply data mining tools such as classification, clustering etc.
- a) (I), (II), (III).
- b) (II), (I), (III).
- c) (III), (I), (II).
- d) (I), (III), (II).
Answer - a) (I), (II), (III)
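A minimal end-to-end sketch of the three steps with a hypothetical four-document corpus: establish the corpus, structure it as a term-document matrix, then mine the matrix with a clustering algorithm.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

corpus = ["cats and dogs", "dogs chase cats",
          "stocks and bonds", "bond markets fell"]                       # (I) establish the corpus
X = TfidfVectorizer().fit_transform(corpus)                              # (II) structure as a term-document matrix
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)  # (III) mine for patterns
print(labels)  # e.g. the two pet documents in one cluster, the two finance documents in the other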
-
In the bag-of-words (BOW) approach, we look at the __________ of the words within the text, i.e., each word count is considered as a feature.
- a) Dendrogram.
- b) Scatterplot.
- c) Histogram.
- d) None of the above.
Answer - c) Histogram
-
The matrix (t × d), where t is the number of terms and d is the number of documents, which records the frequencies of selected important words and/or phrases occurring in each document, is called a ________.
- a) True word matrix.
- b) Term by document matrix.
- c) Total term matrix.
- d) None of the above.
Answer - b) Term by document matrix
-
Machine learning algorithms cannot work with raw text directly; the text must be converted into numbers. Specifically, vectors of numbers. This is called _________.
- a) Feature creation.
- b) Feature coding.
- c) Feature extraction or feature encoding.
- d) None of the above.
Answer - c) Feature extraction or feature encoding
-
For a very large corpus, the length of the vector might be thousands or millions of positions, and each document may contain very few of the known words in the vocabulary. This results in a vector with many zero scores, called a ________.
- a) Null vector.
- b) Zero vector.
- c) Sparse Vector.
- d) None of the above.
Answer - c) Sparse Vector
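A minimal sketch of sparsity (toy corpus assumed): each document uses only a handful of the vocabulary words, so its vector is mostly zeros and is stored as a sparse matrix.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["data mining", "text mining", "machine learning models", "deep neural networks"]
X = CountVectorizer().fit_transform(docs)  # stored as a SciPy sparse matrix
print(X.shape)                             # (number of documents, vocabulary size)
print(X.nnz, "non-zero entries out of", X.shape[0] * X.shape[1])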
-
The approach of creating a vocabulary of grouped words changes the scope of the vocabulary and allows the bag-of-words model to capture a little more meaning from the document. Each word or token is then called a __________.
- a) Gram.
- b) Label.
- c) Vector.
- d) None of the above.
Answer - a) Gram
-
Creating a vocabulary of two-word pairs is, in turn, called a _________ model.
- a) Trigram.
- b) Unigram.
- c) Bigram.
- d) None of the above.
Answer - c) Bigram
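A minimal bigram sketch: setting ngram_range=(2, 2) in CountVectorizer builds the vocabulary from two-word pairs instead of single words.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["text mining is fun", "mining text data"]
vec = CountVectorizer(ngram_range=(2, 2))  # bigram model: every token is a two-word pair
vec.fit(docs)
print(vec.get_feature_names_out())
# ['is fun', 'mining is', 'mining text', 'text data', 'text mining']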