
Text Mining Interview Questions and Answers

  • September 04, 2022

Meet the Author : Mr. Bharani Kumar

Bharani Kumar Depuru is a well-known IT personality from Hyderabad. He is the Founder and Director of Innodatatics Pvt Ltd and 360DigiTMG. An alumnus of IIT and ISB with more than 17 years of experience, he has held prominent positions at IT majors such as HSBC, ITC Infotech, Infosys, and Deloitte. He is a sought-after IT consultant specializing in Industrial Revolution 4.0 implementation, Data Analytics practice setup, Artificial Intelligence, Big Data Analytics, Industrial IoT, Business Intelligence and Business Management. Bharani Kumar is also the chief trainer at 360DigiTMG, with more than ten years of training experience, and has been making the IT transition journey easy for his students. 360DigiTMG is at the forefront of delivering quality education, thereby bridging the gap between academia and industry.



  • Text Mining is performed on which kind of data?

    • a) Labelled data.
    • b) Unstructured data.
    • c) Continuous data.
    • d) Discrete data.

    Answer - b) Unstructured data

  • Which of the following imports cannot work with unstructured text directly?

    • a) nltk.
    • b) requests, re, magrittr.
    • c) from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer.
    • d) from sklearn.naive_bayes import MultinomialNB as MB.

    Answer - d) from sklearn.naive_bayes import MultinomialNB as MB

  • Which of the following is a true statement about pre-processing of unstructured data?

    Statement 1: Stemming is one of the most common pre-processing techniques; stemming algorithms are basically designed to remove and replace well-known suffixes of English words.
    Statement 2: The lemmatization technique is similar to stemming. The output we get after lemmatization is called a ‘lemma’, which is a root word rather than a root stem, the output of stemming. After lemmatization, we get a valid word that means the same thing.
    • a) Statement 1 is true and statement 2 is false.
    • b) Statement 2 is true and statement 1 is false.
    • c) Both statements are true.
    • d) None of the above.

    Answer - c) Both statements are true (a comparison sketch follows below)
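
    As a rough illustration of the difference, here is a minimal sketch (assuming NLTK and its WordNet data are installed; the sample words are illustrative) that prints the stem and the lemma of the same words:

    import nltk
    from nltk.stem import PorterStemmer, WordNetLemmatizer

    # WordNet data is needed by the lemmatizer (one-time download).
    nltk.download("wordnet", quiet=True)

    stemmer = PorterStemmer()
    lemmatizer = WordNetLemmatizer()

    for word in ["studies", "easily", "running"]:
        stem = stemmer.stem(word)                    # root stem, may not be a valid word
        lemma = lemmatizer.lemmatize(word, pos="v")  # lemma, a valid root word
        print(word, "->", "stem:", stem, "| lemma:", lemma)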

  • Which technique is used to discover document similarity in NLP?

    • a) Lemmatization.
    • b) Euclidean distance.
    • c) Cosine similarity.
    • d) N-gram.

    Answer - b) Euclidean distance, c) Cosine similarity (see the sketch below)
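
    A minimal sketch of both measures, assuming scikit-learn is installed; the two documents and the use of CountVectorizer are illustrative choices:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances

    docs = ["data science is fun", "text mining is part of data science"]

    # Turn the documents into word-count vectors.
    X = CountVectorizer().fit_transform(docs)

    print(cosine_similarity(X[0], X[1]))    # closer to 1 means more similar
    print(euclidean_distances(X[0], X[1]))  # smaller means more similar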

  • To normalize keywords in NLP, which technique do we follow?

    • a) Lemmatization.
    • b) Parts of speech.
    • c) TF-IDF.
    • d) N-Gram.

    Answer - a) Lemmatization

  • Which one of the following is the correct statement for Term Frequency (TF)?

    • a) The % of times a word occurs in each document is called ___.
    • b) How popular a feature is across all the reviews.
    • c) Removing the effect of outlier concepts is called ____.
    • d) None of the above.

    Answer - a) The % of times a word occurs in each document is called ___

  • What does TF-IDF do?

    • a) Identifies the most important words in the document.
    • b) Removes the effect of outlier (overly common) terms.
    • c) Measures how well a probability distribution predicts a sample.
    • d) Finds the most frequently occurring word in the document.

    Answer - a) Identifies the most important words in the document, b) Removes the effect of outlier (overly common) terms (a TF-IDF sketch follows below)
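
    A minimal TF-IDF sketch, assuming scikit-learn is installed; the toy documents are illustrative. Words that appear in every document are down-weighted, while words distinctive to one document score higher:

    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = [
        "the cat sat on the mat",
        "the dog sat on the log",
        "cats and dogs are pets",
    ]

    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(docs)          # sparse (documents x terms) matrix of TF-IDF scores

    print(vectorizer.get_feature_names_out())   # vocabulary learned from the corpus
    print(X.toarray().round(2))                 # common words like "the" get lower scores than rare ones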

  • What is the output of the code shown below?

    import nltk
    from nltk.stem import PorterStemmer
    word_stemmer = PorterStemmer()
    # stem() works on one word at a time, so stem each word in turn
    print([word_stemmer.stem(w) for w in ['easily', 'runner', 'running']])
    • a) easili, runner, run.
    • b) easil, run, runn.
    • c) easy, runne, running.
    • d) None of the above.

    Answer - a) easili, runner, run

  • What are the common NLP techniques?

    • a) Named Entity Recognition.
    • b) Sentiment Analysis.
    • c) Text Modeling.
    • d) All the above.

    Answer - d) All the above

  • Which one of the following is not a pre-processing technique in NLP?

    • a) Stemming and Lemmatization.
    • b) Converting to lowercase.
    • c) Removing punctuation.
    • d) Text summarization.

    Answer - d) Text summarization

  • Removing words like “and”, “is”, “a”, “an”, “the” from a sentence is called?

    • a) Stemming.
    • b) Lemmatization.
    • c) Stop word removal.
    • d) Tokenization.

    Answer - c) Stop word removal (see the sketch below)
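
    A minimal stop word removal sketch, assuming NLTK and its stopwords corpus are installed; the sentence is illustrative:

    import nltk
    from nltk.corpus import stopwords

    # One-time download of the English stop word list.
    nltk.download("stopwords", quiet=True)

    sentence = "This is an example of a sentence with stop words"
    stop_words = set(stopwords.words("english"))

    filtered = [w for w in sentence.lower().split() if w not in stop_words]
    print(filtered)   # ['example', 'sentence', 'stop', 'words']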

  • Identifying locations, people, and organizations in a given sentence is called?

    • a) Stemming.
    • b) Lemmatization.
    • c) Named entity recognition.
    • d) Topic modeling.

    Answer - c) Named entity recognition (a sketch follows below)
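
    A minimal named entity recognition sketch; spaCy is one possible tool choice (assuming spaCy and its small English model en_core_web_sm are installed), and the sentence is illustrative:

    import spacy

    # Install with: pip install spacy && python -m spacy download en_core_web_sm
    nlp = spacy.load("en_core_web_sm")

    doc = nlp("Sundar Pichai works at Google in California")
    for ent in doc.ents:
        print(ent.text, ent.label_)   # labels such as PERSON, ORG, GPE (location)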

  • Which of the following areas is one where NLP can be useful?

    • a) Automatic text summarization.
    • b) Automatic question answering systems.
    • c) Information retrieval.
    • d) All of the above.

    Answer - d) All of the above

  • The process of deriving high quality information from text is referred to as ________.

    • a) Image Mining.
    • b) Database Mining.
    • c) Multimedia Mining.
    • d) Text Mining.

    Answer - d) Text Mining

  • The various aspects of text mining are ____________.

    I. The text and documents are gathered into a corpus and organized.
    II. The corpus is analyzed for structure. The result is a matrix mapping important terms to source documents.
    III. The structured data is then analyzed for word structures, sequences and frequency.
    • a) (I), (II) only.
    • b) (II),(III) only.
    • c) (I), (II) and (III).
    • d) None of the above.

    Answer - c) (I), (II) and (III)

  • ________ is fundamentally the mapping of unstructured data to structured data so that text mining can be applied.

    • a) Schema design.
    • b) Matrix design.
    • c) Table design.
    • d) None of the above.

    Answer - a) Schema design

  • A structured and annotated text dataset that you can simply import into your program to apply text mining operations is referred to as a _______.

    • a) Document.
    • b) Corpus.
    • c) Files.
    • d) None of the above.

    Answer - b) Corpus

  • A bag of words is referred to as ________ .

    • a) The representation of text that describes the occurrence of words within a document.
    • b) A set of unstructured data.
    • c) The representation of text that describes the meaning of every word within a document.
    • d) None of the above.

    Answer - a) The representation of text that describes the occurrence of words within a document (a sketch follows below)
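
    A minimal bag-of-words sketch, assuming scikit-learn is installed; the documents are illustrative. Each document becomes a row of word counts over the learned vocabulary, ignoring word order and meaning:

    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["the cat sat on the mat", "the dog chased the cat"]

    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(docs)          # (documents x terms) count matrix

    print(vectorizer.get_feature_names_out())   # vocabulary learned from the corpus
    print(X.toarray())                          # occurrence counts per document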

  • With text mining we are able to perform _________ tasks.

    • a) Text categorization.
    • b) Text clustering.
    • c) Concept/entity extraction.
    • d) All of the above.

    Answer - d) All of the above

  • With text mining we are able to perform ________ tasks.

    • a) Entity relation modeling (i.e., learning relations between named entities).
    • b) Sentiment analysis.
    • c) Document summarization.
    • d) All of the above.

    Answer - d) All of the above

  • Text mining is a _________ method.

    • a) Supervised learning.
    • b) Unsupervised Learning.
    • c) Automated learning.
    • d) None of the above.

    Answer - b) Unsupervised Learning

  • Select the correct sequence of the text mining process from below:

    I. Establish the corpus of text: Gather documents, clean and prepare for analysis.
    II. Structure with TDM matrix: Select bag of words, compute frequencies of occurrences.
    III. Mine TDM for patterns: apply data mining tools such as classification, clustering etc.
    • a) (I), (II), (III).
    • b) (II), (I), (III).
    • c) (III), (I), (II).
    • d) (I), (III), (II).

    Answer - a) (I), (II), (III) (an end-to-end sketch follows below)
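
    A minimal end-to-end sketch of that sequence, assuming scikit-learn is installed; the corpus, the choice of CountVectorizer for the term-document matrix, and KMeans as the mining step are all illustrative assumptions:

    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import CountVectorizer

    # I. Establish the corpus (already cleaned, for brevity).
    corpus = [
        "data science and machine learning",
        "machine learning on big data",
        "football and cricket are popular sports",
        "sports news about football",
    ]

    # II. Structure the corpus as a term-document matrix (bag of words).
    vectorizer = CountVectorizer(stop_words="english")
    tdm = vectorizer.fit_transform(corpus)

    # III. Mine the matrix for patterns, here with simple clustering.
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(tdm)
    print(labels)   # documents about similar topics should share a cluster label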

  • In the bag-of-words (BOW) approach, we look at the __________ of the words within the text, i.e. considering each word count as a feature.

    • a) Dendrogram.
    • b) Scatterplot.
    • c) Histogram.
    • d) None of the above.

    Answer - c) Histogram

  • The matrix (t x d), where t is the number of terms and d is the number of documents, which measures the frequencies of selected important words and/or phrases occurring in each document, is called ________ .

    • a) True word matrix.
    • b) Term by document matrix.
    • c) Total term matrix.
    • d) None of the above.

    Answer - b) Term by document matrix

  • Machine learning algorithms cannot work with raw text directly; the text must be converted into numbers. Specifically, vectors of numbers. This is called _________.

    • a) Feature creation.
    • b) Feature coding.
    • c) Feature extraction or feature encoding.
    • d) None of the above.

    Answer - c) Feature extraction or feature encoding

  • For a very large corpus, the length of the vector might be thousands or millions of positions, and each document may contain very few of the known words in the vocabulary. This results in a vector with lots of zero scores, called a ________.

    • a) Null vector.
    • b) Zero vector.
    • c) Sparse vector.
    • d) None of the above.

    Answer - c) Sparse vector (see the sketch below)
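
    A minimal sketch of why such vectors are sparse, assuming scikit-learn is installed; the documents are illustrative. Each row holds mostly zeros because a document uses only a few of the vocabulary terms:

    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["cats like milk", "dogs like bones", "parrots can talk"]

    X = CountVectorizer().fit_transform(docs)   # stored as a sparse matrix by default

    total_cells = X.shape[0] * X.shape[1]
    print(X.shape)                              # (number of documents, vocabulary size)
    print(1 - X.nnz / total_cells)              # fraction of cells that are zero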

  • One approach is to create a vocabulary of grouped words; this changes the scope of the vocabulary and allows the bag-of-words model to capture a little more meaning from the document. In this approach, each word or token is called a __________ .

    • a) Gram.
    • b) Label.
    • c) Vector.
    • d) None of the above.

    Answer - a) Gram

  • Creating a vocabulary of two-word pairs is, in turn, called a _________ model.

    • a) Trigram.
    • b) Unigram.
    • c) Bigram.
    • d) None of the above.

    Answer - c) Bigram (see the sketch below)
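
    A minimal bigram vocabulary sketch, assuming scikit-learn is installed; ngram_range=(2, 2) asks CountVectorizer to use two-word pairs instead of single words, and the documents are illustrative:

    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["text mining is fun", "mining text data is useful"]

    # ngram_range=(2, 2) builds a vocabulary of two-word pairs (bigrams).
    bigram_vectorizer = CountVectorizer(ngram_range=(2, 2))
    X = bigram_vectorizer.fit_transform(docs)

    print(bigram_vectorizer.get_feature_names_out())   # e.g. 'text mining', 'mining is', 'is fun', ...
    print(X.toarray())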
