
Topic Modeling Interview Questions & Answers

  • September 05, 2022

Meet the Author : Mr. Bharani Kumar

Bharani Kumar Depuru is a well-known IT personality from Hyderabad. He is the Founder and Director of Innodatatics Pvt Ltd and 360DigiTMG. An IIT and ISB alumnus with more than 17 years of experience, he has held prominent positions at leading firms such as HSBC, ITC Infotech, Infosys, and Deloitte. He is an IT consultant specializing in Industrial Revolution 4.0 implementation, Data Analytics practice setup, Artificial Intelligence, Big Data Analytics, Industrial IoT, Business Intelligence, and Business Management. Bharani Kumar is also the chief trainer at 360DigiTMG, with more than ten years of experience, and has been making the IT transition journey easy for his students. 360DigiTMG is at the forefront of delivering quality education, thereby bridging the gap between academia and industry.



  • What is perplexity in topic modeling?

    • a) Predict the quality of topics in a better way
    • b) Finding the best word distribution for each topic
    • c) Most of the words and result in a more specific word distribution per topic
    • d) A measure of how well a probability distribution predicts a sample

    Answer - d) A measure of how well a probability distribution predicts a sample
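
To make the idea concrete, here is a minimal sketch of how perplexity can be computed from the probabilities a model assigns to held-out tokens. The function name and the two probability lists are made up for illustration; real topic modeling libraries compute this from the fitted model rather than from a hand-written list.

```python
import math

def perplexity(token_probs):
    """Perplexity of a sequence, given the model's probability for each observed token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-likelihood per token (natural log).
    avg_neg_log_lik = -sum(math.log(p) for p in token_probs) / len(token_probs)
    # Perplexity is the exponential of that average.
    return math.exp(avg_neg_log_lik)

# Hypothetical per-token probabilities from two models on the same held-out text.
confident_model = [0.5, 0.5, 0.5, 0.5]
uncertain_model = [0.1, 0.1, 0.1, 0.1]

print(perplexity(confident_model))  # roughly 2
print(perplexity(uncertain_model))  # roughly 10
```

Lower perplexity means the model was less "surprised" by the held-out text, which is why it is read as a measure of how well the probability distribution predicts a sample.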

  • Which of the following is true of topic modeling (LDA)?
    Statement 1: It is used to spot the semantic relationship between words in a group with the help of associated indicators.
    Statement 2: To understand the meaning of a given text or document, it is important to identify who did what to whom.

    • a) Statement 1 is true and statement 2 is false
    • b) Statement 1 is false and statement 2 is true
    • c) Both statements (1 and 2) are false
    • d) Both statements (1 and 2) are true

    Answer - a) Statement 1 is true and statement 2 is false

  • In topic modeling, which hyperparameter controls document-topic density?

    • a) Dirichlet hyperparameter Beta
    • b) Dirichlet hyperparameter alpha
    • c) Number of Topics (K)
    • d) None of them

    Answer - b) Dirichlet hyperparameter alpha
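
The effect of alpha can be seen by simulation. The sketch below assumes a symmetric Dirichlet prior over five topics and draws topic mixtures for many simulated documents; the helper names, the number of topics, and the two alpha values are illustrative choices, not settings from any particular library.

```python
import random

def sample_dirichlet(alpha, k, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) over k topics."""
    # Standard construction: normalize k independent Gamma(alpha, 1) draws.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def mean_largest_share(alpha, k=5, n_docs=2000, seed=0):
    """Average weight of the dominant topic across simulated documents."""
    rng = random.Random(seed)
    shares = (max(sample_dirichlet(alpha, k, rng)) for _ in range(n_docs))
    return sum(shares) / n_docs

low = mean_largest_share(alpha=0.1)    # sparse mixtures: one topic dominates
high = mean_largest_share(alpha=10.0)  # dense mixtures: weight is spread out
print(low, high)
```

With a low alpha the dominant topic takes most of each document's weight, so documents look like they contain few topics. With a high alpha the weight is spread more evenly across topics.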

  • Which one of the following statements about the evaluation of topic models is wrong?

    • a) Predict the quality of topics in a better way
    • b) Quantifies the semantic similarity of the high-scoring words within each topic
    • c) Most of the words and result in a more specific word distribution per topic
    • d) Measurements of how well probability distribution

    Answer - d) Measurements of how well probability distribution

  • The process of obtaining the root word from the given word is known as _______?

    • a) Stemming
    • b) Lemmatization
    • c) Stop words
    • d) Tokenization

    Answer - a) Stemming
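
The difference between stemming (option a) and lemmatization (option b) is easier to see side by side. The sketch below uses a deliberately tiny suffix stripper and a hand-written lemma table. Both are toy stand-ins invented for this example: the stemmer is not the Porter algorithm, and the lemma table is not a real dictionary.

```python
# Toy suffix stripper for illustration only; this is NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical, deliberately tiny rule set

def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters of stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Tiny hand-written lemma table standing in for a real dictionary (assumption).
LEMMAS = {"studies": "study", "went": "go", "better": "good"}

def toy_lemmatize(word):
    """Return the dictionary form if known, otherwise the lowercased word."""
    word = word.lower()
    return LEMMAS.get(word, word)

for w in ["jumping", "jumped", "studies", "went"]:
    print(w, toy_stem(w), toy_lemmatize(w))
```

The stemmer chops "studies" down to "studi", which is not a real word, while the lemma lookup returns "study". That is the usual trade-off: stemming is rule-based and crude, while lemmatization returns a valid dictionary form.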

  • Which Python package do we commonly use to perform topic modeling (LDA)?

    • a) Sklearn
    • b) LDAviz
    • c) Nltk
    • d) Gensim

    Answer - d) Gensim

  • Identifying locations, people, and organizations in a given sentence is called?

    • a) Stemming
    • b) Lemmatization
    • c) Named entity recognition
    • d) Topic modeling

    Answer - c) Named entity recognition

  • In topic modeling, which hyperparameter controls word-topic density?

    • a) Alpha parameter
    • b) Number of Topics (K)
    • c) Beta parameter
    • d) None of them

    Answer - c) Beta parameter

  • The technique used to reduce the effect of outlier terms is called ________?

    • a) DTM
    • b) Stemming
    • c) TF-IDF
    • d) N-gram

    Answer - c) TF-IDF

  • To normalize keywords in NLP, which technique do we follow?

    • a) Lemmatization
    • b) Parts of speech
    • c) TF-IDF
    • d) N-Gram

    Answer - a) Lemmatization

  • In which of the following areas can NLP be useful?

    • a) Automatic Text Summarization
    • b) Automatic Question-Answering Systems
    • c) Information Retrieval
    • d) All of the mentioned

    Answer - d) All of the mentioned

  • Which one of the following is not a preprocessing technique in NLP?

    • a) Stemming and Lemmatization
    • b) Tokenization
    • c) Stop words removal
    • d) Sentiment analysis

    Answer - d) Sentiment analysis

  • Which of the following is true of advanced preprocessing topics in NLP?
    Statement 1: TF-IDF helps remove the outliers.
    Statement 2: An n-gram in NLP is a contiguous sequence of n items, which also lets us find the sequences that appear most frequently; the items can be phonemes, syllables, letters, words, or base pairs, depending on the application.
    Statement 3: Bag-of-words is an approach used in NLP to represent a text as the multiset of words (unigrams) that appear in it.

    • a) Statements 1 and 3 are true and statement 2 is false
    • b) Statements 2 and 3 are false and statement 1 is true
    • c) All the above statements are true
    • d) None of the above

    Answer - c) All the above statements are true
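
Statements 2 and 3 can be illustrated with a few lines of standard-library Python. This is a minimal sketch on a made-up sentence, using whitespace tokenization as a simplifying assumption.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_words(tokens):
    """Multiset of unigrams: word order is discarded, counts are kept."""
    return Counter(tokens)

tokens = "the cat sat on the mat".split()

print(ngrams(tokens, 2))     # bigrams, in order of appearance
print(bag_of_words(tokens))  # "the" is counted twice

# Reordering the words changes the n-grams but not the bag of words.
print(bag_of_words(tokens) == bag_of_words(sorted(tokens)))  # True
```

Counting the n-grams with `Counter(ngrams(tokens, n))` is one simple way to find the sequences that appear most frequently.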

  • Which of the following is a common, strong baseline model for text classification in NLP?

    • a) Naive Bayes
    • b) Cosine similarity
    • c) Random forest
    • d) KNN

    Answer - a) Naive Bayes

  • Topic modeling is a ___________.

    • a) Technique of only labeling a text.
    • b) Technique of changing data labels.
    • c) Technique to understand and extract the hidden topics from large volumes of text.
    • d) None of the above.

    Answer - c) Technique to understand and extract the hidden topics from large volumes of text

  • Topic modeling techniques include _________ .

    • a) Latent semantic indexing (LSI).
    • b) Probabilistic latent semantic analysis (PLSA).
    • c) Latent Dirichlet allocation (LDA).
    • d) All of the above.

    Answer - d) All of the above

  • Classically, topic models are introduced in the text analysis community for ________________ topic discovery in a corpus of documents.

    • a) Unsupervised.
    • b) Supervised.
    • c) Semi-automated.
    • d) None of the above.

    Answer - a) Unsupervised

  • Nevertheless, “topics” discovered in an unsupervised way may not match the true topics in the data. Typical supervised topic models include _________ .

    • a) Supervised LDA (sLDA).
    • b) Discriminative variation on LDA (discLDA).
    • c) Maximum entropy discrimination LDA (medLDA).
    • d) All of the above.

    Answer - d) All of the above

  • Once a vocabulary has been chosen, the occurrence of words in example documents needs to be scored. The scoring method of counting the number of times each word appears in a document is called _______________ .

    • a) Counts.
    • b) Frequencies.
    • c) Repeatability.
    • d) None of the above.

    Answer - a) Counts

  • Once a vocabulary has been chosen, the occurrence of words in example documents needs to be scored. The scoring method of calculating how often each word appears in a document, out of all the words in the document, is called _______________ .

    • a) Counts.
    • b) Frequencies.
    • c) Repeatability.
    • d) None of the above.

    Answer - b) Frequencies
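
The two scoring methods from the last two questions differ only by a normalization step. Here is a minimal sketch on a made-up five-word document; the function names are chosen for this example.

```python
from collections import Counter

def word_counts(tokens):
    """Counts: number of times each word appears in the document."""
    return Counter(tokens)

def word_frequencies(tokens):
    """Frequencies: each count divided by the total number of words in the document."""
    total = len(tokens)
    return {word: count / total for word, count in Counter(tokens).items()}

tokens = "topic models find topic structure".split()

print(word_counts(tokens)["topic"])       # 2
print(word_frequencies(tokens)["topic"])  # 0.4
```

Frequencies make documents of different lengths comparable, because every document's scores sum to 1.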

  • The basic assumption of topic modeling is _______________________.

    • a) Exchange of topics.
    • b) Repeatability of topics.
    • c) Exchangeability of words and documents.
    • d) None of the above.

    Answer - c) Exchangeability of words and documents

  • ________ is a scoring of the frequency of the word in the current document.

    • a) Document frequency.
    • b) Term frequency.
    • c) File frequency.
    • d) None of the above.

    Answer - b) Term frequency

  • ___________ is a scoring of how rare the word is across documents.

    • a) Inverse Document frequency.
    • b) Term frequency.
    • c) File frequency.
    • d) None of the above.

    Answer - a) Inverse Document frequency
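
Term frequency and inverse document frequency are usually multiplied together into a TF-IDF score. The sketch below assumes the plain textbook variant, tf = count / document length and idf = log(N / df); libraries often use smoothed versions of these formulas, so their numbers will differ. The three tiny documents are made up.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per tokenized document.

    Assumes the unsmoothed textbook formulas:
    tf = count / document length, idf = log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

docs = [
    "the cat sat".split(),
    "the dog barked".split(),
    "the cat purred".split(),
]
scores = tf_idf(docs)
print(scores[0])
print(scores[1])
```

Here "the" appears in every document, so its IDF is log(1) = 0 and it scores zero everywhere, while "dog", which appears in only one document, scores higher than "cat", which appears in two.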

  • Python packages used for topic modeling include __________ .

    • a) Gensim.
    • b) NLTK.
    • c) spaCy.
    • d) All of the above.

    Answer - d) All of the above

  • Latent Dirichlet Allocation (LDA) and Latent Semantic Analysis (LSA) are based on ____________________________ assumptions.

    • a) Distributional hypothesis.
    • b) Statistical mixture hypothesis.
    • c) Both of the above.
    • d) Neither (a) nor (b).

    Answer - c) Both of the above

  • One of the basic assumptions of LDA and LSA is the distributional hypothesis, which means __________________.

    • a) Similar topics make use of similar words.
    • b) Different topics make use of similar words.
    • c) Similar topics make use of different words.
    • d) None of the above.

    Answer - a) Similar topics make use of similar words

  • One of the basic assumptions of LDA and LSA is the statistical mixture hypothesis, which means _________.

    • a) Documents talk about several topics.
    • b) Similar topics make use of similar words.
    • c) Documents talk about prefixed topics.
    • d) None of the above.

    Answer - a) Documents talk about several topics
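
The mixture idea can be written out directly: a document's word distribution is the weighted sum of the word distributions of the topics it talks about. The sketch below uses a hypothetical two-topic model with made-up probabilities; nothing here is fitted from data.

```python
def document_word_distribution(topic_weights, topic_word_probs):
    """Mix per-topic word distributions using the document's topic weights."""
    vocab = set()
    for dist in topic_word_probs.values():
        vocab.update(dist)
    return {
        word: sum(
            weight * topic_word_probs[topic].get(word, 0.0)
            for topic, weight in topic_weights.items()
        )
        for word in sorted(vocab)
    }

# Hypothetical two-topic model with made-up probabilities.
topics = {
    "sports": {"goal": 0.6, "team": 0.4},
    "finance": {"stock": 0.7, "team": 0.3},
}
# A document that is mostly about sports but partly about finance.
doc_mix = {"sports": 0.75, "finance": 0.25}

dist = document_word_distribution(doc_mix, topics)
print(dist)
```

The word "team" receives probability from both topics, which is how a single document can draw on several topics at once.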

  • Choose the correct statement from below –

    I. The purpose of LDA is mapping each document in our corpus to a set of topics which covers a good deal of the words in the document.
    II. Like LSA, LDA ignores syntactic information and treats documents as bags of words.
    III. There are two hyperparameters that control document and topic similarity, known as alpha and beta respectively.
    • a) I only.
    • b) II only.
    • c) III only.
    • d) All of the above.

    Answer - d) All of the above

  • Choose the correct statement from below –

    I. A low value of alpha will assign fewer topics to each document whereas a high value of alpha will have the opposite effect.
    II. A low value of beta will use fewer words to model a topic whereas a high value will use more words, thus making topics more similar between them.
    III. LDA cannot decide on the number of topics by itself.
    • a) I only.
    • b) II only.
    • c) III only.
    • d) All of the above.

    Answer - d) All of the above

  • When words in the third person are changed to the first person, and verbs in past and future tenses are changed to the present tense, we say the words are _________ .

    • a) Stemmed.
    • b) Lemmatized.
    • c) Regularized.
    • d) None of the above.

    Answer - b) Lemmatized

  • Reducing a word to its root form is called _________ .

    • a) Stemming.
    • b) Lemmatizing.
    • c) Regularizing.
    • d) None of the above.

    Answer - a) Stemming
