Practice 20 Natural Language Processing multiple-choice questions designed for CDAC CCAT exam preparation. Each item below gives the correct option with a detailed explanation.
Correct Answer: B — Natural Language Processing
NLP (Natural Language Processing) is the branch of AI focused on the interaction between computers and human language.
Correct Answer: B — Breaking text into words or subwords
Tokenization splits text into individual units (tokens) like words, subwords, or characters.
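As a rough illustration, here is a minimal regex-based tokenizer in Python; real tokenizers (NLTK, spaCy, BPE subword tokenizers) handle punctuation, contractions, and subwords far more carefully.

```python
import re

def tokenize(text):
    # crude word-level tokenizer: keep runs of letters/digits, drop punctuation
    return re.findall(r"\w+", text.lower())

print(tokenize("NLP breaks text into tokens."))
# ['nlp', 'breaks', 'text', 'into', 'tokens']
```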
Correct Answer: B — Common words like "the", "is" often removed
Stop words are common words (the, is, at, etc.) often removed as they add little meaning.
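A minimal sketch of stop-word removal, assuming a small hand-picked stop-word list; libraries such as NLTK and spaCy ship much longer lists.

```python
# small illustrative stop-word list; NLTK/spaCy provide much larger ones
STOP_WORDS = {"the", "is", "at", "a", "an", "of", "in"}

tokens = ["the", "cat", "is", "at", "the", "door"]
filtered = [t for t in tokens if t not in STOP_WORDS]
print(filtered)  # ['cat', 'door']
```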
Correct Answer: B — Root/base form (may not be valid word)
Stemming strips suffixes to produce word stems (studies → studi, happily → happili), which may not be valid dictionary words.
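For example, with NLTK's Porter stemmer (assuming NLTK is installed):

```python
from nltk.stem import PorterStemmer  # assumes NLTK is installed

stemmer = PorterStemmer()
for word in ["running", "studies", "happily"]:
    print(word, "->", stemmer.stem(word))
# running -> run, studies -> studi, happily -> happili (stems need not be real words)
```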
Correct Answer: B — Producing valid dictionary words
Lemmatization uses vocabulary and morphological analysis to return valid base forms (running → run).
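A short sketch using NLTK's WordNet lemmatizer (assumes NLTK is installed and the WordNet data can be downloaded); note that the part of speech affects the result.

```python
import nltk
nltk.download("wordnet", quiet=True)             # WordNet data used for dictionary lookups
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("running", pos="v"))  # run
print(lemmatizer.lemmatize("mice"))              # mouse (default part of speech is noun)
```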
Correct Answer: B — Word importance in document relative to corpus
TF-IDF (Term Frequency-Inverse Document Frequency) measures how important a word is to a document in a corpus.
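In its simplest form, tf-idf(t, d) = tf(t, d) × log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t. A minimal sketch with scikit-learn's TfidfVectorizer (assumes scikit-learn 1.0+; it uses a smoothed, normalized variant of this formula, and the toy documents are made up):

```python
from sklearn.feature_extraction.text import TfidfVectorizer  # assumes scikit-learn is installed

docs = ["the cat sat", "the dog sat", "the cat chased a dog"]
vec = TfidfVectorizer()
X = vec.fit_transform(docs)          # sparse document-term matrix of TF-IDF weights
print(vec.get_feature_names_out())   # vocabulary (columns of X)
print(X.toarray().round(2))          # "the" occurs in every document, so it gets the lowest weight
```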
Correct Answer: B — Represents text as word frequency counts ignoring order
Bag of Words represents text as a collection of word counts, ignoring grammar and word order.
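A library-free sketch of a bag-of-words representation using plain Python counts (the toy documents are made up for illustration):

```python
from collections import Counter

docs = ["the cat sat on the mat", "the dog sat"]
vocab = sorted({w for d in docs for w in d.split()})
vectors = [[Counter(d.split()).get(w, 0) for w in vocab] for d in docs]
print(vocab)     # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(vectors)   # [[1, 0, 1, 1, 1, 2], [0, 1, 0, 0, 1, 1]]; counts only, word order is lost
```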
Correct Answer: B — Represent words as dense vectors capturing semantic meaning
Word embeddings map words to dense vectors where similar words have similar vector representations.
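A sketch of the core idea with hand-made 3-dimensional vectors and cosine similarity; real embeddings (Word2Vec, GloVe, fastText) are learned from large corpora and have hundreds of dimensions.

```python
import numpy as np

# toy 3-dimensional vectors purely for illustration
emb = {
    "king":  np.array([0.8, 0.3, 0.1]),
    "queen": np.array([0.7, 0.4, 0.1]),
    "apple": np.array([0.1, 0.1, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb["king"], emb["queen"]))  # close to 1: semantically similar
print(cosine(emb["king"], emb["apple"]))  # much lower: unrelated
```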
Correct Answer: B — Named entities like persons, locations, organizations
NER identifies and classifies named entities in text into categories like person, organization, location.
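A minimal example with spaCy, assuming spaCy and its small English model (en_core_web_sm) are installed; the sample sentence is made up.

```python
import spacy  # assumes spaCy plus the en_core_web_sm model are installed

nlp = spacy.load("en_core_web_sm")
doc = nlp("Sundar Pichai leads Google from California.")
for ent in doc.ents:
    print(ent.text, ent.label_)   # expected labels: PERSON, ORG, GPE
```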
Correct Answer: B — Emotional tone (positive/negative/neutral)
Sentiment analysis identifies the emotional tone or attitude expressed in text.
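A toy lexicon-based sketch of the idea; practical sentiment systems use trained classifiers or pre-trained language models rather than a hand-written word list.

```python
# tiny illustrative lexicons; real lexicons contain thousands of scored words
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}

def sentiment(text):
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("The movie was great"))        # positive
print(sentiment("I hate the terrible ending")) # negative
```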
Correct Answer: B — Labels words with grammatical categories
POS tagging assigns grammatical categories (noun, verb, adjective, etc.) to words in text.
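A minimal sketch with spaCy (assumes spaCy and the en_core_web_sm model are installed):

```python
import spacy  # assumes spaCy and the en_core_web_sm model are installed

nlp = spacy.load("en_core_web_sm")
for token in nlp("The quick brown fox jumps over the lazy dog"):
    print(token.text, token.pos_)   # typically DET, ADJ, ADJ, NOUN, VERB, ADP, DET, ADJ, NOUN
```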
Correct Answer: B — Transformer-based language model
BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model for various NLP tasks.
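A short sketch of BERT's masked-language-model objective using the Hugging Face transformers pipeline (assumes transformers and a backend such as PyTorch are installed; the model weights download on first use):

```python
from transformers import pipeline  # assumes Hugging Face transformers is installed

# pre-trained BERT with a masked-language-model head; it predicts the [MASK] token from both sides
fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("Paris is the capital of [MASK].")[:3]:
    print(pred["token_str"], round(pred["score"], 3))
```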
Correct Answer: B — Focuses on relevant parts of input
Attention allows models to focus on the relevant parts of the input when producing each output, dynamically weighting how important each part is.
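A minimal NumPy sketch of scaled dot-product attention, the building block used inside Transformers; the shapes and inputs here are arbitrary and only for illustration.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])                # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax: each row sums to 1
    return weights @ V, weights                            # output is a weighted mix of the values

Q = np.random.rand(4, 8)   # 4 query positions, dimension 8
K = np.random.rand(6, 8)   # 6 key/value positions
V = np.random.rand(6, 8)
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.shape)  # (4, 8) (4, 6)
```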
Correct Answer: B — Text from one language to another
Machine Translation automatically translates text from one natural language to another.
Correct Answer: B — Categorizing documents into predefined classes
Text classification assigns documents to predefined categories like spam detection or topic labeling.
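A minimal spam-vs-ham sketch with scikit-learn, using a tiny made-up training set purely for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# tiny made-up training set; real classifiers need far more data
texts  = ["win a free prize now", "claim your free reward", "meeting at 3 pm", "project report attached"]
labels = ["spam", "spam", "ham", "ham"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["free prize inside"]))   # expected: ['spam']
```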
Correct Answer: B — Contiguous sequences of n items from text
N-grams are contiguous sequences of n items (words or characters) from text, capturing local context.
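A short library-free sketch that generates word-level n-grams:

```python
def ngrams(tokens, n):
    # every contiguous window of length n
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "natural language processing is fun".split()
print(ngrams(tokens, 2))  # bigrams: ('natural', 'language'), ('language', 'processing'), ...
print(ngrams(tokens, 3))  # trigrams
```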
Correct Answer: B — Sequential data like text
RNNs process sequential data by maintaining a hidden state that carries information from previous inputs.
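A toy NumPy sketch of a single RNN cell, showing how the hidden state is updated at each time step; the weights and inputs are random placeholders.

```python
import numpy as np

# toy RNN cell: h_t = tanh(W_xh @ x_t + W_hh @ h_prev + b); sizes chosen arbitrarily
dim_in, dim_h = 5, 8
rng = np.random.default_rng(0)
W_xh = 0.1 * rng.normal(size=(dim_h, dim_in))
W_hh = 0.1 * rng.normal(size=(dim_h, dim_h))
b = np.zeros(dim_h)

h = np.zeros(dim_h)                        # hidden state carries context from earlier steps
for x_t in rng.normal(size=(10, dim_in)):  # a sequence of 10 input vectors
    h = np.tanh(W_xh @ x_t + W_hh @ h + b)
print(h.shape)                             # (8,): a fixed-size summary of the whole sequence
```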
Correct Answer: B — Vanishing gradient and short-term memory
LSTM (Long Short-Term Memory) uses gates to preserve long-term dependencies and mitigate vanishing gradients.
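A minimal PyTorch sketch (assumes PyTorch is installed); the sizes are arbitrary and chosen only to show the shapes involved.

```python
import torch
import torch.nn as nn  # assumes PyTorch is installed

lstm = nn.LSTM(input_size=5, hidden_size=8, batch_first=True)
x = torch.randn(2, 10, 5)               # 2 sequences, 10 time steps, 5 features each
out, (h_n, c_n) = lstm(x)               # c_n is the gated cell state that preserves long-range information
print(out.shape, h_n.shape, c_n.shape)  # (2, 10, 8), (1, 2, 8), (1, 2, 8)
```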
Correct Answer: B — Process sequences in parallel with attention
Transformers use self-attention to process all positions in parallel, enabling faster training and better modeling of long-range dependencies.
Correct Answer: B — Self-supervised learning on large text corpora
GPT (Generative Pre-trained Transformer) is pre-trained with self-supervised learning to predict the next token in text.
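A tiny sketch of the self-supervised next-token objective: the training targets come from the text itself, with no human labels.

```python
tokens = ["natural", "language", "processing", "is", "fun"]
# self-supervised objective: for every position, the target is simply the next token
pairs = list(zip(tokens[:-1], tokens[1:]))
print(pairs)
# [('natural', 'language'), ('language', 'processing'), ('processing', 'is'), ('is', 'fun')]
```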