
Natural Language Processing — Practice MCQs for C-CAT

Section B: Programming, AI & Machine Learning (20 Questions)

Practice 20 Natural Language Processing multiple-choice questions designed for CDAC C-CAT exam preparation. Each question is followed by the correct answer and a brief explanation.

Q1.
NLP stands for:
A. Network Layer Protocol
B. Natural Language Processing
C. Numeric Linear Programming
D. Neural Logic Processing

Correct Answer: B — Natural Language Processing

NLP (Natural Language Processing) is the branch of AI focused on the interaction between computers and human language.

Q2.
Tokenization in NLP is:
A. Encrypting text
B. Breaking text into words or subwords
C. Translating text
D. Compressing text

Correct Answer: B — Breaking text into words or subwords

Tokenization splits text into individual units (tokens) like words, subwords, or characters.
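
For example, a minimal sketch with NLTK (assumes the library is installed and the "punkt" tokenizer data is downloaded; newer NLTK versions name the resource "punkt_tab"):

    import nltk
    nltk.download("punkt")  # one-time tokenizer data download
    from nltk.tokenize import word_tokenize

    print(word_tokenize("NLP breaks text into tokens!"))
    # ['NLP', 'breaks', 'text', 'into', 'tokens', '!']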

Q3.
Stop words are:
A. Important keywords
B. Common words like "the", "is" often removed
C. Error words
D. Encrypted words

Correct Answer: B — Common words like "the", "is" often removed

Stop words are common words (the, is, at, etc.) often removed as they add little meaning.
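
A quick illustration using NLTK's built-in English stop word list (assumes the "stopwords" corpus has been downloaded):

    import nltk
    nltk.download("stopwords")  # one-time download of the stop word lists
    from nltk.corpus import stopwords

    stop = set(stopwords.words("english"))
    tokens = ["the", "cat", "is", "on", "the", "mat"]
    print([t for t in tokens if t not in stop])  # ['cat', 'mat']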

Q4.
Stemming reduces words to:
A. Full sentences
B. Root/base form (may not be valid word)
C. Longer versions
D. Synonyms

Correct Answer: B — Root/base form (may not be valid word)

Stemming removes suffixes to get word stems (e.g., studies → studi), which may not be valid dictionary words.
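
A small sketch using NLTK's PorterStemmer, showing both valid and invalid stems:

    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()
    for word in ["running", "studies", "flies"]:
        print(word, "->", stemmer.stem(word))
    # running -> run, studies -> studi, flies -> fli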

Q5.
Lemmatization differs from stemming by:
A. Being faster
B. Producing valid dictionary words
C. Removing all characters
D. Adding prefixes

Correct Answer: B — Producing valid dictionary words

Lemmatization uses vocabulary and morphological analysis to return valid base forms (running → run).
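
Contrast with the stemming sketch above: NLTK's WordNetLemmatizer (assumes the "wordnet" corpus is downloaded) returns real dictionary words, and a part-of-speech hint improves results:

    import nltk
    nltk.download("wordnet")  # one-time download of the WordNet lexicon
    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()
    print(lemmatizer.lemmatize("running", pos="v"))  # run
    print(lemmatizer.lemmatize("studies"))           # study (default POS is noun)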

Q6.
TF-IDF measures:
A. Word length
B. Word importance in document relative to corpus
C. Sentence count
D. Paragraph length

Correct Answer: B — Word importance in document relative to corpus

TF-IDF (Term Frequency-Inverse Document Frequency) measures how important a word is to a document in a corpus.
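
In its classic form, tf-idf(t, d) = tf(t, d) × log(N / df(t)), where N is the number of documents and df(t) is how many documents contain term t. A minimal sketch with scikit-learn, which uses a smoothed variant of this formula:

    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = ["the cat sat", "the dog barked", "the cat chased the dog"]
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(docs)
    print(vectorizer.get_feature_names_out())
    print(X.toarray().round(2))  # "the" occurs in every document, so it scores low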

Q7.
Bag of Words model:
A. Preserves word order
B. Represents text as word frequency counts ignoring order
C. Only uses nouns
D. Only uses verbs

Correct Answer: B — Represents text as word frequency counts ignoring order

Bag of Words represents text as a collection of word counts, ignoring grammar and word order.
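
A short demonstration with scikit-learn's CountVectorizer; the two sentences below mean different things but get identical vectors because order is discarded:

    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["the cat sat on the mat", "the mat sat on the cat"]
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(docs)
    print(vectorizer.get_feature_names_out())  # ['cat' 'mat' 'on' 'sat' 'the']
    print(X.toarray())                         # both rows identical: [1 1 1 1 2]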

Q8.
Word embeddings like Word2Vec:
A. Are one-hot encoded
B. Represent words as dense vectors capturing semantic meaning
C. Use only binary values
D. Ignore context

Correct Answer: B — Represent words as dense vectors capturing semantic meaning

Word embeddings map words to dense vectors where similar words have similar vector representations.
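
A hedged sketch with the gensim library; the corpus here is a toy, so the learned similarities are not meaningful, but it shows the dense-vector interface:

    from gensim.models import Word2Vec

    sentences = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
    model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, seed=1)
    print(model.wv["cat"].shape)              # (50,) dense vector, not one-hot
    print(model.wv.similarity("cat", "dog"))  # cosine similarity of two vectors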

Q9.
Named Entity Recognition (NER) identifies:
A. Grammar errors
B. Named entities like persons, locations, organizations
C. Punctuation
D. Line breaks

Correct Answer: B — Named entities like persons, locations, organizations

NER identifies and classifies named entities in text into categories like person, organization, location.
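
A minimal sketch with spaCy (assumes the small English model has been installed with: python -m spacy download en_core_web_sm):

    import spacy

    nlp = spacy.load("en_core_web_sm")  # pre-trained pipeline with an NER component
    doc = nlp("Sundar Pichai leads Google from Mountain View.")
    for ent in doc.ents:
        print(ent.text, ent.label_)
    # expected: Sundar Pichai PERSON, Google ORG, Mountain View GPE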

Q10.
Sentiment analysis determines:
A. Word count
B. Emotional tone (positive/negative/neutral)
C. File size
D. Language type

Correct Answer: B — Emotional tone (positive/negative/neutral)

Sentiment analysis identifies the emotional tone or attitude expressed in text.
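
A quick rule-based example using NLTK's VADER analyzer (assumes the "vader_lexicon" resource is downloaded):

    import nltk
    nltk.download("vader_lexicon")  # one-time download
    from nltk.sentiment import SentimentIntensityAnalyzer

    sia = SentimentIntensityAnalyzer()
    print(sia.polarity_scores("I love this movie!"))
    # a compound score above 0 indicates positive sentiment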

Q11.
Part-of-Speech (POS) tagging:
A. Removes words
B. Labels words with grammatical categories
C. Translates text
D. Compresses text

Correct Answer: B — Labels words with grammatical categories

POS tagging assigns grammatical categories (noun, verb, adjective, etc.) to words in text.
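
A minimal sketch with NLTK (note that resource names vary slightly across NLTK versions):

    import nltk
    nltk.download("punkt")
    nltk.download("averaged_perceptron_tagger")  # "averaged_perceptron_tagger_eng" on newer NLTK
    from nltk import pos_tag, word_tokenize

    print(pos_tag(word_tokenize("The quick fox jumps")))
    # [('The', 'DT'), ('quick', 'JJ'), ('fox', 'NN'), ('jumps', 'VBZ')]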

Q12.
BERT is a:
A. Database
B. Transformer-based language model
C. Programming language
D. File format

Correct Answer: B — Transformer-based language model

BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model for various NLP tasks.
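
A hedged sketch using the Hugging Face transformers library; the first run downloads the bert-base-uncased weights:

    from transformers import pipeline

    unmasker = pipeline("fill-mask", model="bert-base-uncased")
    for pred in unmasker("The capital of France is [MASK]."):
        print(pred["token_str"], round(pred["score"], 3))  # "paris" should rank first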

Q13.
Attention mechanism in NLP:
A. Ignores all context
B. Focuses on relevant parts of input
C. Only uses first word
D. Only uses last word

Correct Answer: B — Focuses on relevant parts of input

Attention allows models to focus on relevant parts of input when producing output, weighing importance dynamically.
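
A minimal NumPy sketch of scaled dot-product attention, the core operation softmax(Q K^T / sqrt(d)) V:

    import numpy as np

    def attention(Q, K, V):
        # attention weights: softmax over scaled query-key similarity scores
        scores = Q @ K.T / np.sqrt(Q.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
        return weights @ V  # weighted mix of the value vectors

    Q = K = V = np.random.rand(4, 8)  # 4 tokens, 8-dim vectors (self-attention)
    print(attention(Q, K, V).shape)   # (4, 8)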

Q14.
Machine Translation converts:
A. Speech to text
B. Text from one language to another
C. Images to text
D. Numbers to text

Correct Answer: B — Text from one language to another

Machine Translation automatically translates text from one natural language to another.
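
A hedged example with the transformers library; t5-small ships with a built-in English-to-German task, and the model weights are downloaded on first use:

    from transformers import pipeline

    translator = pipeline("translation_en_to_de", model="t5-small")
    print(translator("Machine translation is useful.")[0]["translation_text"])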

Q15.
Text classification is used for:
A. Sorting files by size
B. Categorizing documents into predefined classes
C. Counting words
D. Encrypting text

Correct Answer: B — Categorizing documents into predefined classes

Text classification assigns documents to predefined categories like spam detection or topic labeling.
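
A tiny spam-detection sketch with scikit-learn; with only four training texts the prediction is merely illustrative:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    texts = ["win a free prize now", "meeting moved to 10am",
             "free cash offer inside", "project update attached"]
    labels = ["spam", "ham", "spam", "ham"]
    clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
    clf.fit(texts, labels)
    print(clf.predict(["claim your free prize"]))  # likely ['spam']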

Q16.
N-grams are:
A. Weight units
B. Contiguous sequences of n items from text
C. Error codes
D. File types

Correct Answer: B — Contiguous sequences of n items from text

N-grams are contiguous sequences of n items (words or characters) from text, capturing local context.
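
N-grams need no library at all; a few lines of plain Python:

    def ngrams(tokens, n):
        # all contiguous length-n slices of the token list
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    tokens = "the cat sat on the mat".split()
    print(ngrams(tokens, 2))  # bigrams: ('the', 'cat'), ('cat', 'sat'), ...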

Q17.
Recurrent Neural Networks (RNN) are suitable for:
A. Only images
B. Sequential data like text
C. Only numbers
D. Only audio

Correct Answer: B — Sequential data like text

RNNs process sequential data by maintaining a hidden state that captures information from previous inputs.
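
The recurrence is simple enough to write by hand; a NumPy sketch of one vanilla RNN cell (weights are random, purely for shape illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    W_x = rng.normal(size=(16, 8))    # input-to-hidden weights
    W_h = rng.normal(size=(16, 16))   # hidden-to-hidden weights
    h = np.zeros(16)                  # initial hidden state

    for x_t in rng.normal(size=(5, 8)):   # a sequence of 5 input vectors
        h = np.tanh(W_x @ x_t + W_h @ h)  # h_t = tanh(W_x x_t + W_h h_{t-1})
    print(h.shape)  # (16,): the final state summarizes the whole sequence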

Q18.
LSTM addresses the RNN limitation of:
A. Being too fast
B. Vanishing gradient and short-term memory
C. Using too little memory
D. Being too accurate

Correct Answer: B — Vanishing gradient and short-term memory

LSTM (Long Short-Term Memory) uses gates to preserve long-term dependencies and mitigate vanishing gradients.
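
A minimal PyTorch sketch; the returned cell state c_n is what the gates protect, letting information survive over long sequences:

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
    x = torch.randn(2, 5, 8)      # (batch, seq_len, features)
    output, (h_n, c_n) = lstm(x)  # gated hidden and cell states
    print(output.shape)           # torch.Size([2, 5, 16])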

Q19.
Transformers replaced RNNs because they:
A. Are slower
B. Process sequences in parallel with attention
C. Use less memory
D. Are simpler

Correct Answer: B — Process sequences in parallel with attention

Transformers use self-attention to process all positions in parallel, enabling faster training and better modeling of long-range dependencies.
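
A minimal PyTorch sketch; unlike the step-by-step RNN loop shown earlier, all ten positions enter the encoder layer in one tensor:

    import torch
    import torch.nn as nn

    layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
    x = torch.randn(2, 10, 32)  # (batch, seq_len, embedding dim)
    print(layer(x).shape)       # torch.Size([2, 10, 32]), computed in parallel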

Q20.
GPT models are trained using:
A. Supervised learning only
B. Self-supervised learning on large text corpora
C. Reinforcement learning only
D. Unsupervised clustering

Correct Answer: B — Self-supervised learning on large text corpora

GPT (Generative Pre-trained Transformer) models are pre-trained with a self-supervised objective: predicting the next token in large text corpora.
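
The self-supervised objective needs no human labels; the targets come from the text itself, as in this plain-Python illustration:

    tokens = ["the", "cat", "sat", "on", "the", "mat"]
    # each token's training target is simply the token that follows it
    examples = list(zip(tokens[:-1], tokens[1:]))
    print(examples)  # [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ...]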