Natural Language Processing MCQs for CCAT | 20 Practice Questions with Answers

Q1.

NLP stands for:

A. Network Layer Protocol

B. Natural Language Processing

C. Numeric Linear Programming

D. Neural Logic Processing

Correct Answer: B — Natural Language Processing

NLP (Natural Language Processing) is the branch of AI concerned with the interaction between computers and human language.

Q2.

Tokenization in NLP is:

A. Encrypting text

B. Breaking text into words or subwords

C. Translating text

D. Compressing text

Correct Answer: B — Breaking text into words or subwords

Tokenization splits text into individual units (tokens) like words, subwords, or characters.
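As a rough illustration of word-level tokenization, here is a minimal sketch using a regular expression. The function name `tokenize` and the choice to lowercase the text are assumptions made for this example; production tokenizers (especially subword ones) are considerably more involved.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens and single punctuation marks."""
    # \w+ matches a run of letters, digits, or underscores;
    # [^\w\s] matches one character that is neither a word character nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text.lower())

tokens = tokenize("Hello, world!")
print(tokens)
```

Note that this simple pattern splits contractions such as "doesn't" into several tokens, which is one reason real tokenizers add extra rules.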

Q3.

Stop words are:

A. Important keywords

B. Common words like "the", "is" often removed

C. Error words

D. Encrypted words

Correct Answer: B — Common words like "the", "is" often removed

Stop words are very common words (the, is, at, etc.) that are often removed during preprocessing because they carry little meaning on their own.
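Stop-word removal can be sketched as a simple filter. The `STOP_WORDS` set below is a tiny hypothetical list chosen for illustration, not a standard one; NLP libraries ship much longer lists.

```python
# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "is", "at", "a", "an", "of", "on", "and"}

def remove_stop_words(tokens):
    """Return the tokens that are not in STOP_WORDS (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

filtered = remove_stop_words(["The", "cat", "is", "at", "the", "door"])
print(filtered)
```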

Q4.

Stemming reduces words to:

A. Full sentences

B. Root/base form (may not be valid word)

C. Longer versions

D. Synonyms

Correct Answer: B — Root/base form (may not be valid word)

Stemming strips suffixes using heuristic rules to get word stems (studies → studi), which may not be valid words.
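To make the idea of rule-based suffix stripping concrete, here is a toy stemmer. It is not the Porter algorithm or any library's implementation; the `RULES` list, the minimum stem length of 3, and the name `crude_stem` are all assumptions for this sketch.

```python
# Hypothetical (suffix, replacement) rules, tried in order; first match wins.
RULES = [("ies", "i"), ("ing", ""), ("ed", ""), ("s", "")]

def crude_stem(word):
    """Strip one known suffix if at least 3 characters of stem remain."""
    for suffix, replacement in RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)] + replacement
    return word

for w in ["studies", "jumped", "cats", "is"]:
    print(w, "->", crude_stem(w))
```

The stem "studi" is not a dictionary word, which is the trade-off stemming makes for speed and simplicity.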

Q5.

Lemmatization differs from stemming by:

A. Being faster

B. Producing valid dictionary words

C. Removing all characters

D. Adding prefixes

Correct Answer: B — Producing valid dictionary words

Lemmatization uses vocabulary and morphological analysis to return valid base forms (running → run).
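A dictionary lookup gives a rough feel for lemmatization. The `LEMMAS` table below is a hypothetical stand-in for the vocabulary a real lemmatizer consults; actual lemmatizers also use morphological rules and often the word's part of speech.

```python
# Hypothetical mini-vocabulary mapping inflected forms to their lemmas.
LEMMAS = {
    "running": "run",
    "ran": "run",
    "mice": "mouse",
    "studies": "study",
}

def lemmatize(word):
    """Return the lemma if the word is in the table, else the lowercased word."""
    key = word.lower()
    return LEMMAS.get(key, key)

for w in ["running", "ran", "mice", "studies"]:
    print(w, "->", lemmatize(w))
```

Unlike a suffix-stripping stemmer, the lookup can handle irregular forms such as "ran" and "mice", and every output is a valid word.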

Q6.

TF-IDF measures:

A. Word length

B. Word importance in document relative to corpus

C. Sentence count

D. Paragraph length

Correct Answer: B — Word importance in document relative to corpus

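TF-IDF (Term Frequency-Inverse Document Frequency) scores how important a word is to a document: the score rises with the word's frequency in that document and falls as the word appears in more documents of the corpus.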
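The sketch below computes one basic, unsmoothed TF-IDF variant: term frequency is the count divided by document length, and inverse document frequency is log(N / df). Libraries commonly use smoothed or normalized variants, so exact numbers will differ. The function name `tf_idf` and the sample documents are assumptions, and the code expects non-empty tokenized documents.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per tokenized, non-empty document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["the", "cat", "purred"],
]
scores = tf_idf(docs)
print(scores[0])
```

Here "the" appears in every document, so its score is zero, while "sat" (unique to the first document) scores higher than "cat" (shared by two).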

Q7.

Bag of Words model:

A. Preserves word order

B. Represents text as word frequency counts ignoring order

C. Only uses nouns

D. Only uses verbs

Correct Answer: B — Represents text as word frequency counts ignoring order

Bag of Words represents text as a collection of word counts, ignoring grammar and word order.
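A small sketch shows why Bag of Words loses word order: two sentences with the same words in a different order produce the same count vector. The vocabulary and the function name `bag_of_words` are assumptions for this example.

```python
from collections import Counter

def bag_of_words(tokens, vocabulary):
    """Return the count of each vocabulary term, in vocabulary order."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]

vocab = ["cat", "dog", "chased", "the"]
a = bag_of_words("the dog chased the cat".split(), vocab)
b = bag_of_words("the cat chased the dog".split(), vocab)
print(a, b, a == b)
```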

Q8.

Word embeddings like Word2Vec:

A. Are one-hot encoded

B. Represent words as dense vectors capturing semantic meaning

C. Use only binary values

D. Ignore context

Correct Answer: B — Represent words as dense vectors capturing semantic meaning

Word embeddings map words to dense vectors where similar words have similar vector representations.
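Similarity between embeddings is commonly measured with cosine similarity. The 3-dimensional vectors below are made-up values for illustration, not output from Word2Vec or any trained model; real embeddings typically have hundreds of dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical hand-written embeddings, not learned from data.
EMBEDDINGS = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.0, 0.9],
}

related = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["queen"])
unrelated = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["apple"])
print(related > unrelated)
```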

Q9.

Named Entity Recognition (NER) identifies:

A. Grammar errors

B. Named entities like persons, locations, organizations

C. Punctuation

D. Line breaks

Correct Answer: B — Named entities like persons, locations, organizations

NER identifies and classifies named entities in text into categories like person, organization, location.

Q10.

Sentiment analysis determines:

A. Word count

B. Emotional tone (positive/negative/neutral)

C. File size

D. Language type

Correct Answer: B — Emotional tone (positive/negative/neutral)

Sentiment analysis identifies the emotional tone or attitude expressed in text.

Q11.

Part-of-Speech (POS) tagging:

A. Removes words

B. Labels words with grammatical categories

C. Translates text

D. Compresses text

Correct Answer: B — Labels words with grammatical categories

POS tagging assigns grammatical categories (noun, verb, adjective, etc.) to words in text.

Q12.

BERT is a:

A. Database

B. Transformer-based language model

C. Programming language

D. File format

Correct Answer: B — Transformer-based language model

BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model for various NLP tasks.

Q13.

Attention mechanism in NLP:

A. Ignores all context

B. Focuses on relevant parts of input

C. Only uses first word

D. Only uses last word

Correct Answer: B — Focuses on relevant parts of input

Attention lets a model focus on the most relevant parts of the input when producing each output, weighting them dynamically.
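One common formulation is scaled dot-product attention, the form used in Transformers. The sketch below handles a single query in plain Python; the vectors are made-up values, and the function names `softmax` and `attend` are assumptions for this example.

```python
import math

def softmax(xs):
    """Turn scores into positive weights that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(dimension).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Output is the weighted average of the value vectors.
    output = [sum(w * v[i] for w, v in zip(weights, values))
              for i in range(len(values[0]))]
    return weights, output

weights, output = attend(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]],
)
print(weights, output)
```

Keys that align with the query receive larger weights, so their values contribute more to the output.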

Q14.

Machine Translation converts:

A. Speech to text

B. Text from one language to another

C. Images to text

D. Numbers to text

Correct Answer: B — Text from one language to another

Machine Translation automatically translates text from one natural language to another.

Q15.

Text classification is used for:

A. Sorting files by size

B. Categorizing documents into predefined classes

C. Counting words

D. Encrypting text

Correct Answer: B — Categorizing documents into predefined classes

Text classification assigns documents to predefined categories, as in spam detection or topic labeling.

Q16.

N-grams are:

A. Weight units

B. Contiguous sequences of n items from text

C. Error codes

D. File types

Correct Answer: B — Contiguous sequences of n items from text

N-grams are contiguous sequences of n items (words or characters) from text, capturing local context.
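Extracting n-grams amounts to sliding a window of size n over the sequence. A minimal sketch follows; the function name `ngrams` is an assumption, and the same code works for word lists and for strings of characters.

```python
def ngrams(items, n):
    """Return all contiguous length-n windows of items as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

word_bigrams = ngrams("the cat sat on".split(), 2)
char_trigrams = ngrams("text", 3)
print(word_bigrams)
print(char_trigrams)
```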

Q17.

Recurrent Neural Networks (RNN) are suitable for:

A. Only images

B. Sequential data like text

C. Only numbers

D. Only audio

Correct Answer: B — Sequential data like text

RNNs process sequential data by maintaining a hidden state that carries information from previous inputs forward to later steps.
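The hidden-state idea can be sketched with a scalar recurrence of the form h_t = tanh(w_x * x_t + w_h * h_{t-1} + b). The weights below are hand-picked hypothetical values rather than learned ones, and real RNNs use vectors and weight matrices instead of single numbers.

```python
import math

def rnn_step(x, h_prev, w_x, w_h, b):
    """One recurrence step: combine the current input with the previous state."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run the recurrence over a sequence and return every hidden state."""
    h = 0.0
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states

with_signal = run_rnn([1.0, 0.0, 0.0])
without_signal = run_rnn([0.0, 0.0, 0.0])
print(with_signal[-1], without_signal[-1])
```

The last two inputs are identical in both runs, yet the final states differ, because the hidden state still carries a trace of the first input.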

Q18.

LSTM addresses RNN limitation of:

A. Being too fast

B. Vanishing gradient and short-term memory

C. Using too little memory

D. Being too accurate

Correct Answer: B — Vanishing gradient and short-term memory

LSTM (Long Short-Term Memory) uses gates to preserve long-term dependencies and mitigate vanishing gradients.
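The gating mechanism can be sketched with a scalar LSTM cell using the standard forget, input, and output gates. The parameters below are hand-picked hypothetical values (forget gate nearly open, input gate nearly closed) to show the cell state being preserved; a trained LSTM learns these values and works on vectors.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell; p maps gate name -> (w_x, w_h, bias)."""
    def pre(gate):
        w_x, w_h, b = p[gate]
        return w_x * x + w_h * h_prev + b
    f = sigmoid(pre("forget"))  # how much of the old cell state to keep
    i = sigmoid(pre("input"))   # how much new information to write
    o = sigmoid(pre("output"))  # how much of the cell state to expose
    g = math.tanh(pre("cand"))  # candidate value to write
    c = f * c_prev + i * g      # additive cell-state update
    h = o * math.tanh(c)
    return h, c

# Hypothetical parameters chosen by hand for illustration.
params = {
    "forget": (0.0, 0.0, 6.0),
    "input": (0.0, 0.0, -6.0),
    "output": (0.0, 0.0, 0.0),
    "cand": (1.0, 0.0, 0.0),
}

h, c = 0.0, 1.0
for _ in range(20):
    h, c = lstm_step(0.0, h, c, params)
print(c)
```

After 20 steps the cell state remains close to its starting value of 1.0, illustrating how an open forget gate lets information persist across many time steps.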

Q19.

Transformers replaced RNNs because they:

A. Are slower

B. Process sequences in parallel with attention

C. Use less memory

D. Are simpler

Correct Answer: B — Process sequences in parallel with attention

Transformers use self-attention to process all positions in parallel, enabling faster training and better modeling of long-range dependencies.

Q20.

GPT models are trained using:

A. Supervised learning only

B. Self-supervised learning on large text corpora

C. Reinforcement learning only

D. Unsupervised clustering

Correct Answer: B — Self-supervised learning on large text corpora

GPT (Generative Pre-trained Transformer) is pre-trained with self-supervised learning: the model learns to predict the next token in text, so the training labels come from the text itself.
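The "self-supervised" part can be illustrated by building training pairs directly from raw text, with no human labels. This sketch only constructs (context, next token) examples; it does not train a model, and the function name `next_token_pairs` and the `context_size` parameter are assumptions for this example.

```python
def next_token_pairs(tokens, context_size):
    """Build (context, target) pairs where the target is the following token."""
    pairs = []
    for i in range(1, len(tokens)):
        # Keep at most context_size tokens immediately before position i.
        context = tokens[max(0, i - context_size):i]
        pairs.append((context, tokens[i]))
    return pairs

pairs = next_token_pairs("the cat sat on the mat".split(), context_size=3)
for context, target in pairs:
    print(context, "->", target)
```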
