Natural Language Processing - Practice MCQs for CCAT

50 Questions | Section B: Programming | Artificial Intelligence

Natural Language Processing Question Bank for C-CAT

Topic-wise Natural Language Processing MCQs for CDAC C-CAT preparation with answers and explanations.

Q1.
NLP stands for:
A. Network Layer Protocol
B. Natural Language Processing
C. Numeric Linear Programming
D. Neural Logic Processing

Correct Answer: B - Natural Language Processing

NLP (Natural Language Processing) is the branch of AI focused on the interaction between computers and human language.

Q2.
Tokenization in NLP is:
A. Encrypting text
B. Translating text
C. Breaking text into words or subwords
D. Compressing text

Correct Answer: C - Breaking text into words or subwords

Tokenization splits text into individual units (tokens) like words, subwords, or characters.
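
As a minimal sketch of word-level tokenization, the snippet below splits text into words and standalone punctuation marks with a regular expression. The helper name `tokenize` is an assumption for illustration; production tokenizers handle contractions, URLs, and subwords with more care.

```python
import re

def tokenize(text):
    """Split text into word tokens and standalone punctuation tokens."""
    # \w+ matches a run of letters, digits, or underscores;
    # [^\w\s] matches one character that is neither a word character nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Hello, world!"))  # ['Hello', ',', 'world', '!']
```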

Q3.
Stop words are:
A. Important keywords
B. Common words like "the", "is" often removed
C. Error words
D. Encrypted words

Correct Answer: B - Common words like "the", "is" often removed

Stop words are common words (the, is, at, etc.) often removed as they add little meaning.
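
Stop-word removal can be sketched as a simple filter. The `STOP_WORDS` set here is a tiny hypothetical list chosen for illustration; real libraries ship much longer, language-specific lists.

```python
# A tiny, hypothetical stop-word list; real lists are much longer.
STOP_WORDS = {"the", "is", "at", "a", "an", "on", "of", "and"}

def remove_stop_words(tokens):
    """Return the tokens that are not in STOP_WORDS (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

print(remove_stop_words("The cat is on the mat".split()))  # ['cat', 'mat']
```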

Q4.
Stemming reduces words to:
A. Full sentences
B. Longer versions
C. Root/base form (may not be valid word)
D. Synonyms

Correct Answer: C - Root/base form (may not be valid word)

Stemming strips suffixes with simple rules to get word stems (studies → studi), which may not be valid words.
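
To illustrate why stems may not be valid words, here is a deliberately crude suffix stripper. It is a sketch under stated assumptions, not the Porter algorithm: the suffix list, the `naive_stem` name, and the minimum stem length are all hypothetical choices.

```python
# Hypothetical suffix list, tried in order; this is not the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word

for w in ("studies", "jumped", "cats", "is"):
    print(w, "->", naive_stem(w))
```

With these rules "studies" becomes "studi", which is not a dictionary word, while "is" is left alone because stripping the "s" would leave too short a stem.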

Q5.
Lemmatization differs from stemming by:
A. Being faster
B. Producing valid dictionary words
C. Removing all characters
D. Adding prefixes

Correct Answer: B - Producing valid dictionary words

Lemmatization uses vocabulary and morphological analysis to return valid base forms (running → run).

Q6.
TF-IDF measures:
A. Word length
B. Paragraph length
C. Sentence count
D. Word importance in document relative to corpus

Correct Answer: D - Word importance in document relative to corpus

TF-IDF (Term Frequency-Inverse Document Frequency) measures how important a word is to a document in a corpus.

Q7.
Bag of Words model:
A. Preserves word order
B. Only uses nouns
C. Represents text as word frequency counts ignoring order
D. Only uses verbs

Correct Answer: C - Represents text as word frequency counts ignoring order

Bag of Words represents text as a collection of word counts, ignoring grammar and word order.
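
A minimal sketch of the idea, using `collections.Counter` from the standard library: two sentences with the same words in a different order produce the same bag. The `bag_of_words` helper and its whitespace tokenization are simplifying assumptions.

```python
from collections import Counter

def bag_of_words(text):
    """Count lowercase whitespace-separated tokens, discarding their order."""
    return Counter(text.lower().split())

a = bag_of_words("the dog chased the cat")
b = bag_of_words("the cat chased the dog")
print(a["the"])  # 2
print(a == b)    # True: the word order differs, the counts do not
```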

Q8.
Word embeddings like Word2Vec:
A. Represent words as dense vectors capturing semantic meaning
B. Are one-hot encoded
C. Use only binary values
D. Ignore context

Correct Answer: A - Represent words as dense vectors capturing semantic meaning

Word embeddings map words to dense vectors where similar words have similar vector representations.

Q9.
Named Entity Recognition (NER) identifies:
A. Grammar errors
B. Named entities like persons, locations, organizations
C. Punctuation
D. Line breaks

Correct Answer: B - Named entities like persons, locations, organizations

NER identifies and classifies named entities in text into categories like person, organization, location.

Q10.
Sentiment analysis determines:
A. Word count
B. Language type
C. File size
D. Emotional tone (positive/negative/neutral)

Correct Answer: D - Emotional tone (positive/negative/neutral)

Sentiment analysis identifies the emotional tone or attitude expressed in text.

Q11.
Part-of-Speech (POS) tagging:
A. Removes words
B. Labels words with grammatical categories
C. Translates text
D. Compresses text

Correct Answer: B - Labels words with grammatical categories

POS tagging assigns grammatical categories (noun, verb, adjective, etc.) to words in text.

Q12.
BERT is a:
A. Database
B. Transformer-based language model
C. Programming language
D. File format

Correct Answer: B - Transformer-based language model

BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model for various NLP tasks.

Q13.
Attention mechanism in NLP:
A. Ignores all context
B. Focuses on relevant parts of input
C. Only uses first word
D. Only uses last word

Correct Answer: B - Focuses on relevant parts of input

Attention allows models to focus on relevant parts of input when producing output, weighing importance dynamically.

Q14.
Machine Translation converts:
A. Speech to text
B. Images to text
C. Text from one language to another
D. Numbers to text

Correct Answer: C - Text from one language to another

Machine Translation automatically translates text from one natural language to another.

Q15.
Text classification is used for:
A. Sorting files by size
B. Encrypting text
C. Counting words
D. Categorizing documents into predefined classes

Correct Answer: D - Categorizing documents into predefined classes

Text classification assigns documents to predefined categories like spam detection or topic labeling.

Q16.
N-grams are:
A. Weight units
B. Contiguous sequences of n items from text
C. Error codes
D. File types

Correct Answer: B - Contiguous sequences of n items from text

N-grams are contiguous sequences of n items (words or characters) from text, capturing local context.
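
The sliding-window definition can be sketched in one line. The `ngrams` helper below is a hypothetical name; it works on a list of words or, for character n-grams, on a string.

```python
def ngrams(items, n):
    """Return all contiguous length-n windows over items as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

words = "natural language processing".split()
print(ngrams(words, 2))  # [('natural', 'language'), ('language', 'processing')]
print(ngrams("cat", 2))  # [('c', 'a'), ('a', 't')]
```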

Q17.
Recurrent Neural Networks (RNN) are suitable for:
A. Only images
B. Only numbers
C. Sequential data like text
D. Only audio

Correct Answer: C - Sequential data like text

RNNs process sequential data by maintaining a hidden state that captures information from previous inputs.

Q18.
LSTM addresses the RNN limitation of:
A. Vanishing gradient and short-term memory
B. Being too fast
C. Using too little memory
D. Being too accurate

Correct Answer: A - Vanishing gradient and short-term memory

LSTM (Long Short-Term Memory) uses gates to preserve long-term dependencies and mitigate vanishing gradients.

Q19.
Transformers replaced RNNs because they:
A. Are slower
B. Process sequences in parallel with attention
C. Use less memory
D. Are simpler

Correct Answer: B - Process sequences in parallel with attention

Transformers use self-attention to process all positions in parallel, enabling faster training and better modeling of long-range dependencies.

Q20.
GPT models are trained using:
A. Self-supervised learning on large text corpora
B. Supervised learning only
C. Reinforcement learning only
D. Unsupervised clustering

Correct Answer: A - Self-supervised learning on large text corpora

GPT (Generative Pre-trained Transformer) is pre-trained using self-supervised learning to predict next tokens in text.
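
The "self-supervised" part means the training labels come from the text itself. As a rough sketch (not how GPT is actually implemented), the hypothetical helper `next_token_pairs` turns a token sequence into (context, next token) examples with no human labeling.

```python
def next_token_pairs(tokens):
    """Build (context, target) pairs where the target is the token that follows the context."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

for context, target in next_token_pairs(["the", "cat", "sat", "down"]):
    print(context, "->", target)
```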

Q21.
Tokenization in NLP refers to:
A. Encrypting text data
B. Translating text to another language
C. Breaking text into smaller units like words or sentences
D. Removing stop words from text

Correct Answer: C - Breaking text into smaller units like words or sentences

Tokenization is the process of breaking text into smaller units called tokens, which can be words, subwords, or sentences. It is typically one of the first steps in an NLP pipeline.

Q22.
Stemming reduces words to their:
A. Synonyms
B. Full dictionary form
C. Antonyms
D. Root or base form by removing affixes

Correct Answer: D - Root or base form by removing affixes

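Stemming reduces words to a root form by stripping affixes, usually suffixes, with simple rules. For example, 'running' and 'runs' are both reduced to 'run', while an irregular form like 'ran' is left unchanged because no stripping rule applies to it. Stemming may produce non-dictionary words.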

Q23.
Named Entity Recognition (NER) is used to:
A. Translate text
B. Identify and classify named entities like persons, organizations, and locations
C. Generate text summaries
D. Check grammar

Correct Answer: B - Identify and classify named entities like persons, organizations, and locations

NER identifies and classifies named entities in text into predefined categories such as person names, organizations, locations, dates, and monetary values.

Q24.
Sentiment Analysis determines:
A. The emotional tone or opinion expressed in text
B. The language of the text
C. The grammatical correctness
D. The reading difficulty level

Correct Answer: A - The emotional tone or opinion expressed in text

Sentiment Analysis determines the emotional tone behind text — whether it expresses positive, negative, or neutral sentiment. It is widely used in analyzing customer reviews and social media.

Q25.
What is the difference between stemming and lemmatization?
A. Lemmatization produces valid dictionary words; stemming may not
B. Stemming uses dictionary lookup; lemmatization uses rule-based stripping
C. They are exactly the same
D. Stemming is more accurate than lemmatization

Correct Answer: A - Lemmatization produces valid dictionary words; stemming may not

Lemmatization uses vocabulary and morphological analysis to return valid dictionary words (lemmas), while stemming uses simple rule-based suffix stripping which may produce non-dictionary words.
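
The contrast can be sketched with a toy lookup table standing in for the vocabulary a real lemmatizer would use. Both `LEMMAS` and the two helper functions are hypothetical and far simpler than any real stemmer or lemmatizer.

```python
# Hypothetical toy lemma dictionary; real lemmatizers use a full vocabulary
# and morphological analysis.
LEMMAS = {"running": "run", "ran": "run", "studies": "study"}

def lemmatize(word):
    """Look the word up in the toy dictionary, falling back to the word itself."""
    return LEMMAS.get(word, word)

def strip_suffix(word):
    """Crude rule-based stemmer: drop a trailing 'ing' or 'es' if present."""
    for suffix in ("ing", "es"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word

for w in ("studies", "ran"):
    print(w, "| stem:", strip_suffix(w), "| lemma:", lemmatize(w))
```

The rule-based function turns "studies" into the non-word "studi" and cannot relate "ran" to "run", while the dictionary lookup returns "study" and "run".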

Q26.
TF-IDF stands for:
A. Text Frequency-Inverse Data Frequency
B. Token Filter-Indexed Document Feature
C. Term Frequency-Inverse Document Frequency
D. Text Format-Inverse Data Format

Correct Answer: C - Term Frequency-Inverse Document Frequency

TF-IDF (Term Frequency-Inverse Document Frequency) is a statistical measure that evaluates how important a word is to a document in a collection. It increases with frequency in a document but decreases with frequency across documents.
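
One common textbook variant computes tf as the term's share of the document and idf as log(N / df). The sketch below assumes that variant; libraries often apply smoothing or normalization, so their numbers will differ. The `tf_idf` helper and the three-document corpus are made up for illustration.

```python
import math

def tf_idf(term, doc, corpus):
    """TF-IDF with tf = count / document length and idf = log(N / df)."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)  # documents containing the term
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
print(tf_idf("the", corpus[0], corpus))  # 0.0: appears in every document
print(tf_idf("mat", corpus[0], corpus))  # positive: appears in only one document
```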

Q27.
Which of the following is a stop word?
A. Algorithm
B. The
C. Neural
D. Database

Correct Answer: B - The

Stop words are common words like 'the', 'is', 'at', 'which', 'on' that are filtered out during text preprocessing because they carry little meaningful information for analysis.

Q28.
Bag of Words (BoW) model represents text as:
A. A multiset of words disregarding grammar and word order
B. A sequence of word embeddings
C. A parse tree
D. A knowledge graph

Correct Answer: A - A multiset of words disregarding grammar and word order

Bag of Words represents text as a multiset (bag) of words, disregarding grammar and word order. Each document is represented as a vector of word counts or frequencies.

Q29.
Word2Vec is a technique used for:
A. Text classification
B. Machine translation
C. Named entity recognition
D. Creating word embeddings

Correct Answer: D - Creating word embeddings

Word2Vec creates dense vector representations (embeddings) of words that capture semantic relationships. Words with similar meanings have similar vectors. It uses CBOW or Skip-gram architectures.

Q30.
Part-of-Speech (POS) tagging assigns:
A. Sentiment scores to words
B. Named entity labels to words
C. Topic labels to documents
D. Grammatical categories like noun, verb, adjective to each word

Correct Answer: D - Grammatical categories like noun, verb, adjective to each word

POS tagging assigns grammatical categories (noun, verb, adjective, adverb, etc.) to each word in a sentence. It is fundamental to many NLP tasks like parsing and information extraction.

Q31.
Which NLP task involves converting speech to written text?
A. Speech Recognition
B. Text-to-Speech
C. Machine Translation
D. Text Summarization

Correct Answer: A - Speech Recognition

Speech Recognition (also called Automatic Speech Recognition or ASR) converts spoken language into written text. It is used in virtual assistants, dictation software, and voice-controlled systems.

Q32.
N-grams in NLP refer to:
A. A type of neural network
B. Grammar rules
C. Named entities in text
D. Contiguous sequences of n items from text

Correct Answer: D - Contiguous sequences of n items from text

N-grams are contiguous sequences of n items (words or characters) from text. Unigrams (n=1), bigrams (n=2), and trigrams (n=3) are commonly used for language modeling and text analysis.

Q33.
Which technique converts words into fixed-length dense vectors?
A. One-hot encoding
B. TF-IDF
C. Bag of Words
D. Word Embeddings

Correct Answer: D - Word Embeddings

Word Embeddings (like Word2Vec, GloVe, FastText) convert words into fixed-length dense vectors that capture semantic meaning. Unlike one-hot encoding, similar words have similar vector representations.

Q34.
Text classification is an example of:
A. Supervised learning
B. Unsupervised learning
C. Reinforcement learning
D. Semi-supervised learning

Correct Answer: A - Supervised learning

Text classification is a supervised learning task where the model is trained on labeled text data to assign predefined categories to new text. Examples include spam detection and topic classification.

Q35.
What is a corpus in NLP?
A. A single sentence
B. A type of algorithm
C. A large collection of text documents
D. A grammatical rule

Correct Answer: C - A large collection of text documents

A corpus (plural: corpora) is a large, structured collection of text documents used for training and evaluating NLP models. Examples include Wikipedia, news articles, and book collections.

Q36.
Machine Translation refers to:
A. Converting code to machine language
B. Converting speech to text
C. Automatically translating text from one natural language to another
D. Translating images to text

Correct Answer: C - Automatically translating text from one natural language to another

Machine Translation automatically translates text from one natural language to another (e.g., English to French). Modern approaches use neural networks (Neural Machine Translation).

Q37.
Which model architecture revolutionized NLP with its self-attention mechanism?
A. Transformer
B. CNN
C. LSTM
D. RNN

Correct Answer: A - Transformer

The Transformer architecture revolutionized NLP with its self-attention mechanism that processes all positions in a sequence simultaneously. BERT, GPT, and T5 are all based on Transformers.

Q38.
BERT stands for:
A. Bidirectional Encoder Representations from Transformers
B. Binary Encoded Representation Transform
C. Basic Entity Recognition Tool
D. Batch Enhanced Recurrent Transformer

Correct Answer: A - Bidirectional Encoder Representations from Transformers

BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model by Google that considers context from both directions (left and right) of a word simultaneously.

Q39.
What is text preprocessing?
A. Writing text documents
B. Storing text in a database
C. Printing text output
D. Cleaning and preparing raw text data for analysis

Correct Answer: D - Cleaning and preparing raw text data for analysis

Text preprocessing involves cleaning and preparing raw text for NLP tasks. Common steps include tokenization, lowercasing, removing stop words, stemming/lemmatization, and handling special characters.

Q40.
Chunking in NLP is used to:
A. Group words into meaningful phrases based on POS tags
B. Split text into equal-sized pieces
C. Remove punctuation from text
D. Count word frequencies

Correct Answer: A - Group words into meaningful phrases based on POS tags

Chunking (shallow parsing) groups adjacent words into meaningful phrases based on their POS tags. For example, grouping 'the big cat' as a noun phrase (NP). It provides partial syntactic analysis.

Q41.
Which of the following is NOT a common NLP application?
A. Chatbots
B. Spam Filtering
C. Image Segmentation
D. Language Translation

Correct Answer: C - Image Segmentation

Image Segmentation is a computer vision task, not an NLP application. Chatbots, spam filtering, and language translation are all common applications of Natural Language Processing.

Q42.
Dependency parsing identifies:
A. Word frequencies
B. Named entities
C. Grammatical relationships between words in a sentence
D. Sentiment of text

Correct Answer: C - Grammatical relationships between words in a sentence

Dependency parsing analyzes the grammatical structure of a sentence by identifying relationships (dependencies) between words, such as subject-verb, verb-object, and modifier relationships.

Q43.
What is the purpose of word embeddings in NLP?
A. To represent words as numerical vectors capturing semantic meaning
B. To encrypt text data
C. To count words in a document
D. To remove stop words

Correct Answer: A - To represent words as numerical vectors capturing semantic meaning

Word embeddings represent words as dense numerical vectors in a continuous vector space where semantically similar words are closer together. This enables mathematical operations on word meanings.

Q44.
Which technique is used to handle out-of-vocabulary (OOV) words?
A. Stop word removal
B. POS tagging
C. Stemming
D. Subword tokenization

Correct Answer: D - Subword tokenization

Subword tokenization (like BPE, WordPiece) handles OOV words by breaking unknown words into smaller subword units that exist in the vocabulary. This is used in models like BERT and GPT.
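
The sketch below shows greedy longest-match-first splitting in the style of WordPiece, where "##" marks a piece that continues a word. The vocabulary and the `subword_tokenize` helper are hypothetical, and BPE reaches its pieces differently (by applying learned merges), so treat this as an illustration of the idea only.

```python
# Hypothetical toy vocabulary; "##" marks a piece that continues a word.
VOCAB = {"un", "##happi", "##ness", "play", "##ing", "##ed"}

def subword_tokenize(word, vocab=VOCAB, unk="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no vocabulary piece matches at this position
        pieces.append(piece)
        start = end
    return pieces

print(subword_tokenize("unhappiness"))  # ['un', '##happi', '##ness']
print(subword_tokenize("playing"))      # ['play', '##ing']
```

Neither "unhappiness" nor "playing" is in the toy vocabulary as a whole word, yet both are represented from known pieces instead of collapsing to an unknown token.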

Q45.
Information Extraction in NLP involves:
A. Compressing text files
B. Automatically extracting structured information from unstructured text
C. Formatting text documents
D. Encrypting sensitive data

Correct Answer: B - Automatically extracting structured information from unstructured text

Information Extraction automatically extracts structured data (entities, relationships, events) from unstructured text. It includes tasks like NER, relation extraction, and event extraction.

Q46.
What is cosine similarity used for in NLP?
A. Measuring text length
B. Counting word frequency
C. Measuring similarity between two text vectors
D. Parsing sentence structure

Correct Answer: C - Measuring similarity between two text vectors

Cosine similarity measures the cosine of the angle between two vectors, indicating how similar they are in direction regardless of magnitude. It is widely used to compare document or word vectors in NLP.
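
A minimal sketch of the formula cos(u, v) = (u · v) / (|u| |v|) in plain Python. The `cosine_similarity` helper is a hypothetical name, and returning 0.0 for a zero vector is a convention chosen here, not a standard.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)

# Same direction, different magnitude: similarity is 1 up to rounding.
print(cosine_similarity([1, 2, 0], [2, 4, 0]))
# Orthogonal vectors: similarity is 0.
print(cosine_similarity([1, 0], [0, 1]))
```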

Q47.
Which of the following is a sequence-to-sequence NLP task?
A. Sentiment Analysis
B. Spam Detection
C. Text Classification
D. Machine Translation

Correct Answer: D - Machine Translation

Machine Translation is a sequence-to-sequence task that takes an input sequence (source language) and produces an output sequence (target language). Text summarization is another seq-to-seq task.

Q48.
One-hot encoding of words results in:
A. Sparse, high-dimensional vectors
B. Dense, short vectors
C. Continuous, low-dimensional vectors
D. Binary classification output

Correct Answer: A - Sparse, high-dimensional vectors

One-hot encoding represents each word as a sparse, high-dimensional vector where only one element is 1 and the rest are 0. The vector dimension equals the vocabulary size, making it memory-inefficient.
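
A small sketch with a hypothetical four-word vocabulary shows the shape of these vectors: the length equals the vocabulary size and exactly one entry is 1.

```python
def one_hot(word, vocabulary):
    """Return a list with a single 1 at the word's index in the vocabulary."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(word)] = 1
    return vector

vocab = ["cat", "dog", "mat", "sat"]
print(one_hot("dog", vocab))  # [0, 1, 0, 0]
```

With a realistic vocabulary of tens of thousands of words, each vector would have that many entries and still only a single 1.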

Q49.
What is the purpose of an attention mechanism in NLP?
A. To reduce vocabulary size
B. To tokenize sentences
C. To remove noise from text
D. To focus on relevant parts of the input when generating output

Correct Answer: D - To focus on relevant parts of the input when generating output

The attention mechanism allows the model to focus on the most relevant parts of the input sequence when generating each part of the output. It significantly improves performance on long sequences.
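
One widely used form is scaled dot-product attention: score each key against the query, turn the scores into weights with softmax, and take the weighted average of the values. The sketch below handles a single query in plain Python; the helper names and the toy vectors are assumptions for illustration.

```python
import math

def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # The output is the weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attend([1.0, 0.0], keys, values)
print(weights)  # the keys aligned with the query get the larger weights
print(output)
```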

Q50.
Text summarization can be categorized as:
A. Supervised and unsupervised
B. Extractive and abstractive
C. Syntactic and semantic
D. Shallow and deep

Correct Answer: B - Extractive and abstractive

Text summarization is categorized as extractive (selecting important sentences from the original text) or abstractive (generating new sentences that convey the key information). Abstractive summarization is generally more challenging.