Neural Networks - Practice MCQs for CCAT

50 Questions Section B: Programming Artificial Intelligence

Neural Networks Question Bank for C-CAT

Topic-wise Neural Networks MCQs for CDAC C-CAT preparation with answers and explanations.

Q1.
Neural networks are inspired by:
A. Computer circuits
B. Biological neurons in the brain
C. Database systems
D. Network protocols

Correct Answer: B - Biological neurons in the brain

Artificial neural networks are inspired by the structure and function of biological neural networks in the brain.

Q2.
A perceptron is:
A. Multi-layer network
B. Single layer neural network with one neuron
C. Deep network
D. Recurrent network

Correct Answer: B - Single layer neural network with one neuron

A perceptron is the simplest neural network - a single neuron that makes binary classifications.

Q3.
Activation function introduces:
A. Linearity
B. Non-linearity to the network
C. More layers
D. More neurons

Correct Answer: B - Non-linearity to the network

Activation functions introduce non-linearity, allowing networks to learn complex patterns.

Q4.
ReLU activation function is:
A. f(x) = max(0, x)
B. f(x) = sigmoid(x)
C. f(x) = tanh(x)
D. f(x) = x^2

Correct Answer: A - f(x) = max(0, x)

ReLU (Rectified Linear Unit) returns x if positive, otherwise 0. Simple yet effective.

Q5.
Sigmoid function outputs values between:
A. -1 and 1
B. 0 and 1
C. -infinity and infinity
D. 0 and infinity

Correct Answer: B - 0 and 1

Sigmoid squashes values to range (0, 1), useful for probability outputs.

Q6.
Backpropagation is used to:
A. Initialize weights
B. Calculate gradients and update weights
C. Add more layers
D. Remove neurons

Correct Answer: B - Calculate gradients and update weights

Backpropagation calculates gradients of the loss with respect to weights, enabling weight updates.

Q7.
Gradient descent minimizes:
A. Number of neurons
B. Loss/error function
C. Number of layers
D. Training time

Correct Answer: B - Loss/error function

Gradient descent iteratively adjusts weights to minimize the loss function.
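
A minimal sketch of that loop, assuming an illustrative one-parameter loss L(w) = (w - 3)^2 with its minimum at w = 3 (not part of the question): each step moves the weight a little in the direction opposite to the gradient.

```python
def loss(w: float) -> float:
    # Illustrative loss with its minimum at w = 3 (an assumption for this sketch)
    return (w - 3.0) ** 2


def grad(w: float) -> float:
    # Analytic derivative: dL/dw = 2 * (w - 3)
    return 2.0 * (w - 3.0)


def gradient_descent(w0: float, lr: float, steps: int) -> float:
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)  # step opposite to the gradient
    return w


w_final = gradient_descent(w0=0.0, lr=0.1, steps=100)
print(w_final, loss(w_final))
```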

Q8.
Learning rate in neural networks:
A. Determines number of epochs
B. Defines layer count
C. Sets batch size
D. Controls step size in weight updates

Correct Answer: D - Controls step size in weight updates

Learning rate controls how much weights are adjusted in each update step during training.
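
A short worked example shows why the step size matters. Assume, for illustration only, the loss L(w) = w^2, whose gradient is 2w. One gradient-descent update with learning rate η then scales the weight by a constant factor:

```latex
\begin{aligned}
w_{t+1} &= w_t - \eta \,\frac{dL}{dw}\Big|_{w_t} = w_t - 2\eta\, w_t = (1 - 2\eta)\, w_t,\\
w_t &= (1 - 2\eta)^t\, w_0 .
\end{aligned}
```

For this loss the weight converges to the minimum at 0 only when |1 - 2η| < 1, i.e. 0 < η < 1: η = 0.05 shrinks w by 10% per step, η = 0.5 reaches the minimum in a single step, and η = 1.5 doubles |w| every step, so training diverges.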

Q9.
Epoch in training means:
A. Single weight update
B. One complete pass through entire training dataset
C. Model deployment
D. Test evaluation

Correct Answer: B - One complete pass through entire training dataset

An epoch is one complete pass through the entire training dataset during training.
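
A minimal sketch of how epochs relate to mini-batches; the dataset size, batch size, and the placeholder update step are assumptions for illustration. With mini-batch training, one epoch performs ceil(N / batch_size) weight updates:

```python
import math
import random


def updates_per_epoch(n_samples: int, batch_size: int) -> int:
    # The last mini-batch may be smaller than batch_size
    return math.ceil(n_samples / batch_size)


def run_training(n_samples: int, batch_size: int, epochs: int, seed: int = 0):
    rng = random.Random(seed)
    indices = list(range(n_samples))
    updates = 0
    samples_seen = 0
    for _ in range(epochs):
        rng.shuffle(indices)  # reshuffle the data at the start of every epoch
        for start in range(0, n_samples, batch_size):
            batch = indices[start:start + batch_size]
            samples_seen += len(batch)
            updates += 1  # placeholder for one weight update on this batch
    return updates, samples_seen


print(updates_per_epoch(1000, 32))
print(run_training(1000, 32, epochs=3))
```

Each epoch visits every sample exactly once (3 epochs over 1,000 samples means 3,000 samples seen), while the number of weight updates depends on the batch size.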

Q10.
Deep learning refers to:
A. Shallow networks
B. Single perceptron
C. Neural networks with many layers
D. Linear regression

Correct Answer: C - Neural networks with many layers

Deep learning uses neural networks with multiple hidden layers (deep networks) to learn hierarchical features.

Q11.
Vanishing gradient problem occurs when:
A. Gradients become very large
B. Network is too shallow
C. Gradients become very small in deep networks
D. Learning rate is too high

Correct Answer: C - Gradients become very small in deep networks

In deep networks, gradients can become extremely small as they are propagated backward, so the early layers receive only tiny weight updates and learn very slowly or not at all.

Q12.
Dropout is a technique to:
A. Add more neurons
B. Increase model size
C. Speed up training
D. Prevent overfitting by randomly dropping neurons

Correct Answer: D - Prevent overfitting by randomly dropping neurons

Dropout randomly deactivates neurons during training, preventing overfitting by reducing co-adaptation.

Q13.
Batch normalization:
A. Normalizes layer inputs for faster training
B. Increases batch size
C. Removes batches
D. Only used in testing

Correct Answer: A - Normalizes layer inputs for faster training

Batch normalization normalizes layer inputs, accelerating training and reducing internal covariate shift.

Q14.
Softmax activation is used in:
A. Hidden layers only
B. Input layer
C. Output layer for multi-class classification
D. Regularization

Correct Answer: C - Output layer for multi-class classification

Softmax converts raw outputs into a probability distribution over multiple classes (the probabilities sum to 1).

Q15.
Loss function measures:
A. Number of layers
B. Training speed
C. Difference between predicted and actual values
D. Memory usage

Correct Answer: C - Difference between predicted and actual values

Loss function quantifies how far model predictions are from actual target values.
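
Which loss to use depends on the task. As one common example (an assumption, since the question names no specific loss), mean squared error for regression averages the squared gaps between targets and predictions:

```python
def mean_squared_error(y_true, y_pred):
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average of squared differences between targets and predictions
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect predictions
print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # one prediction off by 2
```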

Q16.
Adam optimizer combines:
A. Only SGD
B. Only dropout
C. Only batch normalization
D. Momentum and adaptive learning rates

Correct Answer: D - Momentum and adaptive learning rates

Adam combines momentum (a running average of past gradients) with RMSprop-style adaptive per-parameter learning rates (a running average of squared gradients) for efficient optimization.

Q17.
Weights in neural networks are typically initialized:
A. To zero
B. All to one
C. To very large values
D. Randomly with small values

Correct Answer: D - Randomly with small values

Weights are randomly initialized with small values to break symmetry and enable learning.
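
A minimal sketch of random initialization, using the Glorot (Xavier) uniform scheme as one common choice; the scheme and the layer sizes are assumptions, since the question only says "small random values". If every weight started at the same value, all hidden neurons would compute identical outputs and receive identical gradients, so they would never become different from each other.

```python
import math
import random


def glorot_uniform(fan_in: int, fan_out: int, rng: random.Random):
    # Draw weights from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


rng = random.Random(42)
W = glorot_uniform(fan_in=4, fan_out=3, rng=rng)  # weights for a 4 -> 3 layer
for row in W:
    print([round(w, 3) for w in row])
```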

Q18.
Hidden layers in neural networks:
A. Are always visible
B. Are between input and output layers
C. Only output results
D. Only receive input

Correct Answer: B - Are between input and output layers

Hidden layers are intermediate layers between input and output that learn internal representations.

Q19.
Cross-entropy loss is commonly used for:
A. Regression tasks
B. Classification tasks
C. Clustering
D. Dimensionality reduction

Correct Answer: B - Classification tasks

Cross-entropy loss measures the difference between the predicted probability distribution and the actual class labels.
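
For a single example with a one-hot label, categorical cross-entropy reduces to the negative log of the probability the model assigns to the true class. A minimal sketch; the probability vectors are made up for illustration, and the small eps clamp is an assumption to avoid log(0):

```python
import math


def cross_entropy(probs, true_class, eps=1e-12):
    # -sum(y_k * log(p_k)) with a one-hot y keeps only the true-class term
    return -math.log(max(probs[true_class], eps))


confident_right = cross_entropy([0.9, 0.05, 0.05], true_class=0)
confident_wrong = cross_entropy([0.05, 0.9, 0.05], true_class=0)
print(confident_right, confident_wrong)
```

A confident correct prediction gives a small loss, while a confident wrong one is penalized heavily.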

Q20.
Transfer learning involves:
A. Using pre-trained model for new task
B. Training from scratch
C. Deleting all weights
D. Only testing

Correct Answer: A - Using pre-trained model for new task

Transfer learning uses knowledge (weights) from a model trained on one task for a related task.

Q21.
A perceptron is a type of:
A. Unsupervised learning algorithm
B. Dimensionality reduction technique
C. Clustering algorithm
D. Single-layer neural network

Correct Answer: D - Single-layer neural network

A perceptron is the simplest type of neural network consisting of a single layer. It takes inputs, applies weights and a bias, passes through an activation function, and produces an output.
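
The description above maps directly to a few lines of code. This is a minimal sketch of the classic perceptron learning rule trained on the logical AND function; the step activation, zero initial weights, and the AND dataset are illustrative choices.

```python
def step(z: float) -> int:
    # Threshold activation: output 1 when the weighted sum is non-negative
    return 1 if z >= 0 else 0


def predict(weights, bias, x):
    z = sum(w * xi for w, xi in zip(weights, x)) + bias  # weighted sum plus bias
    return step(z)


def train_perceptron(data, lr=1.0, epochs=20):
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in data:
            error = target - predict(weights, bias, x)  # -1, 0 or +1
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND_DATA)
print(w, b, [predict(w, b, x) for x, _ in AND_DATA])
```

Because AND is linearly separable, the perceptron convergence theorem guarantees that this rule finds a separating line; a single perceptron cannot learn XOR, which is not linearly separable.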

Q22.
Which activation function outputs values between 0 and 1?
A. ReLU
B. Tanh
C. Sigmoid
D. Leaky ReLU

Correct Answer: C - Sigmoid

The Sigmoid activation function maps any input value to a value between 0 and 1 using the formula σ(x) = 1/(1 + e^(-x)). It is commonly used in the output layer for binary classification.
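
A minimal Python sketch of the sigmoid and its derivative σ'(x) = σ(x)(1 - σ(x)). The two-branch form is a common way (assumed here) to keep exp from overflowing for large negative inputs.

```python
import math


def sigmoid(x: float) -> float:
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so exp(x) cannot overflow
    return z / (1.0 + z)


def sigmoid_grad(x: float) -> float:
    s = sigmoid(x)
    return s * (1.0 - s)  # largest value is 0.25, reached at x = 0


for x in (-10.0, 0.0, 10.0):
    print(x, sigmoid(x), sigmoid_grad(x))
```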

Q23.
Backpropagation is used to:
A. Initialize weights randomly
B. Calculate gradients and update weights
C. Preprocess input data
D. Select the number of layers

Correct Answer: B - Calculate gradients and update weights

Backpropagation calculates the gradient of the loss function with respect to each weight by propagating the error backward through the network. These gradients are then used to update weights.
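
A minimal sketch of backpropagation for a single sigmoid neuron with squared-error loss; the neuron, the loss, and the numbers are illustrative assumptions. The chain rule gives dL/dw = (a - y) · a(1 - a) · x, and a finite-difference check confirms the analytic gradient before one update step.

```python
import math


def forward(w, b, x):
    z = w * x + b                   # pre-activation
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid activation a


def loss(w, b, x, y):
    return 0.5 * (forward(w, b, x) - y) ** 2  # squared error


def backward(w, b, x, y):
    a = forward(w, b, x)
    dL_da = a - y            # derivative of 0.5 * (a - y)^2
    da_dz = a * (1.0 - a)    # sigmoid derivative
    dL_dz = dL_da * da_dz    # chain rule
    return dL_dz * x, dL_dz  # dL/dw (dz/dw = x) and dL/db (dz/db = 1)


w, b, x, y = 0.5, -0.2, 1.5, 1.0
dw, db = backward(w, b, x, y)

# Finite-difference check of the analytic gradient
eps = 1e-6
dw_numeric = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)

# One gradient-descent update using the backpropagated gradients
lr = 0.5
w_new, b_new = w - lr * dw, b - lr * db
print(dw, dw_numeric, loss(w, b, x, y), loss(w_new, b_new, x, y))
```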

Q24.
What is the vanishing gradient problem?
A. Gradients become too large during training
B. The model converges too quickly
C. Gradients become extremely small, slowing learning in early layers
D. The loss function becomes zero

Correct Answer: C - Gradients become extremely small, slowing learning in early layers

The vanishing gradient problem occurs when gradients become extremely small as they are propagated back through many layers, causing early layers to learn very slowly or not at all.
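
A rough numerical illustration: each sigmoid layer multiplies the backpropagated gradient by its local derivative, which is at most 0.25. The sketch assumes, for simplicity, unit weights and a pre-activation of 0 in every layer, which is the best case for the sigmoid.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gradient_scale(depth: int, pre_activation: float = 0.0, weight: float = 1.0) -> float:
    # Each layer contributes a factor weight * sigmoid'(z); identical layers are assumed
    s = sigmoid(pre_activation)
    local = weight * s * (1.0 - s)
    return local ** depth


for depth in (1, 5, 10, 20):
    print(depth, gradient_scale(depth))
```

Even in this best case, the gradient reaching the first of 10 layers is below one millionth of its original size; with pre-activations away from 0 the factor shrinks even faster.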

Q25.
Which type of neural network is best suited for image recognition tasks?
A. Recurrent Neural Network
B. Autoencoder
C. Boltzmann Machine
D. Convolutional Neural Network

Correct Answer: D - Convolutional Neural Network

Convolutional Neural Networks (CNNs) are specifically designed for image processing. They use convolutional layers to detect local features like edges, textures, and patterns in images.

Q26.
ReLU activation function is defined as:
A. f(x) = max(0, x)
B. f(x) = 1/(1 + e^(-x))
C. f(x) = (e^x - e^(-x))/(e^x + e^(-x))
D. f(x) = x

Correct Answer: A - f(x) = max(0, x)

ReLU (Rectified Linear Unit) is defined as f(x) = max(0, x). It outputs the input directly if positive, and zero otherwise. It helps mitigate the vanishing gradient problem.
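
A minimal sketch of ReLU and its gradient. For positive inputs the local gradient is exactly 1, so, unlike the sigmoid (whose derivative is at most 0.25), stacking active ReLU layers does not by itself shrink the backpropagated gradient. Weights are ignored here as a simplifying assumption.

```python
def relu(x: float) -> float:
    return max(0.0, x)


def relu_grad(x: float) -> float:
    # Derivative is 1 for x > 0 and 0 for x < 0; using 0 at x == 0 is a common convention
    return 1.0 if x > 0 else 0.0


depth = 20
product = 1.0
for _ in range(depth):
    product *= relu_grad(0.5)  # every layer active (positive pre-activation)
print(relu(-2.0), relu(3.5), product)
```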

Q27.
Which layer in a CNN is responsible for reducing spatial dimensions?
A. Convolutional Layer
B. Input Layer
C. Fully Connected Layer
D. Pooling Layer

Correct Answer: D - Pooling Layer

The Pooling Layer reduces the spatial dimensions (width and height) of feature maps while retaining important information. Max Pooling and Average Pooling are common types.

Q28.
What is the purpose of dropout in neural networks?
A. To prevent overfitting by randomly dropping neurons
B. To speed up training
C. To increase the number of layers
D. To initialize weights

Correct Answer: A - To prevent overfitting by randomly dropping neurons

Dropout is a regularization technique that randomly deactivates a fraction of neurons during training. This prevents neurons from co-adapting and reduces overfitting.
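
A minimal sketch of "inverted" dropout, a widely used variant (assumed here) that scales the surviving activations by 1/(1 - p) during training so that nothing needs rescaling at inference time.

```python
import random


def dropout(activations, p_drop, rng, training=True):
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    if not training or p_drop == 0.0:
        return list(activations)  # dropout is disabled at inference time
    keep = 1.0 - p_drop
    # Zero each unit with probability p_drop; scale survivors to preserve the expected value
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
print(dropout([1.0] * 8, p_drop=0.5, rng=rng))
```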

Q29.
Recurrent Neural Networks (RNNs) are designed for:
A. Sequential data processing
B. Image classification
C. Dimensionality reduction
D. Data clustering

Correct Answer: A - Sequential data processing

RNNs are designed to process sequential data like time series, text, and speech. They have feedback connections that allow information to persist across time steps.
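
A minimal sketch of that feedback loop: a scalar Elman-style RNN in which the hidden state h carries information from earlier time steps. The tanh activation, scalar weights, and input values are illustrative assumptions.

```python
import math


def rnn_forward(xs, w_x, w_h, b, h0=0.0):
    # h_t = tanh(w_x * x_t + w_h * h_{t-1} + b), applied step by step
    h = h0
    states = []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


print(rnn_forward([1.0, 0.0, 0.0], w_x=0.5, w_h=0.8, b=0.0))
```

Even though the second and third inputs are 0, the hidden state stays non-zero because it depends on h_{t-1}; feeding the same inputs in a different order also produces a different final state.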

Q30.
Which of the following is the exploding gradient problem?
A. Gradients become very small
B. The network has too many layers
C. Gradients grow exponentially large during training
D. The learning rate is too small

Correct Answer: C - Gradients grow exponentially large during training

The exploding gradient problem occurs when gradients grow exponentially large during backpropagation, causing unstable weight updates. Gradient clipping is a common solution.
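
Clipping by norm, one common form of gradient clipping (assumed here), rescales the whole gradient vector whenever its L2 norm exceeds a threshold, keeping its direction but limiting the step size. A minimal sketch:

```python
import math


def clip_by_norm(grads, max_norm):
    # Rescale the whole gradient vector if its L2 norm exceeds max_norm
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm  # keeps the direction, limits the magnitude
    return [g * scale for g in grads]


print(clip_by_norm([30.0, 40.0], max_norm=5.0))  # norm 50 -> rescaled to norm 5
print(clip_by_norm([1.0, 2.0], max_norm=5.0))    # already small: unchanged
```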

Q31.
In a CNN, what does a convolutional filter (kernel) do?
A. Removes noise from images
B. Detects specific features by sliding across the input
C. Reduces the number of parameters
D. Normalizes pixel values

Correct Answer: B - Detects specific features by sliding across the input

A convolutional filter slides across the input image, performing element-wise multiplication and summation to detect specific features like edges, corners, and textures.
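
A minimal pure-Python sketch of that sliding computation on a 2D input with no padding. As in most deep-learning libraries, it computes cross-correlation (the kernel is not flipped); the image and the edge-detecting kernel are made-up examples.

```python
def conv2d(image, kernel, stride=1):
    # "Valid" convolution: the kernel stays entirely inside the image
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for i in range(0, ih - kh + 1, stride):
        row = []
        for j in range(0, iw - kw + 1, stride):
            s = 0.0
            for di in range(kh):
                for dj in range(kw):
                    s += image[i + di][j + dj] * kernel[di][dj]  # multiply and sum
            row.append(s)
        out.append(row)
    return out


image = [[0, 0, 1, 1]] * 4       # dark left half, bright right half
kernel = [[-1, 1], [-1, 1]]      # responds to a dark-to-bright vertical edge
print(conv2d(image, kernel))
```

The output is large only in the middle column, exactly where the vertical edge sits.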

Q32.
LSTM stands for:
A. Linear Sequential Training Model
B. Long Short-Term Memory
C. Layered Stochastic Transfer Method
D. Low Signal Threshold Model

Correct Answer: B - Long Short-Term Memory

LSTM (Long Short-Term Memory) is a type of RNN architecture designed to learn long-term dependencies. It uses gates (forget, input, output) to control information flow.

Q33.
Which component of LSTM decides what information to discard?
A. Input Gate
B. Cell State
C. Output Gate
D. Forget Gate

Correct Answer: D - Forget Gate

The Forget Gate in LSTM decides what information from the previous cell state should be discarded. It outputs values between 0 (completely forget) and 1 (completely keep).
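
A minimal scalar sketch of one LSTM step showing where the forget gate acts: the new cell state is c_t = f · c_{t-1} + i · g. The parameter names and values are hypothetical; a real LSTM uses weight matrices and vectors.

```python
import math

GATE_PARAMS = ["wf_x", "wf_h", "bf", "wi_x", "wi_h", "bi",
               "wg_x", "wg_h", "bg", "wo_x", "wo_h", "bo"]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    f = sigmoid(p["wf_x"] * x + p["wf_h"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi_x"] * x + p["wi_h"] * h_prev + p["bi"])    # input gate
    g = math.tanh(p["wg_x"] * x + p["wg_h"] * h_prev + p["bg"])  # candidate values
    o = sigmoid(p["wo_x"] * x + p["wo_h"] * h_prev + p["bo"])    # output gate
    c = f * c_prev + i * g  # forget part of the old state, add new information
    h = o * math.tanh(c)
    return h, c, f


params = dict.fromkeys(GATE_PARAMS, 0.0)
keep = {**params, "bf": 20.0, "bi": -20.0}     # forget gate ~1, input gate ~0
forget = {**params, "bf": -20.0, "bi": -20.0}  # forget gate ~0, input gate ~0
print(lstm_step(0.0, 0.0, 5.0, keep)[1], lstm_step(0.0, 0.0, 5.0, forget)[1])
```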

Q34.
What is the purpose of batch normalization?
A. To normalize inputs to each layer for faster training
B. To reduce the batch size
C. To increase the learning rate
D. To add more training data

Correct Answer: A - To normalize inputs to each layer for faster training

Batch normalization normalizes the inputs to each layer by adjusting and scaling activations. It helps stabilize and speed up training, and can act as a mild regularizer.
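
A minimal sketch of training-mode batch normalization for a single feature across one mini-batch: subtract the batch mean, divide by the batch standard deviation, then apply the learnable scale gamma and shift beta. The small eps value is an assumption; at inference time, frameworks typically use running averages of the mean and variance instead, which is omitted here.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    n = len(batch)
    mean = sum(batch) / n
    var = sum((v - mean) ** 2 for v in batch) / n  # biased (population) variance
    # Normalize to zero mean / unit variance, then apply learnable scale and shift
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in batch]


out = batch_norm([1.0, 2.0, 3.0, 4.0])
print([round(v, 4) for v in out])
```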

Q35.
The softmax function is typically used in the output layer for:
A. Binary classification
B. Regression
C. Multi-class classification
D. Clustering

Correct Answer: C - Multi-class classification

Softmax converts a vector of raw scores into probabilities that sum to 1, making it ideal for multi-class classification. Each output represents the probability of belonging to a specific class.
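
A minimal sketch of softmax. Subtracting the largest score before exponentiating (a standard numerical-stability trick) does not change the result, because the common factor cancels in the division.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # shift by the max: avoids overflow
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
print(softmax([1000.0, 1000.0]))  # would overflow math.exp without the shift
```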

Q36.
What is transfer learning?
A. Using a pre-trained model as a starting point for a new task
B. Moving data from one server to another
C. Training multiple models simultaneously
D. Converting one algorithm to another

Correct Answer: A - Using a pre-trained model as a starting point for a new task

Transfer learning involves taking a model pre-trained on one task (e.g., ImageNet) and fine-tuning it for a different but related task. It saves training time and works well with limited data.

Q37.
In a feedforward neural network, data flows:
A. In both directions
B. Only from output to input
C. Only from input to output
D. In a circular manner

Correct Answer: C - Only from input to output

In a feedforward neural network, data flows in one direction only — from the input layer through hidden layers to the output layer. There are no feedback loops or cycles.
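
A minimal sketch of one forward pass through a small fully connected network with one hidden ReLU layer; the layer sizes and weight values are arbitrary illustrative numbers. Each layer only consumes the previous layer's output, so information flows strictly from input to output.

```python
def relu(z: float) -> float:
    return max(0.0, z)


def dense(inputs, weights, biases, activation=None):
    # weights[j][i] connects input i to output neuron j
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(activation(z) if activation else z)
    return outputs


def mlp_forward(x, params):
    hidden = dense(x, params["W1"], params["b1"], relu)  # input -> hidden
    return dense(hidden, params["W2"], params["b2"])     # hidden -> output (linear)


params = {
    "W1": [[1.0, -1.0], [0.5, 0.5]], "b1": [0.0, 0.0],
    "W2": [[1.0, 2.0]], "b2": [0.5],
}
print(mlp_forward([2.0, 1.0], params))
```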

Q38.
Which of the following is NOT an activation function?
A. Sigmoid
B. ReLU
C. Tanh
D. Gradient Descent

Correct Answer: D - Gradient Descent

Gradient Descent is an optimization algorithm used to minimize the loss function, not an activation function. Sigmoid, ReLU, and Tanh are all activation functions used in neural networks.

Q39.
A GAN (Generative Adversarial Network) consists of which two networks?
A. Encoder and Decoder
B. Forward and Backward
C. Convolutional and Recurrent
D. Generator and Discriminator

Correct Answer: D - Generator and Discriminator

A GAN consists of a Generator (creates fake data) and a Discriminator (distinguishes real from fake data). They are trained adversarially, with each network trying to outsmart the other.

Q40.
The Tanh activation function outputs values in which range?
A. -1 to 1
B. 0 to 1
C. 0 to infinity
D. -infinity to infinity

Correct Answer: A - -1 to 1

The Tanh (hyperbolic tangent) function outputs values between -1 and 1. It is zero-centered, which can make optimization easier compared to the sigmoid function.
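
A short sketch relating tanh to the sigmoid through the identity tanh(x) = 2σ(2x) - 1, which accounts for both the (-1, 1) range and the zero-centering. It is for illustration only; in practice you would simply call math.tanh.

```python
import math


def tanh_via_sigmoid(x: float) -> float:
    # tanh is odd, so evaluate the identity for |x| and restore the sign;
    # this keeps exp() from overflowing for large negative x
    if x < 0:
        return -tanh_via_sigmoid(-x)
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0


for x in (-3.0, 0.0, 0.7, 3.0):
    print(x, tanh_via_sigmoid(x), math.tanh(x))
```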

Q41.
What is the role of padding in CNN?
A. To increase training speed
B. To increase the number of filters
C. To reduce overfitting
D. To preserve spatial dimensions after convolution

Correct Answer: D - To preserve spatial dimensions after convolution

Padding adds extra pixels (usually zeros) around the input image borders. This preserves spatial dimensions after convolution and ensures edge pixels are processed adequately.
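
A minimal sketch of zero padding on a 2D input stored as a list of rows. Assuming a 3x3 kernel with stride 1, padding each side by p = 1 turns an n x n input into an (n + 2) x (n + 2) one, so a convolution without further padding yields n x n again.

```python
def zero_pad(image, p):
    # Surround a 2D image (list of rows) with p rows/columns of zeros on every side
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]                    # top border
    padded += [[0] * p + list(row) + [0] * p for row in image]  # left/right borders
    padded += [[0] * width for _ in range(p)]                   # bottom border
    return padded


image = [[1, 2],
         [3, 4]]
for row in zero_pad(image, 1):
    print(row)
```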

Q42.
Which optimizer adapts the learning rate for each parameter individually?
A. Adam
B. SGD
C. Batch Gradient Descent
D. Mini-batch Gradient Descent

Correct Answer: A - Adam

Adam (Adaptive Moment Estimation) adapts the learning rate for each parameter based on first and second moments of the gradients. It combines the benefits of AdaGrad and RMSProp.
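
A minimal sketch of the Adam update rule for a flat list of parameters, using the default hyperparameters from the original Adam paper (lr = 0.001, beta1 = 0.9, beta2 = 0.999, eps = 1e-8); the function and state names are hypothetical. Because each parameter's step is divided by the square root of its own running average of squared gradients, parameters with very different gradient scales take comparably sized steps.

```python
import math


def init_state(n):
    # t: step count, m: first-moment estimates, v: second-moment estimates
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n}


def adam_step(params, grads, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g      # momentum-like average
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g  # average of squared grads
        m_hat = state["m"][i] / (1 - beta1 ** t)  # bias correction
        v_hat = state["v"][i] / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params


state = init_state(2)
new = adam_step([1.0, 1.0], [2.0, 200.0], state, lr=0.1)
print(new)  # both parameters move by about lr, despite a 100x gradient difference
```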

Q43.
What is the stride in a convolutional layer?
A. The size of the filter
B. The depth of the output
C. The number of pixels the filter moves at each step
D. The number of filters used

Correct Answer: C - The number of pixels the filter moves at each step

Stride refers to the number of pixels the convolutional filter moves at each step. A stride of 1 moves the filter one pixel at a time, while a stride of 2 moves it two pixels at a time, roughly halving the output's width and height.
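
Stride and padding together determine a convolution's output size. The commonly used formula, assuming square inputs and kernels and no dilation, is out = floor((n + 2p - k) / s) + 1, where n is the input size, k the kernel size, p the padding, and s the stride:

```python
def conv_output_size(n: int, k: int, stride: int = 1, padding: int = 0) -> int:
    # out = floor((n + 2 * padding - k) / stride) + 1
    return (n + 2 * padding - k) // stride + 1


print(conv_output_size(32, k=3, stride=1, padding=1))  # "same"-size output
print(conv_output_size(32, k=3, stride=2, padding=1))  # stride 2 roughly halves it
```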

Q44.
An autoencoder consists of which two main parts?
A. Encoder and Decoder
B. Generator and Discriminator
C. Input and Output
D. Training and Testing

Correct Answer: A - Encoder and Decoder

An autoencoder has an Encoder (compresses input into a lower-dimensional representation) and a Decoder (reconstructs the input from the compressed representation). It is used for dimensionality reduction and feature learning.

Q45.
What problem does the GRU (Gated Recurrent Unit) address?
A. Overfitting in CNNs
B. Slow training in feedforward networks
C. Vanishing gradient in RNNs
D. Feature extraction in images

Correct Answer: C - Vanishing gradient in RNNs

GRU (Gated Recurrent Unit) addresses the vanishing gradient problem in standard RNNs. It uses update and reset gates to control information flow, similar to LSTM but with fewer parameters.
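
A minimal scalar sketch of one GRU step; the parameter names and values are hypothetical. Sources differ on which side of the final interpolation the update gate z multiplies; this sketch uses the convention in which z close to 1 keeps the previous state.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h_prev, p):
    z = sigmoid(p["wz_x"] * x + p["wz_h"] * h_prev + p["bz"])  # update gate
    r = sigmoid(p["wr_x"] * x + p["wr_h"] * h_prev + p["br"])  # reset gate
    h_tilde = math.tanh(p["wh_x"] * x + p["wh_h"] * (r * h_prev) + p["bh"])  # candidate
    # Convention used here: z close to 1 keeps the previous state
    return z * h_prev + (1.0 - z) * h_tilde


params = dict.fromkeys(["wz_x", "wz_h", "bz", "wr_x", "wr_h", "br",
                        "wh_x", "wh_h", "bh"], 0.0)
keep = {**params, "bz": 20.0}                     # update gate ~1: carry h_prev forward
overwrite = {**params, "bz": -20.0, "wh_x": 1.0}  # update gate ~0: take the candidate
print(gru_step(3.0, 0.7, keep), gru_step(3.0, 0.7, overwrite))
```

The GRU uses three sets of gate/candidate weights per step versus the LSTM's four and has no separate cell state, which is where its parameter savings come from.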

Q46.
Which type of pooling selects the maximum value from each region?
A. Average Pooling
B. Min Pooling
C. Max Pooling
D. Global Pooling

Correct Answer: C - Max Pooling

Max Pooling selects the maximum value from each region of the feature map. It helps retain the most prominent features while reducing spatial dimensions and computational cost.
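
A minimal sketch of 2x2 max pooling with stride 2 (a common configuration, assumed here) on a small made-up feature map; each non-overlapping window is reduced to its maximum, halving the width and height.

```python
def max_pool2d(fmap, size=2, stride=2):
    # Keep only the largest value in each size x size window
    out = []
    for i in range(0, len(fmap) - size + 1, stride):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, stride):
            row.append(max(fmap[i + di][j + dj]
                           for di in range(size) for dj in range(size)))
        out.append(row)
    return out


fmap = [[1, 3, 2, 1],
        [4, 6, 5, 0],
        [7, 2, 9, 8],
        [1, 0, 3, 4]]
print(max_pool2d(fmap))
```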

Q47.
What is a weight in a neural network?
A. The input data value
B. A learnable parameter that determines the strength of connection between neurons
C. The output of the network
D. The number of layers

Correct Answer: B - A learnable parameter that determines the strength of connection between neurons

A weight is a learnable parameter that determines the strength and direction of the connection between two neurons. During training, weights are adjusted to minimize the loss function.

Q48.
Leaky ReLU differs from ReLU by:
A. Allowing a small gradient for negative inputs
B. Having a maximum output value
C. Outputting values between 0 and 1
D. Being used only in CNNs

Correct Answer: A - Allowing a small gradient for negative inputs

Leaky ReLU allows a small, non-zero gradient for negative inputs (typically 0.01x for x < 0) instead of zero. This helps prevent the 'dying ReLU' problem where neurons stop learning.
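
A minimal sketch comparing the two gradients: for a negative input, ReLU's gradient is 0, so the neuron gets no learning signal, while Leaky ReLU's is the small slope alpha, taken here as the common default 0.01.

```python
def relu_grad(x: float) -> float:
    return 1.0 if x > 0 else 0.0


def leaky_relu(x: float, alpha: float = 0.01) -> float:
    return x if x > 0 else alpha * x


def leaky_relu_grad(x: float, alpha: float = 0.01) -> float:
    return 1.0 if x > 0 else alpha


print(leaky_relu(5.0), leaky_relu(-2.0))
print(relu_grad(-2.0), leaky_relu_grad(-2.0))
```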

Q49.
What is the purpose of the bias term in a neural network?
A. To multiply with the input
B. To normalize the input
C. To reduce the number of parameters
D. To shift the activation function to better fit the data

Correct Answer: D - To shift the activation function to better fit the data

The bias term allows the activation function to be shifted left or right, enabling the model to better fit the data. Without a bias, a neuron's weighted sum is always 0 when every input is 0, so its decision boundary is forced to pass through the origin.
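
A minimal sketch with a single sigmoid neuron (an illustrative assumption): without a bias, the output at x = 0 is always σ(0) = 0.5 whatever the weight, while a bias b moves the 0.5 crossing point to x = -b/w.

```python
import math


def sigmoid_neuron(x: float, w: float, b: float = 0.0) -> float:
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def midpoint(w: float, b: float) -> float:
    # Input where the output crosses 0.5, i.e. where w * x + b = 0
    return -b / w


for w in (-3.0, 0.5, 10.0):
    print(w, sigmoid_neuron(0.0, w))  # without bias the output at x = 0 is always 0.5
print(midpoint(2.0, -6.0), sigmoid_neuron(3.0, 2.0, -6.0))  # bias -6 shifts the crossing to x = 3
```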

Q50.
Which architecture relies entirely on attention mechanisms, without recurrence, for sequence-to-sequence tasks?
A. Transformer
B. VGGNet
C. AlexNet
D. ResNet

Correct Answer: A - Transformer

The Transformer architecture, introduced in the paper 'Attention Is All You Need' (2017), uses self-attention mechanisms to process sequences in parallel, replacing the need for recurrence. Attention itself predates the Transformer: it was used earlier with RNN encoder-decoder models for machine translation (Bahdanau et al., 2014).
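
The core operation of that paper is scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. Below is a minimal pure-Python sketch for a single attention head; leaving out learned projections and masking is a simplification of this example.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    # Q, K: lists of d_k-dimensional vectors; V: list of value vectors (one per key)
    d_k = len(K[0])
    outputs = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # how much this query attends to each key
        outputs.append([sum(w * v[j] for w, v in zip(weights, V))
                        for j in range(len(V[0]))])
    return outputs


Q = [[10.0, 0.0]]
K = [[10.0, 0.0], [0.0, 10.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
print(scaled_dot_product_attention(Q, K, V))  # output is dominated by V[0]
```

Every query attends to all keys at once, with no dependence on previous time steps, which is what lets the Transformer process a sequence in parallel.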
