Neural Networks Question Bank for C-CAT
Topic-wise Neural Networks MCQs for CDAC C-CAT preparation with answers and explanations.
Correct Answer: B - Biological neurons in the brain
Artificial neural networks are inspired by the structure and function of biological neural networks in the brain.
Correct Answer: B - Single layer neural network with one neuron
A perceptron is the simplest neural network - a single neuron that makes binary classifications.
Correct Answer: B - Non-linearity to the network
Activation functions introduce non-linearity, allowing networks to learn complex patterns.
Correct Answer: A - f(x) = max(0, x)
ReLU (Rectified Linear Unit) returns x if positive, otherwise 0. Simple yet effective.
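As a quick illustration of the definition above, here is a minimal sketch in plain Python. The function name `relu` is only an illustrative choice.

```python
def relu(x: float) -> float:
    """Return x when it is positive, otherwise 0."""
    return max(0.0, x)


print([relu(v) for v in (-2.0, 0.0, 3.5)])  # [0.0, 0.0, 3.5]
```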
Correct Answer: B - 0 and 1
Sigmoid squashes values to the range (0, 1), which is useful for probability outputs.
Correct Answer: B - Calculate gradients and update weights
Backpropagation calculates gradients of the loss with respect to weights, enabling weight updates.
Correct Answer: B - Loss/error function
Gradient descent iteratively adjusts weights to minimize the loss function.
Correct Answer: D - Controls step size in weight updates
The learning rate controls how much the weights are adjusted in each update step during training.
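To see how the step size affects gradient descent, here is a small sketch on a toy one-parameter loss L(w) = (w - 3)^2. The loss, the starting point, and the three learning rates are assumptions picked for illustration, not values from the question bank.

```python
def gradient_descent(lr: float, steps: int = 20, w: float = 0.0) -> float:
    """Minimize the toy loss L(w) = (w - 3)^2 starting from w."""
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)  # dL/dw
        w -= lr * grad          # step against the gradient, scaled by lr
    return w


small = gradient_descent(lr=0.01)  # too small: still far from the minimum at w = 3
good = gradient_descent(lr=0.1)    # reasonable: ends close to w = 3
big = gradient_descent(lr=1.1)     # too large: every step overshoots and w diverges
print(small, good, big)
```

With the same number of steps, only the middle learning rate gets near the minimum, which is why the learning rate is usually tuned.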
Correct Answer: B - One complete pass through entire training dataset
An epoch is one complete pass through the entire training dataset during training.
Correct Answer: C - Neural networks with many layers
Deep learning uses neural networks with multiple hidden layers (deep networks) to learn hierarchical features.
Correct Answer: C - Gradients become very small in deep networks
In deep networks, gradients can become extremely small during backpropagation, so the weights in early layers barely update.
Correct Answer: D - Prevent overfitting by randomly dropping neurons
Dropout randomly deactivates neurons during training, preventing overfitting by reducing co-adaptation.
Correct Answer: A - Normalizes layer inputs for faster training
Batch normalization normalizes layer inputs, which accelerates training. It was originally motivated as a way to reduce internal covariate shift.
Correct Answer: C - Output layer for multi-class classification
Softmax converts raw outputs to a probability distribution over multiple classes (summing to 1).
Correct Answer: C - Difference between predicted and actual values
A loss function quantifies how far the model's predictions are from the actual target values.
Correct Answer: D - Momentum and adaptive learning rates
Adam combines momentum with adaptive per-parameter learning rates (as in RMSprop) for efficient optimization.
Correct Answer: D - Randomly with small values
Weights are randomly initialized with small values to break symmetry and enable learning.
Correct Answer: B - Are between input and output layers
Hidden layers are intermediate layers between input and output that learn internal representations.
Correct Answer: B - Classification tasks
Cross-entropy loss measures the difference between the predicted probability distribution and the actual class labels.
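For a single example with one correct class, cross-entropy reduces to the negative log of the probability the model assigned to that class. The sketch below assumes that one-hot case; the probability vectors are made-up illustrative values.

```python
import math


def cross_entropy(probs: list[float], true_class: int) -> float:
    """Negative log-probability assigned to the correct class (one-hot label)."""
    return -math.log(probs[true_class])


# The correct class is index 1 in both cases.
confident = cross_entropy([0.05, 0.90, 0.05], true_class=1)  # low loss
wrong = cross_entropy([0.70, 0.20, 0.10], true_class=1)      # higher loss
print(confident, wrong)
```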
Correct Answer: A - Using pre-trained model for new task
Transfer learning uses knowledge (weights) from a model trained on one task for a related task.
Correct Answer: D - Single-layer neural network
A perceptron is the simplest type of neural network consisting of a single layer. It takes inputs, applies weights and a bias, passes through an activation function, and produces an output.
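The steps in this explanation (weighted inputs, bias, activation, output) can be sketched as follows. The step activation and the hand-picked weights for logical AND are assumptions for illustration; a trained perceptron would learn its weights from data.

```python
def perceptron(inputs, weights, bias):
    """Weighted sum plus bias, passed through a step activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if z >= 0 else 0


# Hand-picked (not learned) parameters that implement logical AND.
AND_WEIGHTS, AND_BIAS = [1.0, 1.0], -1.5
outputs = [perceptron([a, b], AND_WEIGHTS, AND_BIAS) for a in (0, 1) for b in (0, 1)]
print(outputs)  # [0, 0, 0, 1]
```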
Correct Answer: C - Sigmoid
The Sigmoid activation function maps any input value to a value between 0 and 1 using the formula σ(x) = 1/(1 + e^(-x)). It is commonly used in the output layer for binary classification.
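A direct translation of the sigmoid formula into Python might look like this. It is a plain sketch with no guard against overflow, so very large negative inputs would need a numerically stable variant.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


print(sigmoid(0.0))  # 0.5
print(sigmoid(-4.0), sigmoid(4.0))  # close to 0 and close to 1
```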
Correct Answer: B - Calculate gradients and update weights
Backpropagation calculates the gradient of the loss function with respect to each weight by propagating the error backward through the network. These gradients are then used to update weights.
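For a single sigmoid neuron with squared-error loss, the backward pass is just the chain rule applied from the loss back to the parameters. The sketch below assumes that tiny setup; the input, target, starting parameters, and learning rate are arbitrary illustrative values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, y):
    """Squared error of a single sigmoid neuron."""
    return (sigmoid(w * x + b) - y) ** 2


def gradients(w, b, x, y):
    """Chain rule, applied backward: loss -> activation -> pre-activation -> parameters."""
    a = sigmoid(w * x + b)
    dloss_da = 2.0 * (a - y)   # derivative of the squared error
    da_dz = a * (1.0 - a)      # derivative of the sigmoid
    return dloss_da * da_dz * x, dloss_da * da_dz * 1.0  # dz/dw = x, dz/db = 1


w, b, x, y, lr = 0.5, 0.0, 2.0, 1.0, 0.5
before = loss(w, b, x, y)
dw, db = gradients(w, b, x, y)
w, b = w - lr * dw, b - lr * db  # one gradient-descent update
after = loss(w, b, x, y)
print(before, after)  # the loss decreases after the update
```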
Correct Answer: C - Gradients become extremely small, slowing learning in early layers
The vanishing gradient problem occurs when gradients become extremely small as they are propagated back through many layers, causing early layers to learn very slowly or not at all.
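One way to see why this happens: the sigmoid's derivative is at most 0.25, and backpropagation multiplies one such factor per layer. The sketch below makes the simplifying assumption that every weight is 1 and every pre-activation is 0 (the sigmoid's best case), so it shows an upper bound rather than a real network.

```python
import math


def sigmoid_derivative(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)  # peaks at 0.25 when z = 0


def gradient_scale(num_layers, z=0.0):
    """Product of per-layer sigmoid derivatives (all weights assumed to be 1)."""
    return sigmoid_derivative(z) ** num_layers


for depth in (1, 5, 20):
    print(depth, gradient_scale(depth))  # shrinks geometrically with depth
```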
Correct Answer: D - Convolutional Neural Network
Convolutional Neural Networks (CNNs) are specifically designed for image processing. They use convolutional layers to detect local features like edges, textures, and patterns in images.
Correct Answer: A - f(x) = max(0, x)
ReLU (Rectified Linear Unit) is defined as f(x) = max(0, x). It outputs the input directly if positive, and zero otherwise. It helps mitigate the vanishing gradient problem.
Correct Answer: D - Pooling Layer
The Pooling Layer reduces the spatial dimensions (width and height) of feature maps while retaining important information. Max Pooling and Average Pooling are common types.
Correct Answer: A - To prevent overfitting by randomly dropping neurons
Dropout is a regularization technique that randomly deactivates a fraction of neurons during training. This prevents neurons from co-adapting and reduces overfitting.
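A minimal sketch of dropout on a list of activations is shown below. It uses the common "inverted dropout" variant, which rescales the surviving activations during training so that nothing changes at inference time. That rescaling is an assumption of this sketch and is not stated in the explanation above.

```python
import random


def dropout(activations, drop_prob, training=True, rng=random):
    """Inverted dropout: zero each value with probability drop_prob, rescale the rest."""
    if not training or drop_prob == 0.0:
        return list(activations)  # inference: pass values through unchanged
    keep = 1.0 - drop_prob
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # seeded only to make the demo repeatable
print(dropout([1.0, 1.0, 1.0, 1.0, 1.0, 1.0], drop_prob=0.5, rng=rng))
print(dropout([1.0, 2.0], drop_prob=0.5, training=False))  # [1.0, 2.0]
```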
Correct Answer: A - Sequential data processing
RNNs are designed to process sequential data like time series, text, and speech. They have feedback connections that allow information to persist across time steps.
Correct Answer: C - Gradients grow exponentially large during training
The exploding gradient problem occurs when gradients grow exponentially large during backpropagation, causing unstable weight updates. Gradient clipping is a common solution.
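Gradient clipping can take several forms. The sketch below shows clipping by L2 norm, one common choice: if the gradient vector is longer than a threshold, it is scaled down to that length while keeping its direction. The threshold and gradient values are illustrative.

```python
import math


def clip_by_norm(grads, max_norm):
    """Rescale the gradient vector so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


clipped = clip_by_norm([30.0, 40.0], max_norm=5.0)  # original norm is 50
print(clipped)
```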
Correct Answer: B - Detects specific features by sliding across the input
A convolutional filter slides across the input image, performing element-wise multiplication and summation to detect specific features like edges, corners, and textures.
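The sliding multiply-and-sum can be sketched in plain Python as below. It assumes stride 1 and no padding, and the small image and vertical-edge kernel are made up for illustration.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding), multiplying and summing."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
vertical_edge = [[-1, 1], [-1, 1]]  # responds where brightness rises left to right
print(conv2d(image, vertical_edge))  # [[0, 2, 0], [0, 2, 0]]
```

The filter responds only in the middle column, where the dark-to-bright edge sits.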
Correct Answer: B - Long Short-Term Memory
LSTM (Long Short-Term Memory) is a type of RNN architecture designed to learn long-term dependencies. It uses gates (forget, input, output) to control information flow.
Correct Answer: D - Forget Gate
The Forget Gate in LSTM decides what information from the previous cell state should be discarded. It outputs values between 0 (completely forget) and 1 (completely keep).
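The effect of the forget gate on a single cell-state value can be sketched as below. This covers only the "forget" part of the update. A full LSTM step also adds new information through the input gate, and the gate's pre-activation would come from learned weights rather than being passed in directly as it is here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def apply_forget_gate(prev_cell, gate_preactivation):
    """Scale the previous cell state by a gate value in (0, 1)."""
    f = sigmoid(gate_preactivation)  # near 0: forget, near 1: keep
    return f * prev_cell


print(apply_forget_gate(2.0, gate_preactivation=6.0))   # almost all of 2.0 is kept
print(apply_forget_gate(2.0, gate_preactivation=-6.0))  # almost everything is forgotten
```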
Correct Answer: A - To normalize inputs to each layer for faster training
Batch normalization normalizes the inputs to each layer by adjusting and scaling activations. It helps stabilize and speed up training, and can act as a mild regularizer.
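A rough sketch of the training-time computation for one feature across a batch follows. `gamma` and `beta` stand for the learnable scale and shift, and `eps` is a small constant for numerical safety. The default values are illustrative, and the running statistics used at inference time are left out.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a batch to zero mean and unit variance, then scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
print(normalized)  # values centered on 0 with variance close to 1
```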
Correct Answer: C - Multi-class classification
Softmax converts a vector of raw scores into probabilities that sum to 1, making it ideal for multi-class classification. Each output represents the probability of belonging to a specific class.
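The conversion from raw scores to probabilities can be sketched as follows. Subtracting the maximum score before exponentiating is a standard numerical-stability trick and does not change the result. The example scores are arbitrary.

```python
import math


def softmax(scores):
    """Exponentiate each score and normalize so the outputs sum to 1."""
    m = max(scores)  # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))  # the largest score gets the largest probability
```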
Correct Answer: A - Using a pre-trained model as a starting point for a new task
Transfer learning involves taking a model pre-trained on one task (e.g., ImageNet) and fine-tuning it for a different but related task. It saves training time and works well with limited data.
Correct Answer: C - Only from input to output
In a feedforward neural network, data flows in one direction only — from the input layer through hidden layers to the output layer. There are no feedback loops or cycles.
Correct Answer: D - Gradient Descent
Gradient Descent is an optimization algorithm used to minimize the loss function, not an activation function. Sigmoid, ReLU, and Tanh are all activation functions used in neural networks.
Correct Answer: D - Generator and Discriminator
A GAN consists of a Generator (creates fake data) and a Discriminator (distinguishes real from fake data). They are trained adversarially, with each network trying to outsmart the other.
Correct Answer: A - -1 to 1
The Tanh (hyperbolic tangent) function outputs values between -1 and 1. It is zero-centered, which can make optimization easier compared to the sigmoid function.
Correct Answer: D - To preserve spatial dimensions after convolution
Padding adds extra pixels (usually zeros) around the input image borders. This preserves spatial dimensions after convolution and ensures edge pixels are processed adequately.
Correct Answer: A - Adam
Adam (Adaptive Moment Estimation) adapts the learning rate for each parameter based on first and second moments of the gradients. It combines the benefits of AdaGrad and RMSProp.
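The first- and second-moment bookkeeping can be sketched for a single parameter as below. The default hyperparameters are the commonly quoted ones. The toy loss (w - 3)^2, the learning rate of 0.05, and the step count are assumptions for the demo.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad         # first moment: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad  # second moment: running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Minimize the toy loss (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 1001):
    w, m, v = adam_step(w, 2.0 * (w - 3.0), m, v, t, lr=0.05)
print(w)  # ends close to 3
```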
Correct Answer: C - The number of pixels the filter moves at each step
Stride refers to the number of pixels the convolutional filter moves at each step. A stride of 1 moves one pixel at a time, while a stride of 2 moves two pixels at a time, roughly halving the output width and height.
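The combined effect of kernel size, stride, and padding on output size follows the standard formula floor((n + 2p - k) / s) + 1 along each spatial dimension. The helper below is a small sketch of that formula; the 32-pixel input and 3x3 kernel are example values.

```python
def conv_output_size(n: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one dimension: floor((n + 2*padding - kernel) / stride) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


print(conv_output_size(32, kernel=3))                       # 30: no padding shrinks the output
print(conv_output_size(32, kernel=3, padding=1))            # 32: padding preserves the size
print(conv_output_size(32, kernel=3, stride=2, padding=1))  # 16: stride 2 halves it
```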
Correct Answer: A - Encoder and Decoder
An autoencoder has an Encoder (compresses input into a lower-dimensional representation) and a Decoder (reconstructs the input from the compressed representation). It is used for dimensionality reduction and feature learning.
Correct Answer: C - Vanishing gradient in RNNs
GRU (Gated Recurrent Unit) addresses the vanishing gradient problem in standard RNNs. It uses update and reset gates to control information flow, similar to LSTM but with fewer parameters.
Correct Answer: C - Max Pooling
Max Pooling selects the maximum value from each region of the feature map. It helps retain the most prominent features while reducing spatial dimensions and computational cost.
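A minimal sketch of max pooling over non-overlapping windows is shown below. It assumes the stride equals the window size and drops any ragged edge. The 4x4 feature map is made up for illustration.

```python
def max_pool2d(fmap, size=2):
    """Take the maximum of each non-overlapping size x size window."""
    return [
        [
            max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


fmap = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]
print(max_pool2d(fmap))  # [[6, 2], [2, 9]]
```

The 4x4 map becomes 2x2, and each output keeps the strongest response from its window.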
Correct Answer: B - A learnable parameter that determines the strength of connection between neurons
A weight is a learnable parameter that determines the strength and direction of the connection between two neurons. During training, weights are adjusted to minimize the loss function.
Correct Answer: A - Allowing a small gradient for negative inputs
Leaky ReLU outputs a small multiple of the input for negative values (typically 0.01x for x < 0), so the gradient there is small but non-zero instead of zero. This helps prevent the 'dying ReLU' problem where neurons stop learning.
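As a sketch, Leaky ReLU differs from ReLU only on the negative side. The default slope of 0.01 is the typical value mentioned above; the parameter name `slope` is an illustrative choice.

```python
def leaky_relu(x: float, slope: float = 0.01) -> float:
    """Return x for positive inputs and slope * x otherwise."""
    return x if x > 0 else slope * x


print(leaky_relu(5.0))   # 5.0
print(leaky_relu(-2.0))  # a small negative value instead of 0
```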
Correct Answer: D - To shift the activation function to better fit the data
The bias term allows the activation function to be shifted left or right, enabling the model to better fit the data. Without bias, a neuron's weighted sum is zero whenever all inputs are zero, so its decision boundary would be forced to pass through the origin.
Correct Answer: A - Transformer
The Transformer architecture, introduced in the paper 'Attention Is All You Need' (2017), uses self-attention mechanisms to process sequences in parallel, replacing the need for recurrence.
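The core of self-attention can be sketched as scaled dot-product attention. To stay short, the version below sets queries, keys, and values all equal to the input vectors. Real Transformers apply learned projections first and use several attention heads, and neither is shown here.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (no learned projections)."""
    d = len(x[0])
    output = []
    for q in x:  # each position is independent of the others, so this loop can be parallelized
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)  # how much this position attends to every position
        output.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return output


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy 2-dimensional token vectors
attended = self_attention(tokens)
print(attended)
```

Each output row is a weighted average of all input rows, which is how every position can draw on the whole sequence without recurrence.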