Machine Learning Basics MCQs for CCAT with Answers

Q1.

Machine Learning is a subset of:

A. Database Management

B. Operating Systems

C. Network Security

D. Artificial Intelligence

Correct Answer: D - Artificial Intelligence

Machine Learning is a branch of AI that enables systems to learn patterns from data rather than being explicitly programmed with rules.

Q2.

Supervised learning requires:

A. Unlabeled data only

B. Labeled training data

C. No data

D. Only test data

Correct Answer: B - Labeled training data

Supervised learning uses labeled data where both input features and target outputs are provided.

Q3.

Which is an example of unsupervised learning?

A. Email spam classification

B. House price prediction

C. Customer segmentation/clustering

D. Image classification with labels

Correct Answer: C - Customer segmentation/clustering

Clustering groups similar data points without predefined labels - a classic unsupervised learning task.

Q4.

Reinforcement learning involves:

A. Only labeled data

B. Agent learning through rewards and penalties

C. Only clustering

D. No learning

Correct Answer: B - Agent learning through rewards and penalties

Reinforcement learning trains agents to make decisions by receiving rewards or penalties for actions.

Q5.

Classification in ML predicts:

A. Discrete categories/classes

B. Continuous values

C. Only numbers

D. Only text

Correct Answer: A - Discrete categories/classes

Classification predicts discrete categorical labels like spam/not spam, cat/dog, etc.

Q6.

Regression in ML predicts:

A. Categories

B. Only integers

C. Only binary outcomes

D. Continuous numerical values

Correct Answer: D - Continuous numerical values

Regression predicts continuous values like prices, temperatures, or sales figures.

Q7.

Overfitting occurs when:

A. Model performs poorly on training data

B. Dataset is too large

C. Model is too simple

D. Model memorizes training data but fails on new data

Correct Answer: D - Model memorizes training data but fails on new data

Overfitting happens when a model learns noise in training data, performing well on training but poorly on unseen data.

Q8.

Underfitting indicates:

A. Model is too complex

B. Perfect fit

C. Model is too simple to capture patterns

D. Too much training

Correct Answer: C - Model is too simple to capture patterns

Underfitting occurs when the model is too simple to capture the underlying patterns in the data.

Q9.

Cross-validation is used for:

A. Evaluating model performance on different data splits

B. Data collection

C. Data storage

D. Network testing

Correct Answer: A - Evaluating model performance on different data splits

Cross-validation evaluates model performance by training and testing on different subsets of data.

Q10.

A feature in machine learning refers to:

A. Output variable

B. Input variable used for prediction

C. Database table

D. Network node

Correct Answer: B - Input variable used for prediction

Features are input variables (attributes) that the model uses to make predictions.

Q11.

The training set is used to:

A. Evaluate final model

B. Train/fit the model parameters

C. Deploy the model

D. Store predictions

Correct Answer: B - Train/fit the model parameters

The training set is used to fit the model's parameters; the model learns patterns from this data.

Q12.

The test set is used to:

A. Train the model

B. Evaluate model on unseen data

C. Store training data

D. Clean data

Correct Answer: B - Evaluate model on unseen data

The test set evaluates how well the trained model generalizes to new, unseen data.

Q13.

A decision tree splits data based on:

A. Random selection

B. File size

C. Alphabetical order

D. Feature values to maximize information gain

Correct Answer: D - Feature values to maximize information gain

Decision trees split data on feature values that best separate classes (maximize information gain/minimize impurity).
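
To make "information gain" concrete, here is a minimal Python sketch (standard library only) that measures entropy in bits and the gain of a candidate split. The function names and the toy yes/no labels are hypothetical, chosen only for illustration.

```python
# Illustrative sketch: function names and the toy labels are hypothetical.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    probabilities = [count / n for count in Counter(labels).values()]
    return sum(-p * math.log2(p) for p in probabilities)

def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes", "yes", "no", "no"]
# A split that separates the classes perfectly gains the full 1 bit
print(information_gain(parent, [["yes", "yes"], ["no", "no"]]))  # 1.0
# A split whose children are as mixed as the parent gains nothing
print(information_gain(parent, [["yes", "no"], ["yes", "no"]]))  # 0.0
```

A tree-building algorithm in the style of ID3 would evaluate every candidate split this way and keep the one with the highest gain; CART-style trees usually minimize Gini impurity instead.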

Q14.

K-Nearest Neighbors (KNN) classifies based on:

A. Majority vote of k nearest neighbors

B. Single nearest point

C. All data points

D. Random selection

Correct Answer: A - Majority vote of k nearest neighbors

KNN classifies a point based on the majority class among its k nearest neighbors in feature space.
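
The majority vote can be sketched in a few lines of Python. This assumes Euclidean distance and a tiny hypothetical 2-D dataset; the function name `knn_predict` is illustrative and not taken from any library.

```python
# Illustrative sketch: knn_predict and the toy data are hypothetical.
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest points
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over their labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["red", "red", "red", "blue", "blue", "blue"]
print(knn_predict(points, labels, (2, 2), k=3))  # red
print(knn_predict(points, labels, (6, 5), k=3))  # blue
```

An odd k avoids tied votes in two-class problems; features on very different scales should usually be scaled first, since the distance would otherwise be dominated by the largest-range feature.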

Q15.

Linear regression finds:

A. Clusters in data

B. Anomalies

C. Classification boundaries

D. Best-fit line to predict continuous values

Correct Answer: D - Best-fit line to predict continuous values

Linear regression fits a line (or hyperplane) that minimizes the error between predictions and actual values, most commonly the sum of squared errors (ordinary least squares).
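
For a single input feature, the least-squares line has a closed form: slope = covariance(x, y) / variance(x) and intercept = mean(y) - slope * mean(x). The Python sketch below assumes exactly that one-feature case; `fit_line` is a hypothetical helper name.

```python
# Illustrative sketch: fit_line is a hypothetical helper for the one-feature case.
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Points lying exactly on y = 2x + 1 recover slope 2 and intercept 1
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)  # 2.0 1.0
```

With several features the same idea is usually solved with matrix methods or gradient descent rather than this scalar formula.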

Q16.

Logistic regression is used for:

A. Regression only

B. Dimensionality reduction

C. Clustering

D. Binary classification

Correct Answer: D - Binary classification

Despite its name, logistic regression is used for binary classification, predicting probabilities.

Q17.

Accuracy in classification measures:

A. Only false positives

B. Only false negatives

C. Percentage of correct predictions

D. Training time

Correct Answer: C - Percentage of correct predictions

Accuracy is the ratio of correct predictions (both positive and negative) to total predictions.

Q18.

Precision measures:

A. All correct predictions

B. Speed of prediction

C. Proportion of true positives among predicted positives

D. Data size

Correct Answer: C - Proportion of true positives among predicted positives

Precision = True Positives / (True Positives + False Positives) - how many predicted positives are actually positive.

Q19.

Recall (Sensitivity) measures:

A. False positive rate

B. Proportion of actual positives correctly identified

C. Prediction speed

D. Model complexity

Correct Answer: B - Proportion of actual positives correctly identified

Recall = True Positives / (True Positives + False Negatives) - how many actual positives were correctly identified.
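
The formulas for accuracy, precision and recall (plus the F1-score, their harmonic mean) can be checked by counting the confusion-matrix cells directly. This is a minimal Python sketch assuming binary labels with 1 as the positive class; the function names and label lists are hypothetical.

```python
# Illustrative sketch: function names and the label lists are hypothetical.
def confusion_counts(y_true, y_pred, positive=1):
    """Count (TP, FP, FN, TN) for a binary classification task."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # predicted positive, actually positive
        elif predicted == positive:
            fp += 1  # predicted positive, actually negative
        elif actual == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall, F1); undefined ratios become 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 1, 0, 0, 1]
print(confusion_counts(y_true, y_pred))  # (3, 2, 1, 2)
print([round(m, 3) for m in classification_metrics(y_true, y_pred)])  # [0.625, 0.6, 0.75, 0.667]
```

Here the two false positives pull precision (0.6) below recall (0.75), which is why both are usually reported rather than accuracy alone.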

Q20.

Bias-variance tradeoff means:

A. Reducing bias increases variance and vice versa

B. More data is always better

C. Simpler models are always better

D. Complex models never overfit

Correct Answer: A - Reducing bias increases variance and vice versa

Decreasing bias (making model more complex) typically increases variance (sensitivity to training data), and vice versa.

Q21.

Which type of machine learning uses labeled data for training?

A. Unsupervised Learning

B. Reinforcement Learning

C. Supervised Learning

D. Semi-supervised Learning

Correct Answer: C - Supervised Learning

Supervised Learning uses labeled training data (input-output pairs) to learn a mapping function. The model is trained on known correct answers and then makes predictions on new, unseen data.

Q22.

K-Means is an example of which type of learning?

A. Unsupervised Learning

B. Supervised Learning

C. Reinforcement Learning

D. Transfer Learning

Correct Answer: A - Unsupervised Learning

K-Means is an unsupervised learning algorithm used for clustering. It partitions data into K clusters based on similarity, without requiring labeled training data.
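
The partitioning is usually done with Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. The Python sketch below assumes the caller supplies the starting centroids (so the result is reproducible); real implementations typically choose them randomly or with k-means++. The function name and data are hypothetical.

```python
# Illustrative sketch: the function name, data and starting centroids are hypothetical.
import math

def kmeans(points, initial_centroids, max_iterations=100):
    """Lloyd's algorithm: alternate an assignment step and an update step."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its old centroid)
        new_centroids = [
            tuple(sum(coords) / len(members) for coords in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, initial_centroids=[(0, 0), (10, 10)])
print([tuple(round(v, 2) for v in c) for c in centroids])  # [(1.33, 1.33), (8.33, 8.33)]
print(clusters)  # [[(1, 1), (1, 2), (2, 1)], [(8, 8), (8, 9), (9, 8)]]
```

No labels are used anywhere: the groups emerge purely from distances between points, which is what makes K-Means unsupervised.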

Q23.

Overfitting in machine learning occurs when:

A. The model performs well on both training and test data

B. The model has too few parameters

C. The model performs poorly on both training and test data

D. The model performs well on training data but poorly on test data

Correct Answer: D - The model performs well on training data but poorly on test data

Overfitting occurs when a model learns the training data too well, including noise and outliers, resulting in excellent training accuracy but poor generalization to new, unseen data.

Q24.

Which algorithm is commonly used for classification tasks?

A. Linear Regression

B. K-Means Clustering

C. Principal Component Analysis

D. Decision Tree

Correct Answer: D - Decision Tree

Decision Trees are commonly used for classification tasks. They split data into branches based on feature values and make predictions at leaf nodes. They can also be used for regression.

Q25.

What is the purpose of cross-validation in machine learning?

A. To increase training speed

B. To select the best programming language

C. To reduce the dataset size

D. To evaluate model performance and reduce overfitting

Correct Answer: D - To evaluate model performance and reduce overfitting

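Cross-validation is a technique to evaluate model performance by splitting data into multiple folds, training on some folds and testing on the remaining one, and repeating so that each fold is used for testing. It does not prevent overfitting by itself, but it helps detect overfitting and provides a more reliable performance estimate for choosing models and hyperparameters that generalize well.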

Q26.

Linear Regression is used to predict:

A. Categorical values

B. Boolean values

C. Continuous values

D. Complex numbers

Correct Answer: C - Continuous values

Linear Regression predicts continuous numerical values by finding the best-fit linear relationship between input features and the output variable. For example, predicting house prices based on area.

Q27.

Which metric is NOT commonly used for classification evaluation?

A. Accuracy

B. Precision

C. Mean Squared Error

D. F1-Score

Correct Answer: C - Mean Squared Error

Mean Squared Error (MSE) is a regression metric, not a classification metric. Classification tasks typically use Accuracy, Precision, Recall, and F1-Score for evaluation.

Q28.

What does the 'K' in K-Nearest Neighbors represent?

A. Number of features

B. Number of clusters

C. Number of nearest neighbors to consider

D. Number of iterations

Correct Answer: C - Number of nearest neighbors to consider

In KNN, 'K' represents the number of nearest neighbors to consider when making a classification or regression prediction. A new data point is classified based on the majority class of its K nearest neighbors.

Q29.

Which technique is used to prevent overfitting by adding a penalty term to the loss function?

A. Normalization

B. Regularization

C. Standardization

D. Augmentation

Correct Answer: B - Regularization

Regularization prevents overfitting by adding a penalty term to the loss function. L1 (Lasso) and L2 (Ridge) regularization are common techniques that discourage overly complex models.
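
As a rough illustration, the Python sketch below adds an L2 (Ridge) or L1 (Lasso) penalty on the weights to a mean-squared-error data loss. Conventions differ between libraries (some scale the penalty by 1/2 or by the number of samples), so treat this as one common form rather than a definitive one; the function names, weights and `lam` value are hypothetical.

```python
# Illustrative sketch: function names, weights and lam are hypothetical.
def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def regularized_loss(y_true, y_pred, weights, lam, kind="l2"):
    """Data loss (MSE) plus lam times a penalty on the model weights."""
    if kind == "l2":        # Ridge: sum of squared weights
        penalty = sum(w ** 2 for w in weights)
    elif kind == "l1":      # Lasso: sum of absolute weights
        penalty = sum(abs(w) for w in weights)
    else:
        raise ValueError("kind must be 'l1' or 'l2'")
    return mse(y_true, y_pred) + lam * penalty

y_true, y_pred = [1.0, 2.0, 3.0], [1.0, 2.0, 4.0]   # data loss (MSE) = 1/3
small_w, large_w = [0.5, -0.5], [3.0, -4.0]
print(round(regularized_loss(y_true, y_pred, small_w, 0.1), 4))             # 0.3833
print(round(regularized_loss(y_true, y_pred, large_w, 0.1), 4))             # 2.8333
print(round(regularized_loss(y_true, y_pred, large_w, 0.1, kind="l1"), 4))  # 1.0333
```

Both weight vectors fit the data equally well here, but the penalty makes the large-weight model cost more, so training is pushed toward smaller, simpler weights.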

Q30.

In a confusion matrix, what does a 'True Positive' represent?

A. Predicted positive, actually positive

B. Predicted positive, actually negative

C. Predicted negative, actually positive

D. Predicted negative, actually negative

Correct Answer: A - Predicted positive, actually positive

A True Positive (TP) occurs when the model correctly predicts the positive class. The actual label is positive, and the model's prediction is also positive.

Q31.

Which of the following is an ensemble learning method?

A. Linear Regression

B. K-Means

C. Random Forest

D. PCA

Correct Answer: C - Random Forest

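Random Forest is an ensemble learning method that builds multiple decision trees, each trained on a bootstrap sample of the data using random subsets of features, and combines their predictions by majority vote (classification) or averaging (regression). This typically reduces overfitting and improves accuracy compared to a single decision tree.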

Q32.

What is the bias-variance tradeoff?

A. Tradeoff between underfitting and overfitting

B. Tradeoff between model speed and accuracy

C. Tradeoff between training and testing

D. Tradeoff between precision and recall

Correct Answer: A - Tradeoff between underfitting and overfitting

The bias-variance tradeoff is the balance between underfitting (high bias, low variance) and overfitting (low bias, high variance). A good model balances both to minimize total error.

Q33.

Which algorithm uses the concept of 'margin' to find the optimal hyperplane?

A. Decision Tree

B. K-Means

C. Naive Bayes

D. Support Vector Machine

Correct Answer: D - Support Vector Machine

Support Vector Machine (SVM) finds the optimal hyperplane that maximizes the margin between two classes. The data points closest to the hyperplane are called support vectors.

Q34.

Naive Bayes classifier is based on which theorem?

A. Pythagorean Theorem

B. Bayes' Theorem

C. Central Limit Theorem

D. Fermat's Last Theorem

Correct Answer: B - Bayes' Theorem

Naive Bayes is based on Bayes' Theorem with the 'naive' assumption that features are conditionally independent given the class. It calculates posterior probability for classification.

Q35.

What is the purpose of feature scaling in machine learning?

A. To increase the number of features

B. To bring all features to the same scale

C. To remove irrelevant features

D. To add noise to the data

Correct Answer: B - To bring all features to the same scale

Feature scaling normalizes the range of features so that no single feature dominates the model due to its scale. Common methods include Min-Max scaling and Standardization (z-score).
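
Both scaling methods are one-liners per value. This Python sketch uses the standard `statistics` module and the population standard deviation for the z-score (some tools use the sample standard deviation instead); the function names and the area/room figures are hypothetical.

```python
# Illustrative sketch: function names and the feature values are hypothetical.
import statistics

def min_max_scale(values):
    """Rescale values linearly into [0, 1] (assumes max > min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

areas = [500, 1000, 1500, 2000]   # e.g. square feet: large numeric range
rooms = [1, 2, 3, 4]              # small numeric range
print(min_max_scale(areas))                          # [0.0, 0.3333333333333333, 0.6666666666666666, 1.0]
print(min_max_scale(areas) == min_max_scale(rooms))  # True
```

After scaling, the area feature no longer outweighs the room count merely because its raw numbers are larger. The scaling parameters (min/max or mean/std) should be computed on the training set and reused on the test set.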

Q36.

Which of the following is a dimensionality reduction technique?

A. Random Forest

B. Backpropagation

C. Gradient Descent

D. Principal Component Analysis

Correct Answer: D - Principal Component Analysis

Principal Component Analysis (PCA) is a dimensionality reduction technique that transforms high-dimensional data into a lower-dimensional space while preserving maximum variance.

Q37.

In gradient descent, the learning rate determines:

A. The direction of the gradient

B. The size of each step toward the minimum

C. The number of features

D. The number of training examples

Correct Answer: B - The size of each step toward the minimum

The learning rate controls the size of the steps taken during gradient descent. A too-large learning rate may overshoot the minimum, while a too-small rate results in slow convergence.
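
The effect of the learning rate is easy to see on a one-dimensional function. This Python sketch minimizes the hypothetical function f(x) = (x - 3)^2, whose minimum is at x = 3 and whose gradient is 2(x - 3), using three different learning rates.

```python
# Illustrative sketch: the objective f(x) = (x - 3)^2 and the rates are hypothetical.
def gradient_descent(grad, x0, learning_rate, steps):
    """Repeatedly step against the gradient: x <- x - learning_rate * grad(x)."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x

def grad_f(x):
    """Gradient of f(x) = (x - 3)^2."""
    return 2 * (x - 3)

print(gradient_descent(grad_f, 0.0, 0.1, 50))    # very close to 3
print(gradient_descent(grad_f, 0.0, 0.001, 50))  # still far from 3: steps too small
print(gradient_descent(grad_f, 0.0, 1.1, 50))    # huge magnitude: overshoots and diverges
```

For this function each step multiplies the distance to the minimum by (1 - 2 * learning_rate), so 0.1 shrinks it by a factor of 0.8 per step, 0.001 barely shrinks it, and 1.1 multiplies it by -1.2, making every step overshoot further than the last.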

Q38.

What is the difference between classification and regression?

A. Classification predicts continuous values; regression predicts categories

B. Both predict continuous values

C. Classification predicts categories; regression predicts continuous values

D. Both predict categories

Correct Answer: C - Classification predicts categories; regression predicts continuous values

Classification predicts discrete categorical labels (e.g., spam/not spam), while regression predicts continuous numerical values (e.g., house price). They are both supervised learning tasks.

Q39.

Which of the following causes underfitting?

A. Too complex model

B. Too many features

C. Too much training data

D. Too simple model

Correct Answer: D - Too simple model

Underfitting occurs when the model is too simple to capture the underlying patterns in the data. It results in high bias and poor performance on both training and test data.

Q40.

What does the R-squared (R²) score measure?

A. Proportion of variance explained by the model

B. Classification accuracy

C. Number of outliers in data

D. Distance between clusters

Correct Answer: A - Proportion of variance explained by the model

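R-squared measures the proportion of variance in the dependent variable that is explained by the independent variables: R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot the total sum of squares around the mean. A value of 1 indicates a perfect fit, 0 means the model does no better than always predicting the mean, and negative values (possible on new data) mean it does worse than that.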
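
A minimal Python sketch of the computation, comparing residual variation with total variation around the mean; the function name and sample values are hypothetical.

```python
# Illustrative sketch: r_squared and the sample values are hypothetical.
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total variation around the mean
    return 1 - ss_res / ss_tot

y = [3, 5, 7, 9]
print(r_squared(y, [3, 5, 7, 9]))  # 1.0  (perfect fit)
print(r_squared(y, [6, 6, 6, 6]))  # 0.0  (always predicting the mean, 6)
print(r_squared(y, [9, 7, 5, 3]))  # -3.0 (worse than predicting the mean)
```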

Q41.

Logistic Regression is used for:

A. Regression problems only

B. Clustering problems

C. Classification problems

D. Dimensionality reduction

Correct Answer: C - Classification problems

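Despite its name, Logistic Regression is a classification algorithm, most often used for binary classification. It passes a weighted sum of the features through the sigmoid function to output a probability, then assigns one of the two classes by thresholding that probability (typically at 0.5). Its multinomial (softmax) form extends it to more than two classes.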
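
The prediction step of a trained logistic regression model can be sketched as follows. The weights and bias are hypothetical values standing in for learned parameters (training them, usually by minimizing log loss, is not shown), and the sigmoid is written in a numerically stable form so large negative scores do not overflow `math.exp`.

```python
# Illustrative sketch: weights, bias and feature values are hypothetical.
import math

def sigmoid(z):
    """Squash any real number into (0, 1), avoiding overflow for large |z|."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)

def predict_proba(features, weights, bias):
    """Logistic regression: a linear score passed through the sigmoid."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

def predict_class(features, weights, bias, threshold=0.5):
    """Assign class 1 if the predicted probability reaches the threshold."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0

weights, bias = [2.0, -1.0], -0.5                # as if already learned
print(sigmoid(0))                                # 0.5
print(predict_class([1.0, 0.5], weights, bias))  # 1  (score 1.0, p ≈ 0.73)
print(predict_class([0.0, 1.0], weights, bias))  # 0  (score -1.5, p ≈ 0.18)
```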

Q42.

Which method splits data into training and testing sets only once?

A. Hold-out Validation

B. K-Fold Cross Validation

C. Leave-One-Out Cross Validation

D. Stratified Sampling

Correct Answer: A - Hold-out Validation

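Hold-out validation splits the dataset into training and testing sets only once (typically 70-30 or 80-20). It is simpler and cheaper, but because the performance estimate depends on a single split, it can vary noticeably depending on which examples land in the test set; cross-validation averages over several splits for a more reliable estimate.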
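
A single hold-out split amounts to shuffling once and cutting the data. This Python sketch assumes an 80-20 split and a fixed seed for reproducibility; `holdout_split` is a hypothetical helper name.

```python
# Illustrative sketch: holdout_split, the ratio and the seed are hypothetical.
import random

def holdout_split(data, test_ratio=0.2, seed=0):
    """Shuffle once, then cut the data into a training part and a test part."""
    items = list(data)
    random.Random(seed).shuffle(items)  # fixed seed so the split is reproducible
    n_test = round(len(items) * test_ratio)
    return items[n_test:], items[:n_test]

train, test = holdout_split(range(10), test_ratio=0.2)
print(len(train), len(test))  # 8 2
```

Changing the seed changes which examples end up in the test set, which is exactly why a single hold-out estimate can fluctuate.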

Q43.

Bagging in ensemble learning stands for:

A. Backward Aggregating

B. Batch Aggregating

C. Bootstrap Aggregating

D. Binary Aggregating

Correct Answer: C - Bootstrap Aggregating

Bagging stands for Bootstrap Aggregating. It creates multiple subsets of the training data using bootstrapping (sampling with replacement), trains a model on each, and combines predictions.
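
The two ingredients, bootstrap sampling and aggregation by majority vote, can be sketched in Python with `random.Random.choices` (sampling with replacement). To keep the example short, the base learner is a deliberately trivial, hypothetical 1-nearest-neighbour model on 1-D data; Random Forests would use decision trees instead.

```python
# Illustrative sketch: train_1nn and the toy data are hypothetical.
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement: some repeat, some are left out."""
    return rng.choices(data, k=len(data))

def train_1nn(sample):
    """Hypothetical base learner: 1-nearest-neighbour on (x, label) pairs."""
    def model(x):
        _, label = min(sample, key=lambda pair: abs(pair[0] - x))
        return label
    return model

def bagging_predict(models, x):
    """Aggregate the ensemble's predictions by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

data = [(1, "a"), (2, "a"), (3, "a"), (7, "b"), (8, "b"), (9, "b")]
rng = random.Random(0)  # fixed seed for reproducibility
models = [train_1nn(bootstrap_sample(data, rng)) for _ in range(25)]
print(bagging_predict(models, 2.5))
print(bagging_predict(models, 8.5))
```

Each model sees a slightly different dataset, so their individual errors differ; voting averages those differences out, which is how bagging reduces variance.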

Q44.

What is the curse of dimensionality?

A. Data becomes sparse in high-dimensional space

B. The model has too few parameters

C. Training takes too long

D. The dataset is too small

Correct Answer: A - Data becomes sparse in high-dimensional space

The curse of dimensionality refers to the phenomenon where data becomes increasingly sparse as the number of features (dimensions) increases, making it harder for algorithms to find patterns.

Q45.

Which loss function is commonly used for binary classification?

A. Mean Squared Error

B. Binary Cross-Entropy

C. Mean Absolute Error

D. Hinge Loss

Correct Answer: B - Binary Cross-Entropy

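Binary Cross-Entropy (Log Loss) is the standard loss function for binary classification when the model outputs probabilities, as in logistic regression and neural networks. It measures the difference between predicted probabilities and actual binary labels, penalizing confident wrong predictions heavily. Hinge Loss is also used for binary classification, mainly with Support Vector Machines.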
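
A minimal Python sketch of the mean log loss, -[y log(p) + (1 - y) log(1 - p)] averaged over examples. Probabilities are clipped by a small `eps` (an assumption of this sketch, as in many implementations) so that log(0) never occurs; the labels and probabilities are hypothetical.

```python
# Illustrative sketch: the function name, eps and the sample values are hypothetical.
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all examples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never happens
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

labels = [1, 0, 1]
print(binary_cross_entropy(labels, [0.9, 0.1, 0.8]))  # ≈ 0.145 (confident and right)
print(binary_cross_entropy(labels, [0.1, 0.9, 0.2]))  # ≈ 2.07  (confident and wrong)
```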

Q46.

In k-fold cross-validation, if k = 5, how many times is the model trained?

A. 5

B. 1

C. 10

D. 25

Correct Answer: A - 5

In 5-fold cross-validation, the data is split into 5 folds. The model is trained 5 times, each time using 4 folds for training and 1 fold for testing. The results are then averaged.
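
The fold bookkeeping can be sketched as a generator of (train, test) index lists. This version uses contiguous folds whose sizes differ by at most one; shuffling the data beforehand is usually advisable. `k_fold_indices` is a hypothetical name.

```python
# Illustrative sketch: k_fold_indices is a hypothetical helper.
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) once per fold."""
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

folds = list(k_fold_indices(10, 5))
print(len(folds))  # 5 -> the model is trained 5 times
print(folds[0])    # ([2, 3, 4, 5, 6, 7, 8, 9], [0, 1])
```

Every index appears in exactly one test fold, so each example is used for evaluation once and for training k - 1 times.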

Q47.

What does the term 'epoch' mean in machine learning training?

A. A single training example

B. One complete pass through the entire training dataset

C. A single weight update

D. The learning rate value

Correct Answer: B - One complete pass through the entire training dataset

An epoch refers to one complete pass through the entire training dataset during model training. Multiple epochs are typically needed for the model to converge to optimal parameters.

Q48.

Which of the following is a type of unsupervised learning?

A. Clustering

B. Regression

C. Classification

D. Object Detection

Correct Answer: A - Clustering

Clustering is a type of unsupervised learning that groups similar data points together without using labeled data. K-Means, DBSCAN, and Hierarchical clustering are common methods.

Q49.

Precision in a classification model is defined as:

A. TP / (TP + FN)

B. TP / (TP + FP)

C. TN / (TN + FP)

D. (TP + TN) / Total

Correct Answer: B - TP / (TP + FP)

Precision = TP / (TP + FP). It measures the proportion of predicted positives that are actually positive. High precision means fewer false positives.

Q50.

What is the main advantage of Gradient Boosting over Bagging?

A. It builds models sequentially, correcting errors of previous models

B. It is faster

C. It uses fewer resources

D. It requires no hyperparameter tuning

Correct Answer: A - It builds models sequentially, correcting errors of previous models

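Gradient Boosting builds models sequentially: each new model is fitted to the errors (more precisely, the negative gradient of the loss) of the ensemble built so far. This mainly reduces bias, whereas bagging trains models independently and mainly reduces variance, and it often leads to better accuracy than bagging methods.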
