Big Data Fundamentals MCQs for CCAT | 20 Practice Questions with Answers

Q1.

Which of the following is NOT one of the 5 V's of Big Data?

A. Volume

B. Velocity

C. Variety

D. Validation

Correct Answer: D — Validation

The 5 V's of Big Data are Volume, Velocity, Variety, Veracity, and Value. Validation is not one of them.

Q2.

Big Data refers to datasets that are:

A. Only structured data

B. Small enough to process on a single machine

C. Too large for traditional database systems

D. Only unstructured data

Correct Answer: C — Too large for traditional database systems

Big Data refers to datasets that are too large, complex, or fast-moving for traditional database systems to handle.

Q3.

Which characteristic of Big Data refers to the speed at which data is generated?

A. Volume

B. Velocity

C. Variety

D. Veracity

Correct Answer: B — Velocity

Velocity refers to the speed at which data is generated, collected, and processed.

Q4.

Structured, semi-structured, and unstructured data represent which Big Data characteristic?

A. Volume

B. Velocity

C. Variety

D. Value

Correct Answer: C — Variety

Variety refers to the different types of data formats - structured (tables), semi-structured (JSON, XML), and unstructured (text, images).

Q5.

Which technology is primarily used for distributed storage in Big Data?

A. MySQL

B. HDFS

C. SQLite

D. PostgreSQL

Correct Answer: B — HDFS

HDFS (Hadoop Distributed File System) is designed for distributed storage across clusters of commodity hardware.
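
To make the idea of distributed storage concrete, the short Python sketch below estimates how a single file would be split into blocks and replicated. It assumes a 128 MB block size and a replication factor of 3, which are commonly used defaults but are configurable per cluster; the function names are hypothetical and are not part of any Hadoop API.

```python
import math

# Assumed values: common HDFS defaults, configurable on a real cluster.
BLOCK_SIZE_MB = 128
REPLICATION_FACTOR = 3


def block_count(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of blocks needed to hold a file of the given size."""
    return math.ceil(file_size_mb / block_size_mb)


def raw_storage_mb(file_size_mb, replication=REPLICATION_FACTOR):
    """Total cluster storage used once every block is replicated."""
    return file_size_mb * replication


file_size_mb = 1000  # hypothetical 1000 MB file
blocks = block_count(file_size_mb)
stored = raw_storage_mb(file_size_mb)
print(f"{blocks} blocks, {stored} MB of raw storage across the cluster")
```

Because each block is an independent unit, the blocks of one file can live on different machines, which is what lets storage grow by adding nodes.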

Q6.

Data Lake is different from Data Warehouse because:

A. Data Lake only stores structured data

B. Data Lake stores raw data in native format

C. Data Warehouse is cheaper

D. Data Lake requires schema before loading

Correct Answer: B — Data Lake stores raw data in native format

A data lake stores raw data in its native format and applies a schema only when the data is read (schema-on-read), while a data warehouse requires structured data that conforms to a predefined schema at load time (schema-on-write).
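
The difference between schema-on-write and schema-on-read can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample events, field names, and function names are hypothetical, and real lakes and warehouses use dedicated storage engines rather than in-memory lists.

```python
import json

# Hypothetical raw events, stored exactly as they arrived.
RAW_EVENTS = [
    '{"user": "a1", "amount": "19.99", "note": "gift"}',
    '{"user": "b2", "amount": "5"}',
    '{"user": "c3"}',
]


def load_schema_on_write(raw_lines):
    """Warehouse style: enforce the schema before a record is stored."""
    table, rejected = [], []
    for line in raw_lines:
        record = json.loads(line)
        if "user" in record and "amount" in record:
            table.append({"user": record["user"], "amount": float(record["amount"])})
        else:
            rejected.append(line)  # does not fit the schema, so it is not loaded
    return table, rejected


def read_with_schema(raw_lines):
    """Lake style: keep raw lines as-is and apply a schema only when reading."""
    for line in raw_lines:
        record = json.loads(line)
        yield {"user": record.get("user"), "amount": float(record.get("amount", 0))}


warehouse_rows, rejected_rows = load_schema_on_write(RAW_EVENTS)
lake_total = sum(row["amount"] for row in read_with_schema(RAW_EVENTS))
print(len(warehouse_rows), len(rejected_rows), round(lake_total, 2))
```

In the warehouse path the incomplete record is rejected at load time; in the lake path it stays in storage and the reader decides how to interpret the missing field.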

Q7.

ETL in Big Data stands for:

A. Extract, Transform, Load

B. Encrypt, Transfer, Log

C. Execute, Test, Launch

D. Export, Translate, Link

Correct Answer: A — Extract, Transform, Load

ETL stands for Extract, Transform, Load: the process of extracting data from source systems, transforming it (cleaning, converting, or aggregating), and loading it into a destination system.
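
The three ETL stages can be illustrated with a minimal Python sketch that uses only the standard library. The CSV content, table name, and cleaning rules are assumptions made for the example; a production pipeline would read from real sources and load into a real warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for a file or API response.
SOURCE_CSV = "order_id,amount,currency\n1,10.50,usd\n2,,usd\n3,7.25,USD\n"


def extract(text):
    """Extract: parse the raw CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no amount, cast types, normalize currency."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue
        cleaned.append((int(row["order_id"]), float(row["amount"]), row["currency"].upper()))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL, currency TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database as a stand-in destination
load(transform(extract(SOURCE_CSV)), conn)
loaded = conn.execute("SELECT order_id, amount, currency FROM orders ORDER BY order_id").fetchall()
print(loaded)
```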

Q8.

Which is an example of real-time Big Data processing?

A. Monthly sales reports

B. Credit card fraud detection

C. Annual inventory audit

D. Quarterly financial statements

Correct Answer: B — Credit card fraud detection

Credit card fraud detection requires real-time processing to identify suspicious transactions as they occur.

Q9.

Veracity in Big Data refers to:

A. Speed of data

B. Size of data

C. Trustworthiness of data

D. Type of data

Correct Answer: C — Trustworthiness of data

Veracity refers to the quality, accuracy, and trustworthiness of the data.

Q10.

Which is NOT a common Big Data use case?

A. Predictive maintenance

B. Customer sentiment analysis

C. Single-user desktop application

D. Recommendation engines

Correct Answer: C — Single-user desktop application

Single-user desktop applications don't require Big Data technologies. The others involve processing large datasets.

Q11.

Batch processing in Big Data is characterized by:

A. Real-time responses

B. Processing large volumes of data at scheduled intervals

C. Immediate data processing

D. Stream processing

Correct Answer: B — Processing large volumes of data at scheduled intervals

Batch processing involves processing large volumes of accumulated data at scheduled intervals, not in real-time.

Q12.

Which company originally developed Hadoop?

A. Google

B. Yahoo

C. Facebook

D. Amazon

Correct Answer: B — Yahoo

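Hadoop began as part of the Apache Nutch project, created by Doug Cutting and Mike Cafarella, and became a standalone project in 2006 when Cutting joined Yahoo, which drove much of its early development. Its design draws on Google's published papers on GFS and MapReduce.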

Q13.

The CAP theorem states that a distributed system can have at most:

A. One property

B. Two of three properties simultaneously

C. All three properties

D. None of the properties

Correct Answer: B — Two of three properties simultaneously

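The CAP theorem states that a distributed system can guarantee at most two of three properties at the same time: Consistency, Availability, and Partition tolerance. Because network partitions cannot be ruled out in practice, the real choice during a partition is between consistency and availability.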

Q14.

Which is a stream processing framework?

A. Hadoop MapReduce

B. Apache Kafka Streams

C. MySQL

D. Oracle DB

Correct Answer: B — Apache Kafka Streams

Apache Kafka Streams is designed for real-time stream processing. MapReduce is for batch processing.

Q15.

Data sharding is used for:

A. Data encryption

B. Distributing data across multiple databases

C. Data compression

D. Data backup

Correct Answer: B — Distributing data across multiple databases

Sharding horizontally partitions data across multiple database instances for scalability and performance.
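
One simple way to decide which shard holds a record is to hash its key, as in the Python sketch below. This is an illustrative assumption, not the only scheme: the shard count, key names, and function name are hypothetical, and real systems often use range-based or consistent hashing so that adding shards moves less data.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical number of database instances


def shard_for(key, num_shards=NUM_SHARDS):
    """Map a record key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# Route some hypothetical user IDs to their shards.
shards = {i: [] for i in range(NUM_SHARDS)}
for user_id in ["alice", "bob", "carol", "dave", "erin"]:
    shards[shard_for(user_id)].append(user_id)

for index, members in shards.items():
    print(index, members)
```

A stable hash is used instead of Python's built-in `hash()` because the built-in value for strings changes between interpreter runs, which would send the same key to different shards after a restart.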

Q16.

Which is an example of unstructured data?

A. Employee database table

B. Customer order CSV file

C. Social media posts

D. Excel spreadsheet

Correct Answer: C — Social media posts

Social media posts are unstructured - they don't follow a predefined data model or schema.

Q17.

Data governance in Big Data ensures:

A. Faster processing only

B. Data quality, security, and compliance

C. Cheaper storage

D. Smaller file sizes

Correct Answer: B — Data quality, security, and compliance

Data governance encompasses policies and processes for data quality, security, privacy, and regulatory compliance.

Q18.

Lambda architecture combines:

A. Only batch processing

B. Only stream processing

C. Batch and stream processing

D. Only data storage

Correct Answer: C — Batch and stream processing

Lambda architecture combines batch processing for comprehensive analysis and stream processing for real-time views.
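
The way the two layers fit together can be sketched with a toy page-view counter in Python. The event list, cutoff, and function names are hypothetical; in a real deployment the batch layer would typically run on a framework such as MapReduce or Spark and the speed layer on a stream processor.

```python
from collections import Counter

# Hypothetical event log of (timestamp, page) pairs.
EVENTS = [(1, "home"), (2, "cart"), (3, "home"), (4, "home"), (5, "cart")]
BATCH_CUTOFF = 3  # assumption: the last batch run covered timestamps <= 3


def batch_view(events, cutoff):
    """Batch layer: complete counts over all data up to the cutoff."""
    return Counter(page for ts, page in events if ts <= cutoff)


def speed_view(events, cutoff):
    """Speed layer: incremental counts for events newer than the cutoff."""
    return Counter(page for ts, page in events if ts > cutoff)


def serve(batch, speed):
    """Serving layer: merge both views to answer a query."""
    return batch + speed


merged = serve(batch_view(EVENTS, BATCH_CUTOFF), speed_view(EVENTS, BATCH_CUTOFF))
print(dict(merged))
```

When the next batch run finishes, the cutoff moves forward and the speed layer's older results can be discarded, since the batch view now covers them.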

Q19.

Which metric measures how current the data is?

A. Data volume

B. Data freshness

C. Data variety

D. Data veracity

Correct Answer: B — Data freshness

Data freshness measures how current or up-to-date the data is - critical for time-sensitive applications.
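
Freshness is often expressed as the time elapsed since the most recent record arrived. The Python sketch below shows one way to compute it and compare it against a staleness threshold; the timestamps, threshold values, and function names are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def freshness(latest_record_time, now):
    """Age of the newest record: smaller means fresher data."""
    return now - latest_record_time


def is_stale(latest_record_time, now, max_age):
    """True when the newest record is older than the allowed maximum age."""
    return freshness(latest_record_time, now) > max_age


# Hypothetical timestamps for a dataset last updated 15 minutes ago.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_update = datetime(2024, 1, 1, 11, 45, tzinfo=timezone.utc)

lag = freshness(last_update, now)
print(lag, is_stale(last_update, now, timedelta(minutes=10)))
```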

Q20.

Horizontal scaling in Big Data means:

A. Adding more resources to an existing machine

B. Adding more machines to the cluster

C. Reducing data size

D. Compressing files

Correct Answer: B — Adding more machines to the cluster

Horizontal scaling (scale-out) adds more machines to distribute the workload, unlike vertical scaling which upgrades existing machines.
