
Big Data Fundamentals — Practice MCQs for CCAT

Section B: Big Data Fundamentals (20 Questions)

Twenty Big Data Fundamentals multiple-choice questions for CDAC CCAT exam preparation. Each question is followed by the correct answer and a detailed explanation.

Q1.
Which of the following is NOT one of the 5 V's of Big Data?
A. Volume
B. Velocity
C. Variety
D. Validation

Correct Answer: D — Validation

The 5 V's of Big Data are Volume, Velocity, Variety, Veracity, and Value. Validation is not one of them.

Q2.
Big Data refers to datasets that are:
A. Only structured data
B. Small enough to process on a single machine
C. Too large for traditional database systems
D. Only unstructured data

Correct Answer: C — Too large for traditional database systems

Big Data refers to datasets that are too large, complex, or fast-moving for traditional database systems to handle.

Q3.
Which characteristic of Big Data refers to the speed at which data is generated?
A. Volume
B. Velocity
C. Variety
D. Veracity

Correct Answer: B — Velocity

Velocity refers to the speed at which data is generated, collected, and processed.

Q4.
Structured, semi-structured, and unstructured data represent which Big Data characteristic?
A. Volume
B. Velocity
C. Variety
D. Value

Correct Answer: C — Variety

Variety refers to the different data formats: structured (database tables), semi-structured (JSON, XML), and unstructured (text, images).
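The three categories can be seen in a few lines of Python; the sample records below are made up purely for illustration:

```python
import csv
import io
import json

# Structured: tabular data with a fixed schema (e.g. a CSV or database row)
structured = list(csv.DictReader(io.StringIO("id,name\n1,Alice\n")))

# Semi-structured: self-describing but flexible (e.g. JSON with nested fields)
semi = json.loads('{"id": 1, "tags": ["vip", "new"]}')

# Unstructured: free text with no schema; extracting meaning needs parsing/NLP
unstructured = "Loved the product! Will buy again :)"
word_count = len(unstructured.split())
```

A traditional database ingests the first kind directly; the other two typically need transformation (or schema-on-read tooling) first.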

Q5.
Which technology is primarily used for distributed storage in Big Data?
A. MySQL
B. HDFS
C. SQLite
D. PostgreSQL

Correct Answer: B — HDFS

HDFS (Hadoop Distributed File System) is designed for distributed storage across clusters of commodity hardware.

Q6.
A Data Lake differs from a Data Warehouse in that:
A. Data Lake only stores structured data
B. Data Lake stores raw data in native format
C. Data Warehouse is cheaper
D. Data Lake requires schema before loading

Correct Answer: B — Data Lake stores raw data in native format

Data Lake stores raw data in its native format (schema-on-read), while Data Warehouse requires structured data with predefined schema (schema-on-write).

Q7.
ETL in Big Data stands for:
A. Extract, Transform, Load
B. Encrypt, Transfer, Log
C. Execute, Test, Launch
D. Export, Translate, Link

Correct Answer: A — Extract, Transform, Load

ETL stands for Extract, Transform, Load: the process of extracting data from source systems, transforming it into the required format, and loading it into a destination system.
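A toy end-to-end sketch of the three stages in Python (the field names and formats are invented for illustration):

```python
import csv
import io
import json

def extract(raw_csv: str) -> list[dict]:
    """Extract: read rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(rows: list[dict]) -> list[dict]:
    """Transform: normalize values and drop malformed records."""
    out = []
    for r in rows:
        try:
            out.append({"name": r["name"].strip().title(),
                        "amount": float(r["amount"])})
        except (KeyError, ValueError):
            continue  # skip rows that fail validation
    return out

def load(rows: list[dict]) -> str:
    """Load: serialize to the destination format (JSON here)."""
    return json.dumps(rows)

raw = "name,amount\nalice,10.5\nbob,not_a_number\n"
result = load(transform(extract(raw)))
```

Real pipelines swap the in-memory strings for databases, files, or message queues, but keep the same three-stage shape.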

Q8.
Which is an example of real-time Big Data processing?
A. Monthly sales reports
B. Credit card fraud detection
C. Annual inventory audit
D. Quarterly financial statements

Correct Answer: B — Credit card fraud detection

Credit card fraud detection requires real-time processing to identify suspicious transactions as they occur.

Q9.
Veracity in Big Data refers to:
A. Speed of data
B. Size of data
C. Trustworthiness of data
D. Type of data

Correct Answer: C — Trustworthiness of data

Veracity refers to the quality, accuracy, and trustworthiness of the data.

Q10.
Which is NOT a common Big Data use case?
A. Predictive maintenance
B. Customer sentiment analysis
C. Single-user desktop application
D. Recommendation engines

Correct Answer: C — Single-user desktop application

Single-user desktop applications don't require Big Data technologies. The others involve processing large datasets.

Q11.
Batch processing in Big Data is characterized by:
A. Real-time responses
B. Processing large volumes of data at scheduled intervals
C. Immediate data processing
D. Stream processing

Correct Answer: B — Processing large volumes of data at scheduled intervals

Batch processing involves processing large volumes of accumulated data at scheduled intervals, not in real-time.
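The pattern can be sketched in a few lines of Python; here the "scheduled interval" is simulated by calling run_batch() explicitly, where a real system would use a scheduler such as cron or Airflow (the event names are made up):

```python
from collections import defaultdict

events = []  # accumulates between batch runs

def record(user: str, amount: float) -> None:
    """Events arrive continuously but are only stored, not processed."""
    events.append((user, amount))

def run_batch() -> dict:
    """Process everything accumulated since the last run, then clear."""
    totals = defaultdict(float)
    for user, amount in events:
        totals[user] += amount
    events.clear()
    return dict(totals)

record("alice", 10.0)
record("alice", 5.0)
record("bob", 2.0)
totals = run_batch()
```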

Q12.
Which company originally developed Hadoop?
A. Google
B. Yahoo
C. Facebook
D. Amazon

Correct Answer: B — Yahoo

Hadoop was created by Doug Cutting and Mike Cafarella and matured into a production system at Yahoo, based on Google's published papers on MapReduce and GFS (the Google File System).

Q13.
The CAP theorem states that a distributed system can have at most:
A. One property
B. Two of three properties simultaneously
C. All three properties
D. None of the properties

Correct Answer: B — Two of three properties simultaneously

CAP theorem states that a distributed system can only guarantee 2 of 3: Consistency, Availability, and Partition tolerance.

Q14.
Which is a stream processing framework?
A. Hadoop MapReduce
B. Apache Kafka Streams
C. MySQL
D. Oracle DB

Correct Answer: B — Apache Kafka Streams

Apache Kafka Streams is designed for real-time stream processing. MapReduce is for batch processing.

Q15.
Data sharding is used for:
A. Data encryption
B. Distributing data across multiple databases
C. Data compression
D. Data backup

Correct Answer: B — Distributing data across multiple databases

Sharding horizontally partitions data across multiple database instances for scalability and performance.
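A minimal hash-sharding sketch in Python (the shard count and keys are illustrative):

```python
import hashlib

NUM_SHARDS = 4

def shard_for(key: str) -> int:
    """Deterministically map a key to one of NUM_SHARDS shard indices."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Route each record to its shard; every key always lands on the same shard.
shards = {i: [] for i in range(NUM_SHARDS)}
for user_id in ["user1", "user2", "user3", "user4"]:
    shards[shard_for(user_id)].append(user_id)
```

Production systems often use consistent hashing instead of a plain modulo, so that adding or removing a shard relocates only a fraction of the keys.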

Q16.
Which is an example of unstructured data?
A. Employee database table
B. Customer order CSV file
C. Social media posts
D. Excel spreadsheet

Correct Answer: C — Social media posts

Social media posts are unstructured - they don't follow a predefined data model or schema.

Q17.
Data governance in Big Data ensures:
A. Faster processing only
B. Data quality, security, and compliance
C. Cheaper storage
D. Smaller file sizes

Correct Answer: B — Data quality, security, and compliance

Data governance encompasses policies and processes for data quality, security, privacy, and regulatory compliance.

Q18.
Lambda architecture combines:
A. Only batch processing
B. Only stream processing
C. Batch and stream processing
D. Only data storage

Correct Answer: C — Batch and stream processing

Lambda architecture combines batch processing for comprehensive analysis and stream processing for real-time views.
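The serving-layer merge at the heart of Lambda architecture can be sketched in Python; the page-view counts below are made-up values:

```python
# Batch view: precomputed periodically by the batch layer over the full dataset.
batch_view = {"page_a": 1000, "page_b": 500}

# Speed view: incremental counts from the stream layer for events that
# arrived after the last batch run.
speed_view = {"page_a": 7, "page_c": 3}

def query(page: str) -> int:
    """Serving layer: merge batch and real-time views at query time."""
    return batch_view.get(page, 0) + speed_view.get(page, 0)
```

When the next batch run completes, its results absorb the streamed events and the speed view is reset, keeping the merged answer both complete and current.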

Q19.
Which metric measures how current the data is?
A. Data volume
B. Data freshness
C. Data variety
D. Data veracity

Correct Answer: B — Data freshness

Data freshness measures how current or up-to-date the data is, which is critical for time-sensitive applications.

Q20.
Horizontal scaling in Big Data means:
A. Adding more resources to existing machine
B. Adding more machines to the cluster
C. Reducing data size
D. Compressing files

Correct Answer: B — Adding more machines to the cluster

Horizontal scaling (scale-out) adds more machines to distribute the workload, unlike vertical scaling which upgrades existing machines.