Practice 20 Big Data Fundamentals multiple-choice questions designed for CDAC CCAT exam preparation. Each correct option is listed below with a detailed explanation.
Correct Answer: D — Validation
The 5 V's of Big Data are Volume, Velocity, Variety, Veracity, and Value. Validation is not one of them.
Correct Answer: C — Too large for traditional database systems
Big Data refers to datasets that are too large, complex, or fast-moving for traditional database systems to handle.
Correct Answer: B — Velocity
Velocity refers to the speed at which data is generated, collected, and processed.
Correct Answer: C — Variety
Variety refers to the different types of data formats: structured (tables), semi-structured (JSON, XML), and unstructured (text, images).
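To make the distinction concrete, here is a small illustrative sketch; the records and values are invented for this example:

```python
import json

# Structured: fixed columns, as in a relational table row.
structured_row = ("C001", "Asha", 29)          # (id, name, age)

# Semi-structured: self-describing JSON; fields can vary per record.
semi_structured = json.loads('{"id": "C001", "name": "Asha", "tags": ["premium"]}')

# Unstructured: free text with no predefined schema.
unstructured = "Asha posted: 'Loving the new app update!'"

print(structured_row, semi_structured["tags"], len(unstructured.split()))
```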
Correct Answer: B — HDFS
HDFS (Hadoop Distributed File System) is designed for distributed storage across clusters of commodity hardware.
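As a rough illustration of the idea (plain Python, not the actual HDFS API): a file is split into fixed-size blocks, and each block is stored on several nodes. The 128 MB block size and replication factor of 3 are HDFS defaults; the node names are invented.

```python
BLOCK_SIZE = 128 * 1024 * 1024   # HDFS default block size: 128 MB
REPLICATION = 3                  # HDFS default replication factor

def place_blocks(file_size_bytes, nodes):
    """Split a file into blocks and assign each block to REPLICATION nodes."""
    num_blocks = -(-file_size_bytes // BLOCK_SIZE)  # ceiling division
    placement = {}
    for b in range(num_blocks):
        # Round-robin placement; real HDFS also considers racks and load.
        placement[b] = [nodes[(b + r) % len(nodes)] for r in range(REPLICATION)]
    return placement

print(place_blocks(300 * 1024 * 1024, ["node1", "node2", "node3", "node4"]))
# A 300 MB file yields 3 blocks, each stored on 3 different nodes.
```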
Correct Answer: B — Data Lake stores raw data in native format
A Data Lake stores raw data in its native format (schema-on-read), while a Data Warehouse requires structured data with a predefined schema (schema-on-write).
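A minimal sketch of the contrast, with an invented record (this is not any vendor's API):

```python
import json

raw_event = '{"user": "u42", "amount": "19.99", "note": "gift"}'

# Schema-on-write (warehouse style): shape the record BEFORE storing it.
def write_to_warehouse(record):
    row = {"user": str(record["user"]), "amount": float(record["amount"])}
    return row  # only the predefined columns survive

# Schema-on-read (lake style): store the raw text as-is, interpret at query time.
stored_raw = raw_event                      # the lake keeps the original bytes
def query_lake(raw):
    record = json.loads(raw)                # schema applied while reading
    return float(record["amount"])

print(write_to_warehouse(json.loads(raw_event)))
print(query_lake(stored_raw))
```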
Correct Answer: A — Extract, Transform, Load
ETL stands for Extract, Transform, Load: the process of extracting data from source systems, transforming it, and loading it into a destination system.
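A toy sketch of the three stages, using an invented in-memory source and destination:

```python
# Extract: pull raw records from a source (here, an in-memory "CSV").
source = ["alice,100", "bob,250", "alice,50"]

def extract():
    return [line.split(",") for line in source]

# Transform: clean and aggregate (sum spend per user).
def transform(rows):
    totals = {}
    for user, amount in rows:
        totals[user] = totals.get(user, 0) + int(amount)
    return totals

# Load: write the result into a destination store (here, a dict "table").
warehouse = {}
def load(totals):
    warehouse.update(totals)

load(transform(extract()))
print(warehouse)   # {'alice': 150, 'bob': 250}
```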
Correct Answer: B — Credit card fraud detection
Credit card fraud detection requires real-time processing to identify suspicious transactions as they occur.
Correct Answer: C — Trustworthiness of data
Veracity refers to the quality, accuracy, and trustworthiness of the data.
Correct Answer: C — Single-user desktop application
Single-user desktop applications don't require Big Data technologies. The others involve processing large datasets.
Correct Answer: B — Processing large volumes of data at scheduled intervals
Batch processing involves processing large volumes of accumulated data at scheduled intervals, not in real-time.
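A tiny sketch of the batch model with invented events: nothing is processed at ingest time, and a scheduled job handles everything accumulated since the last run:

```python
buffer = []   # events accumulate here between runs

def ingest(event):
    buffer.append(event)            # no processing at ingest time

def run_batch():
    """Scheduled job: process everything accumulated since the last run."""
    total = sum(buffer)
    buffer.clear()
    return total

for e in [5, 3, 7]:
    ingest(e)
print(run_batch())   # 15: one result per batch window, not per event
```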
Correct Answer: B — Yahoo
Hadoop was originally developed at Yahoo, where its creator Doug Cutting worked, based on Google's published papers on MapReduce and the Google File System (GFS).
Correct Answer: B — Two of three properties simultaneously
The CAP theorem states that a distributed system can guarantee only two of three properties: Consistency, Availability, and Partition tolerance. Since network partitions cannot be ruled out, a partitioned system must in practice choose between consistency and availability.
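An illustrative simulation of that trade-off, using an invented two-replica setup rather than a real database:

```python
# Two replicas of one key; a network partition stops them from syncing.
replicas = {"A": {"x": 1}, "B": {"x": 1}}
partitioned = True

def write(node, value, prefer_consistency):
    if partitioned and prefer_consistency:
        # CP choice: refuse the write rather than let replicas diverge.
        return "error: unavailable during partition"
    # AP choice: accept the write locally; replicas now disagree.
    replicas[node]["x"] = value
    return "ok"

print(write("A", 2, prefer_consistency=True))    # consistent but unavailable
print(write("A", 2, prefer_consistency=False))   # available but inconsistent
print(replicas)  # {'A': {'x': 2}, 'B': {'x': 1}}: a read on B is now stale
```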
Correct Answer: B — Apache Kafka Streams
Apache Kafka Streams is designed for real-time stream processing. MapReduce is for batch processing.
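A minimal sketch of the stream-processing model itself, in plain Python rather than the actual Kafka Streams API: each event is processed the moment it arrives, updating a per-window count (the events are invented):

```python
from collections import Counter

window_counts = Counter()

def on_event(user, window_start):
    """Process each event as it arrives, updating a per-window count."""
    window_counts[(user, window_start)] += 1
    return window_counts[(user, window_start)]

# Events processed one by one; results are available immediately.
for user, ts in [("u1", 10), ("u2", 15), ("u1", 40)]:
    print(user, on_event(user, window_start=ts - ts % 60))
```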
Correct Answer: B — Distributing data across multiple databases
Sharding horizontally partitions data across multiple database instances for scalability and performance.
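A simple hash-modulo routing sketch (shard and user names invented; production systems often prefer consistent hashing so that adding shards moves fewer keys):

```python
import hashlib

SHARDS = ["db0", "db1", "db2"]   # three database instances

def shard_for(key):
    """Route a key to a shard by hashing it (simple modulo scheme)."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return SHARDS[digest % len(SHARDS)]

for user_id in ["u1001", "u1002", "u1003"]:
    print(user_id, "->", shard_for(user_id))
# Each user's rows live on exactly one shard; reads and writes for that
# user go only to that instance, spreading load across the cluster.
```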
Correct Answer: C — Social media posts
Social media posts are unstructured: they don't follow a predefined data model or schema.
Correct Answer: B — Data quality, security, and compliance
Data governance encompasses policies and processes for data quality, security, privacy, and regulatory compliance.
Correct Answer: C — Batch and stream processing
Lambda architecture combines a batch layer for comprehensive analysis with a speed (stream) layer for real-time views; a serving layer merges the two to answer queries.
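An illustrative sketch of the three layers, with invented counts:

```python
# Batch layer: a complete but stale view, recomputed periodically.
batch_view = {"u1": 100, "u2": 40}      # counts as of the last batch run

# Speed layer: increments seen since the batch view was built.
realtime_delta = {"u1": 3}

def serve(user):
    """Serving layer: merge the batch view with the real-time delta."""
    return batch_view.get(user, 0) + realtime_delta.get(user, 0)

print(serve("u1"))   # 103: comprehensive batch result + fresh stream result
```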
Correct Answer: B — Data freshness
Data freshness measures how current or up-to-date the data is, which is critical for time-sensitive applications.
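A small sketch of a freshness check; the 15-minute staleness budget is a made-up SLA for illustration:

```python
from datetime import datetime, timedelta, timezone

MAX_STALENESS = timedelta(minutes=15)   # hypothetical SLA for this sketch

def is_fresh(last_updated):
    """Data is fresh if it was updated within the allowed staleness window."""
    return datetime.now(timezone.utc) - last_updated <= MAX_STALENESS

stamp = datetime.now(timezone.utc) - timedelta(minutes=5)
print(is_fresh(stamp))   # True: updated 5 minutes ago, within the budget
```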
Correct Answer: B — Adding more machines to the cluster
Horizontal scaling (scale-out) adds more machines to distribute the workload, unlike vertical scaling which upgrades existing machines.