Answer key with detailed explanations for 20 Hadoop Ecosystem multiple-choice questions, designed for CDAC CCAT exam preparation.
Correct Answer: B — Hadoop Distributed File System
HDFS stands for Hadoop Distributed File System - designed for storing large files across clusters.
Correct Answer: B — Managing metadata and namespace
NameNode manages the filesystem namespace, maintains directory tree, and tracks where data blocks are stored.
Correct Answer: B — Actual data blocks
DataNodes store actual data blocks and serve read/write requests from clients.
Correct Answer: C — 128 MB
Default HDFS block size is 128 MB (was 64 MB in earlier versions), optimized for large sequential reads.
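As a quick sanity check of the block-size arithmetic, the number of blocks a file occupies can be sketched in plain Python (an illustration, not Hadoop API code):

```python
import math

def hdfs_block_count(file_size_mb: float, block_size_mb: int = 128) -> int:
    """Number of HDFS blocks a file occupies (the last block may be partial)."""
    return math.ceil(file_size_mb / block_size_mb)

# A 1 GB (1024 MB) file splits into exactly 8 full 128 MB blocks.
print(hdfs_block_count(1024))  # 8
# A 200 MB file uses 2 blocks: one full 128 MB block plus a 72 MB partial block.
print(hdfs_block_count(200))   # 2
```

Note that, unlike a fixed disk block, a partial final HDFS block consumes only its actual size on the DataNode.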
Correct Answer: C — 3
Default replication factor is 3 - data is stored on 3 different nodes for fault tolerance.
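The storage cost of replication is a simple multiplication, sketched here in plain Python:

```python
def raw_storage_mb(file_size_mb: float, replication: int = 3) -> float:
    """Total raw cluster storage consumed once every block is replicated."""
    return file_size_mb * replication

# A 1 GB file with the default replication factor of 3 consumes 3 GB of raw storage.
print(raw_storage_mb(1024))  # 3072.0
```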
Correct Answer: B — Yet Another Resource Negotiator
YARN stands for Yet Another Resource Negotiator - it manages cluster resources and schedules applications.
Correct Answer: C — ResourceManager
ResourceManager is the master that allocates resources and schedules applications across the cluster.
Correct Answer: B — SQL-like queries on Hadoop
Hive provides SQL-like interface (HiveQL) to query data stored in Hadoop, ideal for data warehousing.
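HiveQL deliberately mirrors SQL, so a typical query looks like the following (table and column names are hypothetical):

```sql
-- Hypothetical warehouse query; Hive compiles this into MapReduce/Tez/Spark jobs.
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 50000;
```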
Correct Answer: B — A high-level scripting language for data analysis
Pig provides Pig Latin scripting language for expressing data flows and transformations.
Correct Answer: B — NoSQL columnar database
HBase is a distributed, scalable NoSQL database that runs on top of HDFS, modeled after Google's Bigtable.
Correct Answer: B — Transferring data between Hadoop and RDBMS
Sqoop efficiently transfers bulk data between Hadoop and structured datastores like relational databases.
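A typical Sqoop import pulls a relational table into HDFS; the sketch below uses placeholder connection details, database, and table names:

```bash
# Hypothetical import: copy a MySQL table into HDFS (all names are placeholders).
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user \
  --table orders \
  --target-dir /data/sales/orders \
  --num-mappers 4
```

A matching `sqoop export` moves data in the opposite direction, from HDFS back into an RDBMS table.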
Correct Answer: B — Collecting and aggregating log data
Flume is a distributed service for collecting, aggregating, and moving large amounts of log data.
Correct Answer: B — Workflow scheduler
Oozie is a workflow scheduler for managing Hadoop jobs, supporting MapReduce, Pig, Hive, etc.
Correct Answer: B — Coordination services for distributed systems
ZooKeeper provides centralized configuration, naming, synchronization, and group services.
Correct Answer: B — Performs checkpointing of namespace
Secondary NameNode periodically merges namespace image with edit log - it's not a failover backup.
Correct Answer: B — Kerberos with Ranger/Knox
Kerberos provides authentication across the cluster; Apache Ranger adds centralized authorization and auditing, and Apache Knox acts as a gateway for perimeter security.
Correct Answer: B — Processes data in-memory
Spark processes data in memory (RAM) rather than writing intermediate results to disk between stages, as MapReduce does.
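The in-memory reuse is explicit in Spark's API via `cache()`. A minimal PySpark sketch (requires a Spark installation; the file path is hypothetical):

```python
# Requires a PySpark installation; a minimal sketch of in-memory reuse.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cache-demo").getOrCreate()
df = spark.read.text("hdfs:///logs/app.log")  # hypothetical path

errors = df.filter(df.value.contains("ERROR")).cache()  # keep in RAM after first use
print(errors.count())  # first action: reads from storage, then caches
print(errors.count())  # second action: served from memory, no re-read
spark.stop()
```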
Correct Answer: C — Large sequential reads
HDFS is designed for large sequential reads of big files, not random access or small files.
Correct Answer: B — Fault tolerance and network optimization
Rack awareness places replicas across racks for fault tolerance and optimizes network bandwidth.
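The default placement policy (first replica on the writer's node, the other two on different nodes of one remote rack) can be sketched as a toy simulation; the rack and node names below are made up:

```python
import random

# Toy cluster: rack name -> list of DataNodes (names are hypothetical).
CLUSTER = {
    "rack1": ["r1n1", "r1n2", "r1n3"],
    "rack2": ["r2n1", "r2n2", "r2n3"],
}

def place_replicas(local_rack: str, local_node: str) -> list[str]:
    """Sketch of the HDFS default policy: replica 1 on the writer's node,
    replicas 2 and 3 on two different nodes of a single remote rack."""
    remote_rack = random.choice([r for r in CLUSTER if r != local_rack])
    remote_nodes = random.sample(CLUSTER[remote_rack], 2)
    return [local_node] + remote_nodes

placement = place_replicas("rack1", "r1n1")
print(placement)  # e.g. ['r1n1', 'r2n3', 'r2n1'] - 3 nodes spanning exactly 2 racks
```

Losing one whole rack therefore never destroys all three replicas, while keeping two replicas on one remote rack limits cross-rack write traffic.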
Correct Answer: C — Oracle
Oracle is a traditional RDBMS, not part of the Hadoop ecosystem. Hive, Pig, and HBase are Hadoop tools.