
Hadoop Ecosystem — Practice MCQs for CCAT

20 Questions | Section B: Programming Big Data

Practice these 20 multiple-choice questions on the Hadoop ecosystem, designed for CDAC CCAT exam preparation. Each question is followed by the correct option and an explanation.

Q1.
HDFS stands for:
A) Hadoop Data File System
B) Hadoop Distributed File System
C) High Data File Storage
D) Hierarchical Distributed File System

Correct Answer: B — Hadoop Distributed File System

HDFS stands for Hadoop Distributed File System. It is designed to store very large files across the machines of a cluster.

Q2.
In HDFS, the NameNode is responsible for:
A) Storing actual data blocks
B) Managing metadata and namespace
C) Running MapReduce jobs
D) Data compression

Correct Answer: B — Managing metadata and namespace

The NameNode manages the filesystem namespace: it maintains the directory tree and file metadata and tracks which DataNodes hold each data block.

Q3.
DataNodes in HDFS store:
A) Only metadata
B) Actual data blocks
C) Only file names
D) Only directory structure

Correct Answer: B — Actual data blocks

DataNodes store the actual data blocks and serve read and write requests from clients.
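
The division of labor in Q2 and Q3 can be pictured with a toy model in Python. The names below (NameNode, DataNode, read_file) are hypothetical stand-ins, not the HDFS client API: the NameNode object holds only the path-to-block and block-to-location mappings, and the DataNode objects hold the bytes.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Holds the actual block bytes, keyed by block ID (toy model)."""
    name: str
    blocks: dict[int, bytes] = field(default_factory=dict)


@dataclass
class NameNode:
    """Holds metadata only; it never stores file contents (toy model)."""
    namespace: dict[str, list[int]] = field(default_factory=dict)  # path -> ordered block IDs
    block_map: dict[int, list[str]] = field(default_factory=dict)  # block ID -> DataNodes holding it


def read_file(path: str, nn: NameNode, datanodes: dict[str, DataNode]) -> bytes:
    """Look up block locations on the NameNode, then read each block from a DataNode."""
    data = b""
    for block_id in nn.namespace[path]:
        holder = nn.block_map[block_id][0]  # simplification: always use the first listed replica
        data += datanodes[holder].blocks[block_id]
    return data


# Usage: one file split into two blocks stored on two DataNodes.
dn1 = DataNode("dn1", {1: b"hello "})
dn2 = DataNode("dn2", {2: b"world"})
nn = NameNode(namespace={"/logs/a.txt": [1, 2]}, block_map={1: ["dn1"], 2: ["dn2"]})
print(read_file("/logs/a.txt", nn, {"dn1": dn1, "dn2": dn2}))  # b'hello world'
```

In real HDFS the client also reads block data directly from the DataNodes; the NameNode only answers the location lookup, so file contents never pass through it.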

Q4.
Default block size in HDFS is:
A) 64 KB
B) 1 MB
C) 128 MB
D) 1 GB

Correct Answer: C — 128 MB

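The default HDFS block size is 128 MB in Hadoop 2.x and later (it was 64 MB in Hadoop 1.x) and can be changed with the dfs.blocksize property. Large blocks favor large sequential reads and keep the number of blocks the NameNode must track small.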
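
The block arithmetic can be checked with a short Python sketch. It assumes the 128 MB default means 128 × 1024 × 1024 bytes (134217728, the dfs.blocksize default) and that a final partial block holds only the leftover bytes, which is how HDFS stores the last block of a file. split_into_blocks is a hypothetical helper.

```python
MB = 1024 * 1024
DEFAULT_BLOCK_SIZE = 128 * MB  # 134217728 bytes


def split_into_blocks(file_size: int, block_size: int = DEFAULT_BLOCK_SIZE) -> list[int]:
    """Return the size in bytes of each block a file of file_size bytes occupies."""
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # last block is only as large as the leftover data
    return sizes


print(len(split_into_blocks(1024 * MB)))               # 8 blocks for a 1 GB file
print([s // MB for s in split_into_blocks(300 * MB)])  # [128, 128, 44]
```

A 300 MB file therefore needs three blocks, but the third uses only 44 MB of disk, not a full 128 MB.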

Q5.
HDFS default replication factor is:
A) 1
B) 2
C) 3
D) 5

Correct Answer: C — 3

The default replication factor is 3 (set by the dfs.replication property): each block is stored on three different DataNodes, so it survives the loss of up to two of the nodes holding it.
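
The cost and benefit of a replication factor of 3 can be sketched in Python. The helpers are hypothetical and deliberately ignore real-world overheads such as temporary and non-HDFS disk space; they only show that raw storage triples and that a block stays readable as long as one replica survives.

```python
TB = 1024 ** 4


def raw_capacity_needed(logical_bytes: int, replication: int = 3) -> int:
    """Every block is stored `replication` times, so raw usage scales linearly."""
    return logical_bytes * replication


def block_readable(replica_nodes: set[str], failed_nodes: set[str]) -> bool:
    """A block can still be read while at least one node holding a replica is alive."""
    return bool(replica_nodes - failed_nodes)


print(raw_capacity_needed(10 * TB) // TB)                            # 30 (TB of raw disk)
replicas = {"dn1", "dn3", "dn4"}
print(block_readable(replicas, failed_nodes={"dn1", "dn3"}))         # True
print(block_readable(replicas, failed_nodes={"dn1", "dn3", "dn4"}))  # False
```

In a running cluster the NameNode also re-replicates blocks that fall below the target factor after a DataNode failure, which this sketch does not model.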

Q6.
YARN in Hadoop stands for:
A) Yet Another Resource Navigator
B) Yet Another Resource Negotiator
C) Yielding Application Resource Network
D) Youth Application Resource Node

Correct Answer: B — Yet Another Resource Negotiator

YARN stands for Yet Another Resource Negotiator. It manages cluster resources and schedules applications.

Q7.
Which component schedules tasks in YARN?
A) DataNode
B) NameNode
C) ResourceManager
D) Secondary NameNode

Correct Answer: C — ResourceManager

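The ResourceManager is the YARN master daemon: its scheduler allocates containers to applications across the cluster, and each application's ApplicationMaster then runs that application's tasks inside the containers it receives. DataNode, NameNode, and Secondary NameNode are HDFS components, not YARN components.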
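
To make the ResourceManager's role concrete, here is a deliberately simplified Python sketch of container allocation. It is not YARN's scheduler: real clusters plug in the FIFO, Capacity, or Fair Scheduler, which also account for CPU, queues, and priorities. The function allocate_first_fit and the node names are hypothetical; each request is placed, in arrival order, on the first NodeManager with enough free memory.

```python
def allocate_first_fit(requests: list[tuple[str, int]],
                       node_free_mb: dict[str, int]):
    """Grant container requests (app, memory_mb) in arrival order on the first
    node with enough free memory; requests that fit nowhere stay pending."""
    free = dict(node_free_mb)  # copy so the caller's dict is not modified
    granted, pending = [], []
    for app, mem in requests:
        node = next((n for n, mb in free.items() if mb >= mem), None)
        if node is None:
            pending.append((app, mem))
        else:
            free[node] -= mem
            granted.append((app, node, mem))
    return granted, pending


granted, pending = allocate_first_fit(
    [("app1", 3072), ("app2", 2048), ("app1", 2048)],
    {"nm1": 4096, "nm2": 2048},
)
print(granted)  # [('app1', 'nm1', 3072), ('app2', 'nm2', 2048)]
print(pending)  # [('app1', 2048)]
```

In YARN a pending request would wait until a running container finishes and frees resources on some node.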

Q8.
Hive is used for:
A) Real-time processing
B) SQL-like queries on Hadoop
C) Graph processing
D) Stream processing

Correct Answer: B — SQL-like queries on Hadoop

Hive provides a SQL-like interface (HiveQL) for querying data stored in Hadoop. It is suited to batch data warehousing workloads rather than real-time processing.

Q9.
Pig in Hadoop is:
A) A storage system
B) A high-level scripting language for data analysis
C) A security framework
D) A monitoring tool

Correct Answer: B — A high-level scripting language for data analysis

Pig provides Pig Latin, a high-level scripting language for expressing data flows and transformations; Pig compiles these scripts into jobs that run on the Hadoop cluster.

Q10.
HBase is a:
A) Relational database
B) NoSQL columnar database
C) Graph database
D) Document database

Correct Answer: B — NoSQL columnar database

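HBase is a distributed, scalable NoSQL database that stores data in column families (a wide-column model), runs on top of HDFS, and is modeled after Google's Bigtable. Unlike plain HDFS files, it supports random, real-time reads and writes.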

Q11.
Sqoop is used for:
A) Data visualization
B) Transferring data between Hadoop and RDBMS
C) Stream processing
D) Machine learning

Correct Answer: B — Transferring data between Hadoop and RDBMS

Sqoop efficiently transfers bulk data between Hadoop and structured datastores like relational databases.

Q12.
Flume is designed for:
A) Batch processing
B) Collecting and aggregating log data
C) SQL queries
D) Graph analysis

Correct Answer: B — Collecting and aggregating log data

Flume is a distributed service for collecting, aggregating, and moving large amounts of log data.

Q13.
Oozie in Hadoop is a:
A) Database
B) Workflow scheduler
C) Query engine
D) Security system

Correct Answer: B — Workflow scheduler

Oozie is a workflow scheduler for managing Hadoop jobs, supporting MapReduce, Pig, Hive, etc.

Q14.
ZooKeeper provides:
A) Data storage
B) Coordination services for distributed systems
C) Query processing
D) Data compression

Correct Answer: B — Coordination services for distributed systems

ZooKeeper provides centralized configuration, naming, synchronization, and group services.

Q15.
Secondary NameNode in HDFS:
A) Is a backup NameNode
B) Performs checkpointing of namespace
C) Stores data blocks
D) Manages YARN

Correct Answer: B — Performs checkpointing of namespace

The Secondary NameNode periodically merges the namespace image (fsimage) with the edit log to produce a new checkpoint. Despite its name, it is not a failover or standby NameNode.
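
Checkpointing amounts to replaying logged changes onto the last saved image. The Python sketch below models the fsimage as a dictionary from path to replication factor and the edit log as a list of operations; the operation names and the checkpoint function are hypothetical and far simpler than the real edit-log format.

```python
def checkpoint(fsimage: dict[str, int], edit_log: list[tuple]) -> dict[str, int]:
    """Replay edit-log operations, in order, on a copy of the namespace image."""
    image = dict(fsimage)  # the old image is left untouched
    for op, *args in edit_log:
        if op == "create":
            path, replication = args
            image[path] = replication
        elif op == "delete":
            image.pop(args[0], None)
        elif op == "rename":
            src, dst = args
            image[dst] = image.pop(src)
        else:
            raise ValueError(f"unknown operation {op!r}")
    return image


old_image = {"/a": 3}
edits = [("create", "/b", 3), ("rename", "/a", "/c"), ("delete", "/b")]
print(checkpoint(old_image, edits))  # {'/c': 3}
```

Because the merged image already contains these changes, a NameNode restart has far fewer edits to replay.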

Q16.
Which Hadoop component provides security?
A) Oozie
B) Kerberos with Ranger/Knox
C) Flume
D) Sqoop

Correct Answer: B — Kerberos with Ranger/Knox

Kerberos provides authentication. Apache Ranger provides centralized authorization policies and auditing, and Apache Knox is a gateway that provides perimeter security for Hadoop's REST APIs.

Q17.
Spark is faster than MapReduce primarily because:
A) Uses bigger disk drives
B) Processes data in-memory
C) Uses faster networks
D) Has better UI

Correct Answer: B — Processes data in-memory

Spark keeps intermediate data in memory (RAM) where possible, whereas MapReduce writes intermediate results to disk between the map and reduce stages and between chained jobs.

Q18.
HDFS is optimized for:
A) Random reads/writes
B) Small files
C) Large sequential reads
D) Frequent file modifications

Correct Answer: C — Large sequential reads

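HDFS is designed for high-throughput sequential reads of large files that are written once and read many times. It handles random access, in-place modification, and large numbers of small files poorly.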
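
One reason small files hurt is metadata: the NameNode keeps an in-memory record for every file and every block. The rough Python count below assumes one metadata object per file plus one per block (a simplification of the real structures) and the 128 MB default block size, and compares storing the same 10 GB as one file versus as 1 MB files. metadata_objects is a hypothetical helper.

```python
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB


def metadata_objects(num_files: int, file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Approximate NameNode objects: one per file plus one per block (assumes file_size > 0)."""
    blocks_per_file = -(-file_size // block_size)  # ceiling division
    return num_files * (1 + blocks_per_file)


one_big_file = metadata_objects(1, 10 * 1024 * MB)  # 1 file, 80 blocks
many_small_files = metadata_objects(10 * 1024, MB)  # 10240 files, 1 block each
print(one_big_file, many_small_files)  # 81 20480
```

The same data produces roughly 250 times as many objects for the NameNode to hold, and each small file also costs a separate open and seek, which undercuts the sequential-read design.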

Q19.
Rack awareness in HDFS helps with:
A) Faster CPU processing
B) Fault tolerance and network optimization
C) Data compression
D) Query performance

Correct Answer: B — Fault tolerance and network optimization

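With rack awareness, the NameNode places a block's replicas on more than one rack so the data survives a whole-rack failure, while keeping some replicas on the same rack to limit cross-rack network traffic.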
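
The Hadoop documentation describes the default placement for a replication factor of 3 when the writer runs on a DataNode: the first replica on the writer's node, the second on a node in a different rack, and the third on another node in that same remote rack. The Python sketch below follows that rule; place_replicas and the topology are hypothetical, and it picks the first eligible node in sorted order where HDFS would choose at random.

```python
def place_replicas(writer: str, rack_of: dict[str, str]) -> list[str]:
    """Sketch of the default 3-replica placement. Assumes the writer is a DataNode
    and that some remote rack has at least two nodes (otherwise next() raises)."""
    local_rack = rack_of[writer]
    first = writer  # replica 1: the writer's own node
    second = next(n for n in sorted(rack_of)  # replica 2: a node on a remote rack
                  if rack_of[n] != local_rack)
    third = next(n for n in sorted(rack_of)  # replica 3: another node on that remote rack
                 if rack_of[n] == rack_of[second] and n != second)
    return [first, second, third]


topology = {"dn1": "/rack1", "dn2": "/rack1", "dn3": "/rack2", "dn4": "/rack2"}
print(place_replicas("dn1", topology))  # ['dn1', 'dn3', 'dn4']
```

The block ends up on two racks, so losing a whole rack still leaves at least one replica, while only one copy crosses between racks during the write.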

Q20.
Which is NOT a Hadoop ecosystem tool?
A) Hive
B) Pig
C) Oracle
D) HBase

Correct Answer: C — Oracle

Oracle is a traditional RDBMS, not part of the Hadoop ecosystem. Hive, Pig, and HBase are Hadoop tools.