Answer key with detailed explanations for 20 MapReduce multiple-choice questions, designed for CDAC CCAT exam preparation.
1. Correct Answer: C — Map and Reduce phases
MapReduce has Map phase (transforms data into key-value pairs) and Reduce phase (aggregates values by key).
2. Correct Answer: B — Key-value pairs
Map function processes input and emits intermediate key-value pairs for the Reduce phase.
3. Correct Answer: B — Between Map and Reduce phases
Shuffle and Sort transfers Map output to Reducers and sorts data by keys between phases.
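The grouping that Shuffle and Sort performs can be simulated in a few lines of Python. This is a sketch of the idea only, not Hadoop's actual implementation; the sample pairs are made up:

```python
from collections import defaultdict

def shuffle_and_sort(map_outputs):
    """Group intermediate (key, value) pairs by key and sort the keys,
    mimicking what the framework does between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in map_outputs:
        groups[key].append(value)
    # Each reducer then sees keys in sorted order, with all values per key.
    return sorted(groups.items())

# Intermediate pairs as emitted by several mappers:
pairs = [("b", 1), ("a", 1), ("b", 1), ("c", 1), ("a", 1)]
print(shuffle_and_sort(pairs))  # [('a', [1, 1]), ('b', [1, 1]), ('c', [1])]
```

This also illustrates question 11 below: after shuffle/sort, Reduce receives each key together with all of its values.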
4. Correct Answer: B — A mini-reducer that runs on Map output
Combiner is an optional local reducer that runs on Map output to reduce network transfer.
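A minimal Python sketch of the combiner idea, assuming a word-count-style job (the sample records are hypothetical): the same summing logic as the reducer runs locally on map output, so fewer records cross the network.

```python
from collections import defaultdict

def combiner(map_output):
    # Locally sum counts per word on the map side, before shuffle.
    local = defaultdict(int)
    for word, count in map_output:
        local[word] += count
    return list(local.items())

map_output = [("the", 1), ("cat", 1), ("the", 1), ("the", 1)]
combined = combiner(map_output)
# Four records shrink to two: ("the", 3) and ("cat", 1).
```

Note that a combiner is only safe when the reduce operation is commutative and associative (like summing), since the framework may run it zero, one, or many times.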
5. Correct Answer: B — Which Reducer gets which key
Partitioner determines which Reducer receives which key-value pairs, typically using hash of the key.
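The default behavior can be sketched as hash-mod-N. This Python version uses CRC32 as a stand-in for Java's `hashCode()` (an assumption for determinism, not Hadoop's exact hash); the property that matters is that every occurrence of a key lands on the same reducer:

```python
import zlib

def partition(key, num_reducers):
    """Sketch of Hadoop's default HashPartitioner: a deterministic
    hash of the key, modulo the number of reducers."""
    return zlib.crc32(key.encode()) % num_reducers

# Same key always maps to the same reducer:
assert partition("apple", 4) == partition("apple", 4)
for key in ["apple", "banana", "cherry"]:
    print(key, "-> reducer", partition(key, 4))
```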
6. Correct Answer: A — Key-value pair
Map receives a key-value pair where the key is typically the byte offset of the line and the value is the line content.
7. Correct Answer: B — Defines how to read and split input files
InputFormat defines how input files are read and split into InputSplits for Map tasks.
8. Correct Answer: B — Number of input splits
Number of Map tasks equals number of input splits, which depends on input data size and block size.
9. Correct Answer: A — (word, 1) for each word
In Word Count, Map emits (word, 1) for each word occurrence; Reduce sums the counts per word.
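The whole Word Count flow can be simulated in plain Python. This is a sketch of the algorithm, not Hadoop code; the input lines are made up, and `sorted` plus `groupby` stand in for the framework's shuffle and sort:

```python
from itertools import groupby

def mapper(line):
    # Emit (word, 1) for every word in the input line.
    for word in line.split():
        yield (word, 1)

def reducer(word, counts):
    # Sum the 1s emitted for this word.
    return (word, sum(counts))

lines = ["the quick brown fox", "the lazy dog the end"]
intermediate = sorted(kv for line in lines for kv in mapper(line))
result = [reducer(word, [v for _, v in group])
          for word, group in groupby(intermediate, key=lambda kv: kv[0])]
print(dict(result))  # "the" maps to 3, every other word to 1
```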
10. Correct Answer: B — Runs duplicate tasks to handle slow nodes
Speculative execution runs backup copies of slow-running tasks to prevent stragglers from delaying jobs.
11. Correct Answer: B — Key and iterator of all values for that key
Reduce receives a key and an iterator over all values associated with that key after shuffle/sort.
12. Correct Answer: B — Defines how to write output
OutputFormat defines how Reduce output is written: format, location, and structure.
13. Correct Answer: B — Moving computation to where data resides
Data locality moves computation to nodes where data is stored rather than moving data over network.
14. Correct Answer: B — Reads input split and generates key-value pairs
RecordReader reads an InputSplit and generates key-value pairs for the Map function.
15. Correct Answer: B — 1
Default number of Reducers is 1, but can be configured based on data size and cluster capacity.
16. Correct Answer: B — Tracking job statistics and metrics
Counters track various statistics like input/output records, bytes processed, and custom metrics.
17. Correct Answer: B — Read-only data distribution to all nodes
DistributedCache distributes read-only files (like lookup tables) to all nodes before task execution.
18. Correct Answer: B — No Reduce phase
Map-only jobs set Reducers to 0, outputting Map results directly without reduce/aggregation.
19. Correct Answer: B — Resource management and job scheduling
JobTracker managed resources and scheduled jobs in Hadoop 1.x, replaced by YARN ResourceManager in 2.x.
20. Correct Answer: A — Sorts by value as well as key
Secondary sort allows sorting by both key and value, using composite keys and custom comparators.
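The composite-key idea behind secondary sort can be sketched in Python. This is a simplified simulation under assumed data (month keys with numeric values), not Hadoop's actual comparator machinery: sorting uses the full (key, value) pair, while grouping still uses only the natural key, so each reducer sees its values already ordered.

```python
from itertools import groupby

records = [("2024-01", 30), ("2024-02", 10), ("2024-01", 5), ("2024-02", 25)]

# Composite-key sort: primary order on the key, secondary on the value.
records.sort(key=lambda kv: (kv[0], kv[1]))

# Grouping still happens on the natural key only, so the reducer
# receives each key's values in ascending order without buffering them.
for key, group in groupby(records, key=lambda kv: kv[0]):
    print(key, [v for _, v in group])
# 2024-01 [5, 30]
# 2024-02 [10, 25]
```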