NoSQL Databases Question Bank for C-CAT
Topic-wise NoSQL Databases MCQs for CDAC C-CAT preparation with answers and explanations.
Show Answer & Explanation
Correct Answer: C - Not only SQL
NoSQL means "Not only SQL" - databases that provide mechanisms other than traditional relational models.
Show Answer & Explanation
Correct Answer: B - Document-oriented database
MongoDB stores data as flexible JSON-like documents (BSON), making it a document-oriented database.
Show Answer & Explanation
Correct Answer: A - In-memory key-value store/cache
Redis is an in-memory key-value store, often used for caching, sessions, and real-time analytics.
Show Answer & Explanation
Correct Answer: B - High write throughput and availability
Cassandra is designed for high write throughput, high availability, and linear scalability.
Show Answer & Explanation
Correct Answer: B - Graph database
Neo4j is a graph database optimized for storing and querying highly connected data.
Show Answer & Explanation
Correct Answer: D - Few NoSQL databases, mostly RDBMS
ACID is traditionally an RDBMS strength. Most NoSQL databases prioritize availability over strict ACID.
Show Answer & Explanation
Correct Answer: A - Basic Availability, Soft state, Eventual consistency
BASE: Basically Available, Soft state, Eventual consistency - NoSQL alternative to ACID.
Show Answer & Explanation
Correct Answer: A - Key-value store
Key-value stores are ideal for session data and user profiles - fast lookup by key.
Show Answer & Explanation
Correct Answer: C - As columns grouped into families
Column family databases store data in columns grouped into column families, optimizing for specific query patterns.
Show Answer & Explanation
Correct Answer: B - Data will become consistent given enough time
Eventual consistency guarantees that, given no new updates, all replicas will eventually converge to the same value.
Show Answer & Explanation
Correct Answer: C - Horizontally partitions data across nodes
Sharding distributes data horizontally across multiple nodes based on a shard key.
Show Answer & Explanation
Correct Answer: B - MapReduce views and Mango queries
CouchDB uses MapReduce views for complex queries and Mango for declarative JSON queries.
Show Answer & Explanation
Correct Answer: B - Neo4j
Graph databases like Neo4j excel at traversing relationships, perfect for social networks.
Show Answer & Explanation
Correct Answer: B - A table in RDBMS
MongoDB collections are analogous to RDBMS tables - they group related documents.
Show Answer & Explanation
Correct Answer: A - Amazon
DynamoDB is Amazon's fully managed NoSQL database service on AWS.
Show Answer & Explanation
Correct Answer: D - Automatically expires keys after specified time
TTL automatically deletes keys after a specified duration - useful for caching and session management.
Show Answer & Explanation
Correct Answer: D - Flexible schema that can vary between documents
Schemaless allows documents in the same collection to have different fields and structures.
Show Answer & Explanation
Correct Answer: C - Durability by logging before writing
WAL logs changes before applying them to data files, ensuring durability in case of crashes.
Show Answer & Explanation
Correct Answer: D - PostgreSQL
PostgreSQL is a traditional relational database (RDBMS), not a NoSQL database.
Show Answer & Explanation
Correct Answer: A - Minimum nodes that must agree for operation success
Quorum defines the minimum number of nodes that must acknowledge an operation for it to succeed.
Show Answer & Explanation
Correct Answer: B - Not only SQL
NoSQL stands for 'Not only SQL.' It refers to a broad class of database management systems that differ from traditional relational databases. NoSQL databases may support SQL-like query languages but are not limited to them.
Show Answer & Explanation
Correct Answer: C - MongoDB
MongoDB is a document-oriented NoSQL database that stores data in flexible, JSON-like documents (BSON format). Redis is a key-value store, Cassandra is a wide-column store, and Neo4j is a graph database.
Show Answer & Explanation
Correct Answer: D - A distributed system can guarantee at most two of Consistency, Availability, and Partition tolerance at the same time
The CAP theorem (Brewer's theorem) states that in a distributed data store, it is impossible to simultaneously guarantee all three of Consistency, Availability, and Partition tolerance. During a network partition, a system must choose between consistency and availability.
Show Answer & Explanation
Correct Answer: C - Wide-column store
Apache Cassandra is a wide-column store (also called column-family store) NoSQL database. It organizes data into rows and dynamic columns within column families, optimized for high write throughput and horizontal scalability.
Show Answer & Explanation
Correct Answer: D - A group of documents, similar to a table in relational databases
In MongoDB, a collection is a group of documents and is the equivalent of a table in relational databases. Unlike relational tables, collections do not enforce a schema, allowing documents in the same collection to have different structures.
Show Answer & Explanation
Correct Answer: A - Given enough time without new updates, all replicas will eventually converge to the same value
Eventual consistency means that if no new updates are made, all replicas of the data will eventually converge to the same value. It prioritizes availability and partition tolerance over immediate consistency, and is commonly used in AP systems.
Show Answer & Explanation
Correct Answer: B - Neo4j
Neo4j is a graph database that uses a graph structure with nodes (entities), edges (relationships), and properties to store and query data. It is optimized for traversing relationships and is ideal for social networks, recommendation engines, and fraud detection.
Show Answer & Explanation
Correct Answer: A - A partition key and optionally one or more clustering columns
In Cassandra, the primary key is composed of a partition key (determines which node stores the data) and optionally one or more clustering columns (determine the sort order of rows within a partition).
Show Answer & Explanation
Correct Answer: D - Horizontally partitioning data across multiple database instances
Sharding is the process of horizontally partitioning data across multiple database instances (shards). Each shard holds a subset of the total data, allowing the database to scale horizontally and distribute the load across multiple servers.
Show Answer & Explanation
Correct Answer: B - The system continues to function even when network communication between nodes is unreliable or lost
Partition Tolerance means the system continues to operate despite arbitrary message loss or failure of part of the network. In distributed systems, network partitions are inevitable, so real-world systems must handle them.
Show Answer & Explanation
Correct Answer: B - Redis
Redis is a key-value store NoSQL database that stores data as key-value pairs in memory. It is extremely fast and is commonly used for caching, session management, real-time analytics, and message queuing.
Show Answer & Explanation
Correct Answer: D - A binary representation of JSON-like documents
BSON (Binary JSON) is a binary-encoded serialization of JSON-like documents used by MongoDB. It extends JSON with additional data types like Date, Binary, and ObjectId, and is designed for efficient storage and traversal.
Show Answer & Explanation
Correct Answer: C - Eventual consistency
Cassandra uses eventual consistency by default, meaning data will eventually be consistent across all replicas but may not be immediately consistent after a write. However, Cassandra allows tunable consistency levels per query.
Show Answer & Explanation
Correct Answer: A - A group of MongoDB instances that maintain the same data set for high availability
A replica set in MongoDB is a group of mongod instances that maintain the same data set. It provides redundancy and high availability with automatic failover. One node is the primary (handles writes) and others are secondaries (replicate data).
Show Answer & Explanation
Correct Answer: D - Horizontal scalability and flexible schema
NoSQL databases offer horizontal scalability (adding more servers) and flexible schemas (documents in the same collection can have different fields). Relational databases excel at ACID transactions, fixed schemas, and JOIN operations.
Show Answer & Explanation
Correct Answer: B - A peer-to-peer communication protocol for nodes to share state information about themselves and other nodes
The Gossip protocol is a peer-to-peer communication mechanism used in Cassandra where nodes periodically exchange state information (such as node health, load, and schema versions) with random other nodes to maintain cluster-wide awareness.
Show Answer & Explanation
Correct Answer: C - A document nested inside another document
An embedded document in MongoDB is a document nested within another document. This denormalized approach allows related data to be stored together, reducing the need for joins and improving read performance for data that is accessed together.
Show Answer & Explanation
Correct Answer: A - Data is written to the commit log for durability, then to an in-memory memtable, and eventually flushed to SSTables on disk
In Cassandra's write path, data is first written to the commit log (for durability), then to an in-memory data structure called a memtable. When the memtable is full, it is flushed to an immutable on-disk file called an SSTable.
Show Answer & Explanation
Correct Answer: C - Basically Available, Soft state, Eventually consistent
BASE stands for Basically Available, Soft state, Eventually consistent. It is an alternative to ACID for distributed systems. It emphasizes availability over consistency, allows state to change over time, and guarantees eventual convergence.
Show Answer & Explanation
Correct Answer: D - db.collection.insertOne()
db.collection.insertOne() is used to insert a single document into a MongoDB collection. For multiple documents, db.collection.insertMany() is used. The deprecated db.collection.save() would insert or update based on the _id field.
Show Answer & Explanation
Correct Answer: B - The process of merging multiple SSTables into a single SSTable, removing deleted and outdated data
Compaction in Cassandra is the process of merging multiple SSTables into fewer, larger SSTables while removing tombstones (deleted data), expired data, and duplicate entries. This reclaims disk space and improves read performance.
Show Answer & Explanation
Correct Answer: C - CP systems prioritize consistency over availability during partitions, while AP systems prioritize availability over consistency
CP systems (like HBase, MongoDB) prioritize consistency, potentially becoming unavailable during network partitions. AP systems (like Cassandra, DynamoDB) prioritize availability, continuing to serve requests during partitions but potentially returning stale data.
Show Answer & Explanation
Correct Answer: D - A marker indicating that data has been deleted, used instead of immediately removing data
A tombstone in Cassandra is a delete marker that indicates a row or column has been deleted. Instead of immediately removing data, Cassandra writes a tombstone. The actual data is removed during compaction after a configurable grace period.
Show Answer & Explanation
Correct Answer: A - ALL
The ALL consistency level in Cassandra requires acknowledgment from all replica nodes for both reads and writes. This provides the strongest consistency but reduces availability since the operation fails if any replica is down.
Show Answer & Explanation
Correct Answer: D - A pipeline-based framework for transforming and analyzing data through a sequence of stages
MongoDB's aggregation framework is a pipeline-based approach for data transformation and analysis. Documents pass through a sequence of stages (like $match, $group, $sort, $project) where each stage transforms the documents in some way.
Show Answer & Explanation
Correct Answer: C - The number of copies of data stored across different nodes in the cluster
The replication factor in Cassandra determines how many copies of each row of data are stored across different nodes in the cluster. A replication factor of 3 means each row is replicated on three different nodes for fault tolerance.
Show Answer & Explanation
Correct Answer: C - Intentionally duplicating data across documents or tables to optimize read performance
Denormalization in NoSQL is the practice of intentionally duplicating data to optimize read performance. Since NoSQL databases typically don't support joins efficiently, data that is frequently queried together is stored together, trading write complexity for read speed.
Show Answer & Explanation
Correct Answer: A - Graph database
Graph databases like Neo4j are best suited for social network relationships because they natively represent and efficiently traverse relationships (edges) between entities (nodes). Finding friends-of-friends or shortest paths is naturally optimized.
Show Answer & Explanation
Correct Answer: A - A majority (more than half) of the replica nodes must respond
QUORUM requires acknowledgment from a majority of replica nodes: (replication_factor / 2) + 1. For a replication factor of 3, QUORUM requires 2 nodes. It provides a balance between consistency and availability.
Show Answer & Explanation
Correct Answer: A - The top-level container (similar to a database in RDBMS) that defines replication strategy and factor
A keyspace in Cassandra is the outermost container for data, similar to a database in relational systems. It defines the replication strategy (SimpleStrategy or NetworkTopologyStrategy) and replication factor for all tables within it.