Query optimizer |
The query optimizer is the component of a database management system that is used to analyze queries submitted to database server for execution, and then determine the optimal way to execute the query (Query plan).
The query optimizer cannot be accessed directly by users. Instead, once queries are submitted to database server, and parsed by the parser, they are then passed to the query optimizer where optimization occurs.
=Implementation=
Most query optimizers represent query plans as a Tree data structure of plan nodes . A plan node encapsulates a single operation that is required to execute the query. The nodes are arranged as a tree, in which intermediate results flow from the bottom of the tree to the top. Each node has zero or more child nodes -- those are nodes whose output is fed as input to the parent node. For example, a join node will have two child nodes, which represent the two join operands, whereas a sort node would have a single child node (the input to be sorted). The leaves of the tree are nodes which produce results by scanning the disk, for example by performing an index scan or a sequential scan.
==Join optimization==
Most query optimizers determine Join (SQL) order via a dynamic programming algorithm derived from the IBM System R database system. This algorithm works in two stages:
# First, all ways to access each Relation (aka Table (database)) in the query are computed. Every relation in the query can be accessed via a sequential scan. If there is an Index (database) on a relation that can be used to answer a Predicate in the query, an index scan can also be used. For each relation, the optimizer records the cheapest way to scan the relation, as well as the cheapest way to scan the relation that produces records in a particular sorted order. # The optimizer then considers combining each pair of relations for which a join condition exists. For each pair, the optimizer will consider the available join algorithms implemented by the Database management system. It will preserve the cheapest way to join each pair of relations, in addition to the cheapest way to join each pair of relations that produces its output according to a particular sort order. # Then all three-relation query plans are computed, by joining each two-relation plan produced by the previous phase with the remaining relations in the query.
In this manner, a Query plan is eventually produced that joins all the queries in the relation. Note that the algorithm keeps track of the sort order of the result set produced by a query plan. This is to avoid a redundant sort operation later on in processing the query.
Historically, System-R derived query optimizers would often only consider left-deep query plans. This Heuristic is drawn from the observation that join algorithms such as nested loops only require a single tuple (aka Row (database)) of the outer relation at a time. Therefore, a left-deep query plan means that fewer tuples need to be held in memory at any time: the outer relation s join plan need only be executed until a single tuple is produced, and then the inner base relation can be scanned (this technique is called pipelining ). This heuristic reduces the number of plans that need to be considered, but may result in not considering the optimal query plan.
= See also =
|
|