Wednesday, November 21, 2007

Tips for Writing High-performance SQL

These tips apply broadly when writing high-performance stored procedures. Unfortunately, most of them can't be applied blindly: you first have to consider the nature and schema of the data you're querying.

Avoid using cursors (as well as other looping structures) as much as possible.
Cursors are inefficient, and database engines usually don't have particularly fast loop implementations.
On the database side, you can usually replace code involving cursors with aggregate SQL statements (SELECT, INSERT, and UPDATE) that operate on whole sets of rows at once. All database engines are heavily optimized for aggregate statements, so even if a loop is unavoidable, it is always better to execute a few aggregate statements in a loop with a small number of iterations than to create a cursor and execute simple statements over a large number of iterations.
Even if initial performance tests, especially with a small amount of data, show cursors to be more efficient than a complex aggregate statement, it is worthwhile to try to optimize the operation by breaking it into smaller portions or using other approaches, unless you can guarantee that the data volume will stay small. Cursor approaches will not scale.
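
For example, here is a minimal T-SQL sketch of such a replacement (the Orders and CustomerTotals tables are hypothetical):

-- Hypothetical schema: Orders(CustomerID, Amount), CustomerTotals(CustomerID, Total).

-- Cursor version: one UPDATE per customer; does not scale.
DECLARE @cust INT, @sum MONEY;
DECLARE cur CURSOR FAST_FORWARD FOR
    SELECT CustomerID, SUM(Amount) FROM Orders GROUP BY CustomerID;
OPEN cur;
FETCH NEXT FROM cur INTO @cust, @sum;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE CustomerTotals SET Total = @sum WHERE CustomerID = @cust;
    FETCH NEXT FROM cur INTO @cust, @sum;
END;
CLOSE cur;
DEALLOCATE cur;

-- Set-based version: a single aggregate UPDATE replaces the whole loop.
UPDATE ct
SET    Total = agg.OrderSum
FROM   CustomerTotals AS ct
JOIN  (SELECT CustomerID, SUM(Amount) AS OrderSum
       FROM   Orders
       GROUP  BY CustomerID) AS agg
  ON   agg.CustomerID = ct.CustomerID;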

Filter data wisely.
One alternative to using cursors is a fall-through approach: filter and aggregate the data in multiple steps via a set of intermediate data stores, which can be physical tables, temporary tables, or table variables. It is usually best to apply the most selective filters in the first aggregate statement, eliminating the majority of the data in one simple shot so that subsequent steps work on smaller amounts of data. Then you can proceed with joining and filtering, making sure to keep the number of join permutations under control at all times.
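
A minimal sketch of such a two-step filter in T-SQL (the Sales and Customers tables are hypothetical):

-- Step 1: cheap, highly selective filter into a temp table;
-- this drops the majority of the rows in one shot.
SELECT SaleID, CustomerID, Amount
INTO   #recent_sales
FROM   Sales
WHERE  SaleDate >= '20070101';

-- Step 2: join and filter further against the much smaller working set.
SELECT r.CustomerID, SUM(r.Amount) AS Total
FROM   #recent_sales AS r
JOIN   Customers     AS c ON c.CustomerID = r.CustomerID
WHERE  c.Region = 'EU'
GROUP  BY r.CustomerID;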

It is usually more efficient to execute multiple statements with one condition than a single statement with multiple OR conditions…
… when executing UPDATE and DELETE statements against permanent database tables that can be accessed by multiple users simultaneously. This tip is especially important from the scalability point of view; from the performance point of view the difference is usually marginal. The main reason is the locking of database records and the lock escalations that occur behind the scenes: each narrow statement locks fewer records for a shorter time.
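
A minimal illustration in T-SQL (the Accounts table is hypothetical):

-- One statement with OR conditions: the engine may scan and lock far
-- more rows at once, inviting lock escalation.
UPDATE Accounts
SET    Status = 'closed'
WHERE  Balance = 0 OR LastActivity < '20060101';

-- Two narrow statements: each can use its own index and each holds
-- fewer, shorter locks.
UPDATE Accounts SET Status = 'closed' WHERE Balance = 0;
UPDATE Accounts SET Status = 'closed' WHERE LastActivity < '20060101';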
Make wise distinctions between temp tables and table variables.
Table variables are in-memory structures that can work anywhere from 2 to 100 times faster than temp tables. Keep in mind, though, that access to a table variable slows down as the volume of its data grows; at some point the table variable overflows the available memory, and that kills the performance. Therefore, use table variables only when their data content is guaranteed not to grow unpredictably; the break-even size is around a few thousand records. For larger data volumes, I recommend temp tables with clustered indexes. Interestingly, I've found that a temp table with one clustered index is often faster than one with multiple simple indexes. In contrast, with physical tables, multiple simple indexes are often faster than one clustered index.
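
A minimal T-SQL sketch of both options (the Customers source table is hypothetical):

-- Table variable: fine for small, predictable row counts.
DECLARE @ids TABLE (ID INT PRIMARY KEY);
INSERT INTO @ids
SELECT CustomerID FROM Customers WHERE Region = 'EU';

-- Temp table with a clustered index: scales to larger volumes.
CREATE TABLE #ids (ID INT NOT NULL);
CREATE CLUSTERED INDEX cx_ids ON #ids (ID);
INSERT INTO #ids
SELECT CustomerID FROM Customers WHERE Region = 'EU';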

Make careful distinctions between hard rules and assumptions.
This is more of a business design tip, and it applies to code design in general rather than to performance and scalability design specifically. In real life, however, performance and scalability are usually the first things to suffer from improper design. When rules are implemented as assumptions, they usually cause unnecessary calculations to be performed, hurting performance. When assumptions are implemented as rules, they tend to cause errors and algorithm failures, which usually require an urgent redesign. That redesign, in turn, is usually performed under business constraints and produces inefficient final algorithms, because bad design decisions get corrected in a rush and without sufficient resources, sometimes under pressure from customers whose businesses are in a critical stage when the problems are uncovered but must continue operating during the fix.

Pay attention to join order.
Using the proper join order sometimes lets the database engine find an execution plan that processes the joins with the optimal number of records. Most database engines also support hard hints, but in most cases you should avoid them and let the database engine figure out the best way to do its job on its own.
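
A minimal T-SQL illustration (the table names are hypothetical):

-- List the small, already-filtered set first and join the big table to it.
SELECT f.OrderID, h.Detail
FROM   FilteredOrders AS f
JOIN   OrderHistory   AS h ON h.OrderID = f.OrderID;
-- A hard hint such as OPTION (FORCE ORDER), appended to the statement,
-- would pin the written join order; avoid it unless the optimizer
-- leaves no other choice.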

Be careful when joining complex views…
…to other views and database tables in complex SELECT statements. When the database contains a significant amount of data, the SQL Server engine tends to recalculate the execution plan of the resulting statement, often producing an inefficient plan that can kill the performance. The most difficult part is that the engine's behavior is inconsistent in this respect, and depends heavily on the database size, indexes, foreign keys, and other database structures and constraints. The consistent workaround is to pre-select data from the view into a temp table with reasonable pre-filters, and then use that temp table in place of the underlying view.
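
A minimal sketch of that workaround in T-SQL (the view vComplexOrders and the table BigTable are hypothetical):

-- Materialize a pre-filtered slice of the complex view...
SELECT OrderID, CustomerID, Total
INTO   #orders
FROM   vComplexOrders
WHERE  OrderDate >= '20070101';   -- reasonable pre-filter

-- ...then join the temp table instead of the view itself.
SELECT b.ItemID, o.Total
FROM   BigTable AS b
JOIN   #orders  AS o ON o.OrderID = b.OrderID;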

Create indexes on temp tables wisely.
As mentioned in an earlier tip, clustered indexes are usually the best performers for temp tables; however, it makes a difference whether you create the index before or after inserting data into the temp table. Creating the index before the insert complicates the insert, because the database engine must order the selection. For complex selections such as the view joins discussed in the previous tip, the extra ordering may overcomplicate the overall statement and drastically degrade the performance. On the other hand, creating the index after the insert forces the database engine to recalculate the execution plan of the stored procedure every time it is called. The decision is therefore always a trade-off, and you should make it based on the relative costs of the two possibilities.
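
The two options look like this in T-SQL (a minimal sketch; the Orders source table is hypothetical):

-- Option A: create the index first; the INSERT then pays the ordering cost.
CREATE TABLE #a (ID INT NOT NULL, Val MONEY);
CREATE CLUSTERED INDEX cx_a ON #a (ID);
INSERT INTO #a (ID, Val)
SELECT CustomerID, Amount FROM Orders;

-- Option B: load first, index after; the insert is cheaper, but changing
-- the temp table mid-procedure triggers an execution plan recompilation.
CREATE TABLE #b (ID INT NOT NULL, Val MONEY);
INSERT INTO #b (ID, Val)
SELECT CustomerID, Amount FROM Orders;
CREATE CLUSTERED INDEX cx_b ON #b (ID);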

In general, try to avoid execution plan recalculation.
One common cause of recalculation is a stored procedure that contains several execution paths depending on the values passed in as parameters. Whether you can avoid recalculation depends on the complexity of the stored procedure and on other circumstances, such as the temp-table indexing trade-off described in the previous tip. When the engine does recalculate an execution plan, performance always suffers; however, recalculating the caller's plan does not force a recalculation for the procedures (or views or functions) it calls. The workaround, therefore, is to divide one stored procedure into several, one per parameter-dependent path, and have the parent call the children conditionally. Perform this subdivision very carefully though, because it can become a maintenance nightmare; sometimes, however, it is the only way to achieve acceptable database performance and scalability.
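
A minimal T-SQL sketch of the parent/child split (all procedure and table names are hypothetical):

-- Parent: only dispatches on the parameter; each child keeps its own stable plan.
CREATE PROCEDURE dbo.GetOrders @CustomerID INT = NULL
AS
BEGIN
    IF @CustomerID IS NULL
        EXEC dbo.GetOrders_All;
    ELSE
        EXEC dbo.GetOrders_ByCustomer @CustomerID;
END;
GO

CREATE PROCEDURE dbo.GetOrders_All
AS
    SELECT OrderID, Amount FROM Orders;
GO

CREATE PROCEDURE dbo.GetOrders_ByCustomer @CustomerID INT
AS
    SELECT OrderID, Amount FROM Orders WHERE CustomerID = @CustomerID;
GO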

Wednesday, October 10, 2007

mainframe tips list

Wk. 39/2007: Submitting jobs for parallel execution
Wk. 37/2007: IBM mainframe abbreviations
Wk. 35/2007: ISPF EDIT command COMP (compare)
Wk. 33/2007: Segmented tablespaces
Wk. 31/2007: A simple calculator
Wk. 27/2007: The difference between correlated and noncorrelated subselects
Wk. 25/2007: Execution of TSO batch programs
Wk. 23/2007: OPTIMIZE FOR n ROWS in SQL statements
Wk. 21/2007: TSO commands in COBOL and PL/1
Wk. 19/2007: A trip with FTP
Wk. 17/2007: SQL function VALUE (COALESCE)
Wk. 14/2007: Comparisons in REXX
Wk. 12/2007: Optimizing your OUTER JOINS
Wk. 10/2007: Your own TSO command processor
Wk. 08/2007: Using EXISTS in SQL
Wk. 06/2007: More about deleting datasets
Wk. 04/2007: ISPF EDIT commands COPY, CREATE and REPLACE
Wk. 02/2007: Minimising the timeout risk when performing a DB2 INSERT
Wk. 51/2006: Variable DD names in PL/1
Wk. 49/2006: Reasonable use of ISPF KEYLIST
Wk. 47/2006: Validation using a DB2 trigger
Wk. 45/2006: Messages from ISPF services
Wk. 43/2006: Correlated subselects for versioning
Wk. 41/2006: Faster EDIT macros
Wk. 39/2006: Assignment of DCB information in PL/1
Wk. 37/2006: Problems using newcopy in CICS
Wk. 35/2006: Names of DB2 subsystems using data sharing
Wk. 33/2006: Dataset delete
Wk. 31/2006: Reuse of a DB2 image copy
Wk. 27/2006: TRANSMIT and RECEIVE of PO datasets
Wk. 25/2006: Smarter global temporary tables in DB2
Wk. 23/2006: Enlargement of a concatenation with one more dataset
Wk. 21/2006: Avoid SAVE of data in ISPF EDIT
Wk. 19/2006: The LIBDEF function in ISPF
Wk. 17/2006: Calling LOAD modules from REXX
Wk. 15/2006: Spaces in SQL
Wk. 13/2006: Record splitting using SORT
Wk. 11/2006: Information in HSM
Wk. 09/2006: SQL statements with joins, GROUP BY and HAVING
Wk. 07/2006: More SMS compression
Wk. 05/2006: Generic searches in ISPF
Wk. 03/2006: Join of columns with different definitions
Wk. 01/2006: Please help me out of this
Wk. 50/2005: The TRANSLATE function in REXX and PL/1
Wk. 48/2005: Calling CICS programs from batch or TSO
Wk. 46/2005: Testing return codes in batch TSO
Wk. 44/2005: -DIS UTIL and DIAGNOSE DISPLAY SYSUTIL
Wk. 42/2005: A job step that always gets executed
Wk. 40/2005: Who is using my MQ queue
Wk. 38/2005: Global temporary tables in DB2
Wk. 36/2005: Start BookManager using your own bookshelf
Wk. 34/2005: Dataset copying using ISPF
Wk. 29/2005: IBM mainframe wildcards
Wk. 27/2005: Options SORTKEYS, SORTDEVT and SORTNUM for DB2 LOAD/REORG
Wk. 25/2005: Making your own HSM backup
Wk. 23/2005: Adjusting Dataset List defaults
Wk. 21/2005: The FIND ALL command
Wk. 19/2005: The HSM command TSO HSEND Q ACT
Wk. 17/2005: Making your own enqueue/dequeue mechanism in DB2
Wk. 15/2005: Writing to a local queue on another MQ subsystem
Wk. 12/2005: Optimizing the SELECT list in SQL statements
Wk. 10/2005: Controlling your COBOL modules using DD cards
Wk. 08/2005: Making less MQ remote queue definitions
Wk. 06/2005: Remove leading blanks using PL/1
Wk. 04/2005: CICS In-Storage tables
Wk. 02/2005: Making your MQ definitions in batch
Wk. 52/2004: IN lists in SQL statements
Wk. 50/2004: ISPF EDIT command HEX
Wk. 48/2004: Accessing main storage with PL/1
Wk. 46/2004: Line commands for ISPF Edit macros
Wk. 44/2004: Object oriented PL/1
Wk. 42/2004: SELECT * FROM table
Wk. 40/2004: DISTINCT used with UNION or UNION ALL
Wk. 38/2004: Using SYSOUT=* to display dataset contents
Wk. 36/2004: CURRENT SQLID
Wk. 34/2004: Install of CICS definitions without using CEDA
Wk. 32/2004: User friendly BookManager
Wk. 28/2004: Returning bad SQLCODE when SELECTing zero rows
Wk. 26/2004: Dynamic call of programs from PL/1
Wk. 24/2004: Fallback of DB2 plans/packages
Wk. 22/2004: The OPC RECOVER function
Wk. 20/2004: WORK datasets for ISPF
Wk. 18/2004: Self-documenting REXX programs
Wk. 16/2004: SMS dataset compression
Wk. 14/2004: The DISCARD option on DB2 REORG
Wk. 12/2004: The ISPF EDIT command BNDS
Wk. 10/2004: The TSO command PROFILE
Wk. 08/2004: Dirty reads in SQL statements
Wk. 05/2004: Command SE in SDSF and JCL parameter SCHENV
Wk. 03/2004: Executing DFSORT using the DSN processor
Wk. 52/2003: Deleting all rows in a DB2 table
Wk. 50/2003: Issuing ISPF messages from REXX
Wk. 48/2003: The order of predicates in SQL
Wk. 46/2003: Hypertext with ISPF Edit/View
Wk. 44/2003: Deletion of members
Wk. 42/2003: Index-screening in DB2
Wk. 40/2003: Changing data using File-Aid/Batch
Wk. 38/2003: Executing other SQL statements than SELECT using DSNTIAUL
Wk. 36/2003: Changing your data using DFSORT
Wk. 34/2003: Speeding up deletion of rows in DB2 tables
Wk. 32/2003: Read any file using PL/1
Wk. 29/2003: Row level locking in DB2
Wk. 27/2003: Trapping of screen output from TSO commands
Wk. 25/2003: Allocation of DB2-spaces
Wk. 23/2003: Using % in front of TSO commands
Wk. 21/2003: Showing the contents of an ISPF-table
Wk. 19/2003: Disabling Abend-Aid in batch
Wk. 16/2003: Optimization of calls to VAG from COBOL programs
Wk. 14/2003: Finding rows in DB2 table with the same key
Wk. 12/2003: Concatenation on DD *
Wk. 10/2003: DB2 trace facilities: DSN-command and CAF
Wk. 08/2003: Asynchronous tasks in CICS: DELAY
Wk. 06/2003: Asynchronous tasks in CICS: START and CANCEL
Wk. 04/2003: ISPF EDIT labels .ZF and .ZL
Wk. 02/2003: Display Active (DA) in SDSF of many MVS-systems
Wk. 50/2002: Communicate with another CICS from your CICS
Wk. 48/2002: Cheating the DB2-optimizer with an IN-list
Wk. 46/2002: Special validations in ISPF panels
Wk. 44/2002: DB2 CAF interface
Wk. 42/2002: Names for variables in REXX
Wk. 40/2002: Error handling in CICS
Wk. 38/2002: DB2 REORG UNLOAD EXTERNAL
Wk. 36/2002: Dynamic allocation of REXX-libraries
Wk. 34/2002: Generating ready-to-use SYSIN with DB2
Wk. 32/2002: Using ? in DB2 SQL EXPLAIN statements
Wk. 28/2002: Copying the same line to different places with ISPF EDIT
Wk. 26/2002: ISPF log dataset
Wk. 24/2002: Accessing main storage with COBOL
Wk. 22/2002: Restore of HSM backups
Wk. 20/2002: A fast DB2 reorg
Wk. 18/2002: CICS-programs in status REQ
Wk. 16/2002: The ISPF command RETP
Wk. 13/2002: Setting return code to 0 in TSO batch
Wk. 11/2002: GROUP BY on expressions in SQL
Wk. 09/2002: Who is using my dataset
Wk. 07/2002: The SQL CASE expression
Wk. 05/2002: Finding information about COBOL abends
Wk. 03/2002: Browse and Edit of DB2 tables
Wk. 52/2001: Giving other users access to your datasets
Wk. 50/2001: The PROJECT, GROUP and TYPE fields in ISPF
Wk. 48/2001: Making your own bookshelf in BookManager
Wk. 46/2001: An ISPF-command trick
Wk. 44/2001: The table with exactly one row
Wk. 42/2001: Internal tables in REXX
Wk. 40/2001: Establishing the existence of a row using SQL
Wk. 38/2001: The MAX-RC column in SDSF
Wk. 36/2001: The CUT-function in the 3270-emulator
Wk. 34/2001: Cross-references between DB2-tables and programs
Wk. 32/2001: Synchronizing jobs with DISP=OLD
Wk. 28/2001: UNLOAD your DB2-data with DSNTIAUL
Wk. 26/2001: LOAD module timestamp
Wk. 24/2001: ISPF tool ISRDDN with enqueues
Wk. 22/2001: Easy copying of datasets
Wk. 20/2001: Finding data inconsistency using SQL
Wk. 18/2001: Some useful SDSF commands
Wk. 15/2001: When a dataset runs out of space
Wk. 13/2001: Useful RACF commands
Wk. 11/2001: TSO command HQ
Wk. 09/2001: ISPF View command REPLACE
Wk. 07/2001: Conversion of CHAR and VARCHAR to DECIMAL using SQL
Wk. 05/2001: Use of SPACE parameter with existing datasets
Wk. 03/2001: ISPF Edit line-commands L and F
Wk. 01/2001: ISPF Edit macro ONLY

Monday, October 8, 2007

what is DB2

A relational database from IBM. When running under IBM's MVS (Multiple Virtual Storage) operating system, DB2 is implemented on top of VSAM and uses its underlying data structures. DB2, later called "DB2 Universal Database", also runs under Windows NT, AIX, Solaris and, most recently, Linux.

DB2 is a family of relational database management system (RDBMS) products from IBM that serve a number of different operating system platforms. According to IBM, DB2 leads in terms of database market share and performance. Although DB2 products are offered for UNIX-based systems and personal computer operating systems, DB2 trails Oracle's database products on UNIX-based systems and Microsoft's Access on Windows systems.

In addition to its offerings for the mainframe OS/390 and VM operating systems and its mid-range AS/400 systems, IBM offers DB2 products for a cross-platform spectrum that includes UNIX-based Linux, HP-UX, Sun Solaris, and SCO UnixWare; and for its personal computer OS/2 operating system as well as for Microsoft's Windows 2000 and earlier systems. DB2 databases can be accessed from any application program by using Microsoft's Open Database Connectivity (ODBC) interface, the Java Database Connectivity (JDBC) interface, or a CORBA interface broker.


On January 30th, 2006, IBM released a no-charge edition of DB2 called DB2 9 Express-C. This was an expected response to the recently announced free versions of Oracle 10g and Microsoft SQL Server. Express-C has no limit on the number of users or on database size. It is deployable on Windows and Linux machines with up to two processors (four cores in total) and up to 4 GB of memory.

DB2 can be administered from either the command line or a GUI. The command-line interface requires more knowledge of the product but can be more easily scripted and automated. The GUI is a multi-platform Java client that contains a variety of wizards suitable for novice users. DB2 supports both SQL and XQuery. DB2 has a native implementation of XML data storage, where XML data is stored as XML (not as relational data or CLOB data) for faster access using XQuery.
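
A minimal sketch of querying natively stored XML in DB2 9 (the customers table and its columns are hypothetical):

-- Hypothetical table with a pureXML column.
CREATE TABLE customers (
  id   INTEGER NOT NULL PRIMARY KEY,
  info XML
);

INSERT INTO customers (id, info)
  VALUES (1, '<customer><name>Ann</name><city>Oslo</city></customer>');

-- XQuery embedded in SQL via XMLQUERY/XMLEXISTS; the XML is navigated
-- directly, with no conversion to relational or CLOB form.
SELECT XMLQUERY('$d/customer/name/text()' PASSING info AS "d") AS name
FROM   customers
WHERE  XMLEXISTS('$d/customer[city = "Oslo"]' PASSING info AS "d");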