HOW TO APPROACH PERFORMANCE TUNING
The goal of performance tuning is to optimize session performance
by eliminating performance bottlenecks. To tune session performance, first
identify a performance bottleneck, eliminate it, and then identify the next
performance bottleneck until you are satisfied with the session performance.
You can use the test load option to run sessions when you tune session
performance.
If you tune all the bottlenecks, you can further optimize session
performance by increasing the number of pipeline partitions in the session.
Adding partitions can improve performance by utilizing more of the system
hardware while processing the session.
Because determining the best way to improve performance can be
complex, change one variable at a time, and time the session both before and
after the change. If session performance does not improve, you might want to
return to the original configuration.
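The change-one-variable-at-a-time loop above can be sketched as a small timing harness. This is only an illustration of the method, not a PowerCenter API; `run_session` is a hypothetical stand-in for whatever launches the session (for example, a shell command):

```python
import time

def timed(run_session):
    """Time one session run; run_session is any callable that executes the session."""
    start = time.perf_counter()
    run_session()
    return time.perf_counter() - start

def keep_change(baseline_run, tuned_run, min_gain=0.05):
    """Keep a tuning change only if it runs at least min_gain (here 5%) faster
    than the original configuration; otherwise revert to the original."""
    before = timed(baseline_run)
    after = timed(tuned_run)
    return after < before * (1 - min_gain)
```

In practice, average several runs before deciding to keep or revert a change, so that caching and system load do not skew a single measurement.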
Complete the following tasks to improve session performance:
1. Optimize the target. Enables the Integration Service to write to the targets efficiently.
2. Optimize the source. Enables the Integration Service to read source data efficiently.
3. Optimize the mapping. Enables the Integration Service to transform and move data efficiently.
4. Optimize the transformations. Enables the Integration Service to process transformations in a mapping efficiently.
5. Optimize the session. Enables the Integration Service to run the session more quickly.
6. Optimize the grid deployments. Enables the Integration Service to run on a grid with optimal performance.
7. Optimize the PowerCenter components. Enables the Integration Service and Repository Service to function optimally.
8. Optimize the system. Enables PowerCenter service processes to run more quickly.
Partition Technique
- A pipeline consists of a source qualifier and all the transformations and targets that receive data from that source qualifier.
- When the Integration Service runs the session, it can achieve higher performance by partitioning the pipeline and performing the extract, transformation, and load for each partition in parallel.
- A partition is a pipeline stage that executes in a single reader, transformation, or writer thread. The number of partitions in any pipeline stage equals the number of threads in the stage.
- By default, the Integration Service creates one partition in every pipeline stage.

PARTITIONING ATTRIBUTES

1. Partition Points
- By default, the Integration Service sets partition points at various transformations in the pipeline.
- Partition points mark thread boundaries and divide the pipeline into stages.
- A stage is a section of a pipeline between any two partition points.

2. Number of Partitions
- We can define up to 64 partitions at any partition point in a pipeline.
- When we increase or decrease the number of partitions at any partition point, the Workflow Manager increases or decreases the number of partitions at all partition points in the pipeline.
- Increasing the number of partitions or partition points increases the number of threads.
- The number of partitions we create equals the number of connections to the source or target. For one partition, one database connection is used.

3. Partition Types
- The Integration Service creates a default partition type at each partition point.
- If we have the Partitioning option, we can change the partition type. This option is purchased separately.
- The partition type controls how the Integration Service distributes data among partitions at partition points.

PARTITIONING TYPES

1. Round-Robin Partition Type
- In round-robin partitioning, the Integration Service distributes rows of data evenly to all partitions.
- Each partition processes approximately the same number of rows.
- Use round-robin partitioning when we need to distribute rows evenly and do not need to group data among partitions.

2. Pass-Through Partition Type
- In pass-through partitioning, the Integration Service processes data without redistributing rows among partitions.
- All rows in a single partition stay in that partition after crossing a pass-through partition point.
- Use pass-through partitioning when we want to increase data throughput but do not want to increase the number of partitions.

3. Database Partitioning Partition Type
- Use database partitioning for Oracle and IBM DB2 sources and IBM DB2 targets only.
- Use any number of pipeline partitions and any number of database partitions.
- We can improve performance when the number of pipeline partitions equals the number of database partitions.

Database Partitioning with One Source
- When we use database partitioning with a source qualifier that has one source, the Integration Service generates an SQL query for each database partition and distributes the data from the database partitions equally among the session partitions.
- For example, when a session has three partitions and the database has five partitions, the first and second session partitions each receive data from two database partitions, using four of the database partitions. The third session partition receives data from the remaining database partition.

Partitioning a Source Qualifier with Multiple Source Tables
- The Integration Service creates SQL queries for database partitions based on the number of partitions in the database table with the most partitions.
- If the session has three partitions and the database table has two partitions, one of the session partitions receives no data.

4. Hash Auto-Keys Partition Type
- The Integration Service uses all grouped or sorted ports as a compound partition key.
- Use hash auto-keys partitioning at or before Rank, Sorter, Joiner, and unsorted Aggregator transformations to ensure that rows are grouped properly before they enter these transformations.

5. Hash User-Keys Partition Type
- The Integration Service uses a hash function to group rows of data among partitions.
- We define the number of ports used to generate the partition key.
- We choose the ports that define the partition key.

6. Key Range Partition Type
- The Integration Service passes data to each partition depending on the ranges we specify for each port.
- Use key range partitioning where the sources or targets in the pipeline are partitioned by key range.
- Example: customers 1-100 go to one partition, 101-200 to another, and so on. We define the range for each partition.
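The distribution rules described above can be sketched in plain Python: dealing rows round-robin, assigning database partitions to session partitions (the five-partition/three-partition example), and routing rows by key range. This is only an illustration of the logic, assuming a simple round-robin assignment of database partitions; it is not how the Integration Service is implemented internally:

```python
def round_robin(rows, num_partitions):
    """Round-robin: deal rows evenly across partitions, one at a time."""
    parts = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        parts[i % num_partitions].append(row)
    return parts

def assign_db_partitions(num_db_parts, num_session_parts):
    """Distribute database partitions across session partitions as evenly
    as possible; with fewer DB partitions than session partitions, some
    session partitions receive no data."""
    assignment = {p: [] for p in range(num_session_parts)}
    for db_part in range(num_db_parts):
        assignment[db_part % num_session_parts].append(db_part)
    return assignment

def key_range(rows, key, ranges):
    """Key range: route each row to the partition whose [lo, hi] range
    contains the row's key value."""
    parts = [[] for _ in ranges]
    for row in rows:
        for p, (lo, hi) in enumerate(ranges):
            if lo <= row[key] <= hi:
                parts[p].append(row)
                break
    return parts
```

For instance, `assign_db_partitions(5, 3)` gives the first two session partitions two database partitions each and the third session partition one, matching the example above, while `assign_db_partitions(2, 3)` leaves the third session partition empty.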