Wednesday, July 9, 2014

HOW TO WORK IN PERFORMANCE TUNING

The goal of performance tuning is to optimize session performance by eliminating performance bottlenecks. To tune session performance, first identify a performance bottleneck, eliminate it, and then identify the next performance bottleneck until you are satisfied with the session performance. You can use the test load option to run sessions when you tune session performance.

If you tune all the bottlenecks, you can further optimize session performance by increasing the number of pipeline partitions in the session. Adding partitions can improve performance by utilizing more of the system hardware while processing the session.

Because determining the best way to improve performance can be complex, change one variable at a time, and time the session both before and after the change. If session performance does not improve, you might want to return to the original configuration.

Complete the following tasks to improve session performance:

1.    Optimize the target. Enables the Integration Service to write to the targets efficiently.

2.    Optimize the source. Enables the Integration Service to read source data efficiently.

3.    Optimize the mapping. Enables the Integration Service to transform and move data efficiently.

4.    Optimize the transformation. Enables the Integration Service to process transformations in a mapping efficiently.

5.    Optimize the session. Enables the Integration Service to run the session more quickly.

6.    Optimize the grid deployments. Enables the Integration Service to run on a grid with optimal performance.

7.    Optimize the PowerCenter components. Enables the Integration Service and Repository Service to function optimally.

8.    Optimize the system. Enables PowerCenter service processes to run more quickly.

Partition Technique

  • A pipeline consists of a source qualifier and all the transformations and targets that receive data from that source qualifier.
  • When the Integration Service runs the session, it can achieve higher performance by partitioning the pipeline and performing the extract, transformation, and load for each partition in parallel.
    A partition is a pipeline stage that executes in a single reader, transformation, or writer thread. The number of partitions in any pipeline stage equals the number of threads in the stage. By default, the Integration Service creates one partition in every pipeline stage.
     PARTITIONING ATTRIBUTES
    1. Partition points

  • By default, the Integration Service sets partition points at various transformations in the pipeline.
  • Partition points mark thread boundaries and divide the pipeline into stages.
  • A stage is a section of a pipeline between any two partition points.

    2. Number of Partitions

  • We can define up to 64 partitions at any partition point in a pipeline.
  • When we increase or decrease the number of partitions at any partition point, the Workflow Manager increases or decreases the number of partitions at all partition points in the pipeline.
  • Increasing the number of partitions or partition points increases the number of threads.
  • The number of partitions we create equals the number of connections to the source or target. Each partition uses one database connection.

    3. Partition types

  • The Integration Service creates a default partition type at each partition point.
  • If we have the Partitioning option, we can change the partition type. This option is purchased separately.
  • The partition type controls how the Integration Service distributes data among partitions at partition points.

    PARTITIONING TYPES
    1. Round Robin Partition Type

  • In round-robin partitioning, the Integration Service distributes rows of data evenly to all partitions.
  • Each partition processes approximately the same number of rows.
  • Use round-robin partitioning when we need to distribute rows evenly and do not need to group data among partitions.
    2. Pass-Through Partition Type

  • In pass-through partitioning, the Integration Service processes data without redistributing rows among partitions.
  • All rows in a single partition stay in that partition after crossing a pass-through partition point.
  • Use pass-through partitioning when we want to increase data throughput, but we do not want to increase the number of partitions.
    3. Database Partitioning Partition Type

  • Use database partitioning for Oracle and IBM DB2 sources and IBM DB2 targets only.
  • Use any number of pipeline partitions and any number of database partitions.
  • We can improve performance when the number of pipeline partitions equals the number of database partitions.
    Database Partitioning with One Source
    When we use database partitioning with a source qualifier with one source, the Integration Service generates SQL queries for each database partition and distributes the data from the database partitions equally among the session partitions.
    For example, when a session has three partitions and the database has five partitions, the first and second session partitions each receive data from two database partitions (four database partitions in total), and the third session partition receives data from the remaining database partition.
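    As a rough sketch of the idea (the table, column, and partition names are hypothetical, and the exact SQL PowerCenter generates may differ), the queries for an Oracle source table SALES with five database partitions P1-P5 and three session partitions might look like this:

      -- session partition 1 reads two database partitions
      SELECT SALES_ID, AMOUNT FROM SALES PARTITION (P1)
      SELECT SALES_ID, AMOUNT FROM SALES PARTITION (P2)
      -- session partition 2 reads the next two database partitions
      SELECT SALES_ID, AMOUNT FROM SALES PARTITION (P3)
      SELECT SALES_ID, AMOUNT FROM SALES PARTITION (P4)
      -- session partition 3 reads the remaining database partition
      SELECT SALES_ID, AMOUNT FROM SALES PARTITION (P5)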
    Partitioning a Source Qualifier with Multiple Source Tables
    The Integration Service creates SQL queries for database partitions based on the number of partitions in the database table with the most partitions.
    If the session has three partitions and the database table has two partitions, one of the session partitions receives no data.
    4. Hash Auto-Keys Partition Type

  • The Integration Service uses all grouped or sorted ports as a compound partition key.
  • Use hash auto-keys partitioning at or before Rank, Sorter, Joiner, and unsorted Aggregator transformations to ensure that rows are grouped properly before they enter these transformations.
    5. Hash User-Keys Partition Type

  • The Integration Service uses a hash function to group rows of data among partitions.
  • We define the number of ports to generate the partition key.
  • We choose the ports that define the partition key.
    6. Key Range Partition Type
  • We specify one or more ports to form a compound partition key.
  • The Integration Service passes data to each partition depending on the ranges we specify for each port.
  • Use key range partitioning where the sources or targets in the pipeline are partitioned by key range.
  • Example: Customers 1-100 in one partition, 101-200 in another, and so on. We define the range for each partition.
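    For instance, with key range partitioning defined at the source qualifier on a hypothetical CUSTOMER_ID port, the condition applied for each partition might look like the following (a sketch only; the Integration Service adds the range condition to the query it issues for each partition):

      SELECT CUSTOMER_ID, CUSTOMER_NAME FROM CUSTOMERS WHERE CUSTOMER_ID >= 1 AND CUSTOMER_ID <= 100     -- partition 1
      SELECT CUSTOMER_ID, CUSTOMER_NAME FROM CUSTOMERS WHERE CUSTOMER_ID >= 101 AND CUSTOMER_ID <= 200   -- partition 2
      SELECT CUSTOMER_ID, CUSTOMER_NAME FROM CUSTOMERS WHERE CUSTOMER_ID >= 201 AND CUSTOMER_ID <= 300   -- partition 3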
 
PUSHDOWN OPTIMIZATION

You can push transformation logic to the source or target database using pushdown optimization. When you run a session configured for pushdown optimization, the Integration Service translates the transformation logic into SQL queries and sends the SQL queries to the database. The source or target database executes the SQL queries to process the transformations.
The amount of transformation logic you can push to the database depends on the database, transformation logic, and mapping and session configuration. The Integration Service processes all transformation logic that it cannot push to a database.
Use the Pushdown Optimization Viewer to preview the SQL statements and mapping logic that the Integration Service can push to the source or target database. You can also use the Pushdown Optimization Viewer to view the messages related to pushdown optimization.
Consider a mapping containing transformation logic that can be pushed to the source database.

This mapping contains an Expression transformation that creates an item ID based on the store number 5419 and the item ID from the source. To push the transformation logic to the database, the Integration Service generates the following SQL statement:
INSERT INTO T_ITEMS(ITEM_ID, ITEM_NAME, ITEM_DESC)
SELECT CAST((CASE WHEN 5419 IS NULL THEN '' ELSE 5419 END) + '_' + (CASE WHEN ITEMS.ITEM_ID IS NULL THEN '' ELSE ITEMS.ITEM_ID END) AS INTEGER),
       ITEMS.ITEM_NAME,
       ITEMS.ITEM_DESC
FROM ITEMS2 ITEMS
The Integration Service generates an INSERT SELECT statement to retrieve the ID, name, and description values from the source table, create new item IDs, and insert the values into the ITEM_ID, ITEM_NAME, and ITEM_DESC columns in the target table. It concatenates the store number 5419, an underscore, and the original ITEM ID to get the new item ID.
Pushdown Optimization Types
You can configure the following types of pushdown optimization:
  • Source-side pushdown optimization. The Integration Service pushes as much transformation logic as possible to the source database.
  • Target-side pushdown optimization. The Integration Service pushes as much transformation logic as possible to the target database.
  • Full pushdown optimization. The Integration Service attempts to push all transformation logic to the target database. If the Integration Service cannot push all transformation logic to the database, it performs both source-side and target-side pushdown optimization.
Running Source-Side Pushdown Optimization Sessions
When you run a session configured for source-side pushdown optimization, the Integration Service analyzes the mapping from the source to the target or until it reaches a downstream transformation it cannot push to the source database.
The Integration Service generates and executes a SELECT statement based on the transformation logic for each transformation it can push to the database. Then, it reads the results of this SQL query and processes the remaining transformations.
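For example, if the mapping contains a Filter transformation whose condition can be pushed to the source database, the generated SELECT might look like the following sketch (the table and column names are hypothetical):

    SELECT ITEMS.ITEM_ID, ITEMS.ITEM_NAME, ITEMS.ITEM_DESC
    FROM ITEMS
    WHERE ITEMS.ITEM_DESC IS NOT NULL   -- Filter transformation condition pushed into the WHERE clause

The Integration Service reads the filtered rows returned by this query and applies any remaining transformation logic itself.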
Running Target-Side Pushdown Optimization Sessions
When you run a session configured for target-side pushdown optimization, the Integration Service analyzes the mapping from the target to the source or until it reaches an upstream transformation it cannot push to the target database. It generates an INSERT, DELETE, or UPDATE statement based on the transformation logic for each transformation it can push to the target database. The Integration Service processes the transformation logic it cannot push, and then executes the generated SQL against the target database.
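As a hedged illustration only (the exact statements and parameter binding PowerCenter uses may differ), an Expression transformation that upper-cases a name ahead of an update to the target might be pushed as a statement of roughly this shape, with the remaining values supplied by the Integration Service for each row:

    UPDATE T_CUSTOMERS
    SET CUST_NAME = UPPER(?)   -- expression logic applied inside the target database to the bound value
    WHERE CUST_ID = ?          -- key value supplied by the Integration Service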
Running Full Pushdown Optimization Sessions
To use full pushdown optimization, the source and target databases must be in the same relational database management system. When you run a session configured for full pushdown optimization, the Integration Service analyzes the mapping from the source to the target or until it reaches a downstream transformation it cannot push to the target database. It generates and executes SQL statements against the source or target based on the transformation logic it can push to the database.
When you run a session with large quantities of data and full pushdown optimization, the database server must run a long transaction. Consider the following database performance issues when you generate a long transaction:
  • A long transaction uses more database resources.
  • A long transaction locks the database for longer periods of time. This reduces database concurrency and increases the likelihood of deadlock.
  • A long transaction increases the likelihood of an unexpected event.
To minimize database performance issues for long transactions, consider using source-side or target-side pushdown optimization.
Rules and Guidelines for Functions in Pushdown Optimization
Use the following rules and guidelines when pushing functions to a database:
  • If you use ADD_TO_DATE in transformation logic to change days, hours, minutes, or seconds, you cannot push the function to a Teradata database.
  • When you push LAST_DAY() to Oracle, Oracle returns the date up to the second. If the input date contains subseconds, Oracle trims the date to the second.
  • When you push LTRIM, RTRIM, or SOUNDEX to a database, the database treats the argument (' ') as NULL, but the Integration Service treats the argument (' ') as spaces.
  • An IBM DB2 database and the Integration Service produce different results for STDDEV and VARIANCE. IBM DB2 uses a different algorithm than other databases to calculate STDDEV and VARIANCE.
  • When you push SYSDATE or SYSTIMESTAMP to the database, the database server returns the timestamp in the time zone of the database server, not the Integration Service.
  • If you push SYSTIMESTAMP to an IBM DB2 or a Sybase database, and you specify the format for SYSTIMESTAMP, the database ignores the format and returns the complete timestamp.
  • You can push SYSTIMESTAMP('SS') to a Netezza database, but not SYSTIMESTAMP('MS') or SYSTIMESTAMP('US').
  • When you push TO_CHAR(DATE) or TO_DATE() to Netezza, dates with subsecond precision must be in the YYYY-MM-DD HH24:MI:SS.US format. If the format is different, the Integration Service does not push the function to Netezza.
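As an illustration of the last rule (the port name ORDER_TS_STR is hypothetical):

    TO_DATE(ORDER_TS_STR, 'YYYY-MM-DD HH24:MI:SS.US')   -- matches the required format, so the function can be pushed to Netezza
    TO_DATE(ORDER_TS_STR, 'MM/DD/YYYY HH24:MI:SS.US')   -- different format, so the Integration Service processes the function itself
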
Mapping Optimization Techniques


1. Reduce the number of transformations. There is always overhead involved in moving data between transformations.

2.    Consider more shared memory for a large number of transformations. Session shared memory between 12 MB and 40 MB should suffice.

3. Calculate once, use many times.
  • Avoid calculating or testing the same value over and over.
  • Calculate it once in an expression, and set a True/False flag.
  • Within an expression, use variable ports to calculate a value that can be used multiple times within that transformation (see the sketch after this list).
  • Delete unnecessary links between transformations to minimize the amount of data moved, particularly in the Source Qualifier.
  • This is also helpful for maintenance. If a transformation needs to be reconnected, it is best to only have necessary ports set as input and output to reconnect.
  • In lookup transformations, change unused ports to be neither input nor output. This makes the transformations cleaner looking. It also makes the generated SQL override as small as possible, which cuts down on the amount of cache necessary and thereby improves performance.
  • The engine automatically converts compatible types.
  • Sometimes data conversion is excessive. Data types are automatically converted when types are different between connected ports. Minimize data type changes between transformations by planning data flow prior to developing the mapping.
  • Plan for reusable transformations upfront.
  • Use variables. Use both mapping variables and ports that are variables. Variable ports are especially beneficial when they can be used to calculate a complex expression or perform an unconnected Lookup call only once instead of multiple times.
  • Use mapplets to encapsulate multiple reusable transformations.
  • Use mapplets to leverage the work of critical developers and minimize mistakes when performing similar functions.
  • Reduce the number of non-essential records that are passed through the entire mapping.
  • Use active transformations that reduce the number of records as early in the mapping as possible (e.g., place Filter and Aggregator transformations as close to the source as possible).
  • Select the appropriate driving/master table when using joins. The table with fewer rows should be the driving/master table for a faster join.
  • Redesign mappings to utilize one Source Qualifier to populate multiple targets. This way the server reads this source only once. If you have different Source Qualifiers for the same source (e.g., one for delete and one for update/insert), the server reads the source for each Source Qualifier.
  • Remove or reduce field-level stored procedures.
  • If you use field-level stored procedures, the PowerCenter server has to make a call to that stored procedure for every row, slowing performance.
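As a sketch of the variable-port idea referenced above (the port names and expressions are hypothetical, not taken from any particular mapping), an Expression transformation might compute a cleansed name once in a variable port and reuse it in several output ports:

    v_CLEAN_NAME   (variable port):  LTRIM(RTRIM(FIRST_NAME)) || ' ' || LTRIM(RTRIM(LAST_NAME))
    o_DISPLAY_NAME (output port):    v_CLEAN_NAME
    o_NAME_LENGTH  (output port):    LENGTH(v_CLEAN_NAME)

The cleansing expression is evaluated once per row instead of once for every output port that needs it.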


4. Only connect what is used.

5. Watch the data types.

6. Facilitate reuse.

7. Only manipulate data that needs to be moved and transformed.

8. Utilize single-pass reads.

9. Sort the input data before passing it to Joiner and Aggregator transformations.

10. In Lookup transformations, use a customized SQL override instead of the default query. (Use '--' at the end of the override to comment out the lookup's default ORDER BY clause; see the sketch after this list.)

11. Avoid using unnecessary columns/ports in the SQL query.

12. Filter unnecessary data as close to the source qualifier as possible. (For relational sources, include the filter condition in the SQL query.)

13. In a Joiner transformation, use the source with fewer rows as the master table.

14. When using mapping partitions, place the Aggregator transformation before the partition point.

15. Use a Router transformation instead of multiple Filter transformations.
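A sketch of the lookup SQL override mentioned in item 10 (the table and column names are hypothetical). The Integration Service normally appends its own ORDER BY on all lookup ports to the override; writing an explicit ORDER BY on the condition column and ending the override with '--' comments the appended clause out:

    SELECT CUST_ID, CUST_NAME, CUST_STATUS
    FROM CUSTOMERS
    WHERE CUST_STATUS = 'ACTIVE'
    ORDER BY CUST_ID --

Ordering by only the condition port keeps the database sort, and therefore the lookup cache build, as cheap as possible.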
