
Optimization of Data Warehouse – Reporting Design

Problem Definition: As in any standard data warehouse project, the current process uses these entities: a transactional DB source, an ETL (Extract, Transform & Load) process, an analytical DB destination, and a UI for reporting. Before looking into the problems with the current system design, let us first see that design:


Current Design with ETL and User Reporting on the Same Server
 
The following problems existed with the current system design:
  1. ETL Performance: ETLs were taking too long to complete.
  2. Resource Conflict: ETLs and user reporting ran at the same time on the same server (the OLAP analytical server in the example above), hurting both ETL and report execution performance because they competed for the same resources.
  3. Inflexible ETL Schedule: ETLs could not be executed during business hours because they affected report performance.
  4. User Report Performance: Because most users are in Redmond, ETLs were executed outside Redmond business hours; for users in other regions, however, ETL execution still impacted their reports because the ETL and reporting server were the same.
Solution Proposed: To overcome the problems stated above and improve the performance of both ETLs and user reporting, the following design change was made:
 
Phase 1 (Implemented):
New Design with ETL and User Reporting Separated on Different Servers
 
As shown above, in the new design the ETL server and the reporting server are separated. Data moves from the ETL server to the reporting server through replication, and the replication job is scheduled to run every 15 minutes (a minimal sketch of such a job follows the list below). The ETL server is now used only for ETL, and the reporting server only for user reporting. The performance benefits after implementing this design are:
 
  1. ETL Performance Gain: ETLs now complete as expected.
  2. No Resource Conflict: ETLs and user reporting can now run at the same time because the ETL server and the reporting server are different, so there is no resource conflict and no resulting performance issue.
  3. Flexible ETL Schedule: ETLs can now be executed during business hours as well.
  4. User Report Performance Gain: Users no longer see any query-performance difference between regions; report/query execution time is the same regardless of region.
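
The article does not show the replication mechanism itself, so here is a minimal, hypothetical sketch of a high-watermark copy job running on the 15-minute schedule described above. The table name (fact_sales), the updated_at watermark column, and the SQLite connections are illustrative assumptions, not the project's actual setup.

    import sqlite3
    import time

    REPLICATION_INTERVAL_SECONDS = 15 * 60  # the 15-minute schedule from the design

    def sync_once(etl_conn, rpt_conn):
        # High-watermark incremental copy: pull only rows the reporting
        # server has not seen yet (assumes an updated_at watermark column).
        (wm,) = rpt_conn.execute(
            "SELECT COALESCE(MAX(updated_at), 0) FROM fact_sales").fetchone()
        rows = etl_conn.execute(
            "SELECT id, amount, updated_at FROM fact_sales WHERE updated_at > ?",
            (wm,)).fetchall()
        rpt_conn.executemany(
            "INSERT OR REPLACE INTO fact_sales (id, amount, updated_at) VALUES (?, ?, ?)",
            rows)
        rpt_conn.commit()
        return len(rows)

    if __name__ == "__main__":
        etl = sqlite3.connect("etl_server.db")        # stand-in for the ETL server
        rpt = sqlite3.connect("reporting_server.db")  # stand-in for the reporting server
        while True:
            print(f"replicated {sync_once(etl, rpt)} rows")
            time.sleep(REPLICATION_INTERVAL_SECONDS)

Because reporting queries only ever touch the reporting copy, the ETL server is free to run heavy loads at any time of day.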
The following graphs depict the improvement in ETL and reporting performance:
 
Reports Execution time improvement
 
 
 
ETL Execution time improvement
 
 
Phase 2 (Yet to Implement): The following design is proposed for Phase 2. In this design, the ETL source will be a replicated OLTP database rather than the main transactional database. The advantages of this approach (see the sketch after this list) will be:
  1. Independent ETL Source: No interference with the transactional application load; ETLs pull data from a dedicated server that receives replicated data from the OLTP source.
  2. Index Creation: In the old design it was very difficult to create any new index on the OLTP database because it might impact transactional performance. In the new design, new indexes can be created on the dedicated ETL server to speed up the ETL data pull, and these indexes will not affect transactional performance.
  3. Performance Gain: Since the new replicated server will be dedicated to the ETL data pull, it will carry less load, which in turn will improve ETL performance.
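
A minimal sketch of the Phase 2 idea, using hypothetical SQLite databases and an assumed orders table: the extract reads only from the replica, and the extract-friendly index is created only on the replica, never on the primary OLTP database.

    import sqlite3

    # The OLTP primary stays untouched; the ETL pull targets the replica,
    # which receives its data through replication (names are hypothetical).
    replica = sqlite3.connect("oltp_replica.db")

    # The extract-friendly index lives only on the replica, so it cannot
    # slow down the transactional workload on the primary.
    replica.execute(
        "CREATE INDEX IF NOT EXISTS ix_orders_modified ON orders (modified_at)")

    def extract_changed_orders(since):
        # Incremental ETL pull served entirely by the replica-only index.
        return replica.execute(
            "SELECT order_id, customer_id, amount, modified_at "
            "FROM orders WHERE modified_at > ?",
            (since,)).fetchall()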
 
Proposed Design with Transactional DB and ETL Source Separated on Different Servers
Informatica Code Review Checklist

Project Name/Code:

Name of the Programmer:

Mapping Name:

Supporting Documents:

Reviewer Name:

Review Date:

 

Number of Defects Identified:

Number of Defects Fixed:

 

MAPPING

 

For each item below, record COMMENTS and mark YES/NO/NA.

GENERAL

1. Are there any design requirements that are not mapped to the code?
2. Are appropriate COMMENTS provided for each transformation?
3. Are the source and target tables and any reusable transformations used available in the SHARED folder, and are the appropriate shortcuts used in the mapping?
4. Are all the NAMING conventions for ports and transformations followed as described in the naming convention document?
5. Are all ports not required in passive transformations (for example, Lookup or Expression) excluded from the transformation?
6. Are all the ports taken through the active transformations?
7. Are all new changes made in the program code reflected in the design document and unit test document (i.e., are the design and testing docs in sync with the code)?

FLAT FILES (SOURCE)

8. Check the FLAT FILE against the TABLE STRUCTURE from which it is derived (if used).
9. Are the data types of all the PORTS in the source FLAT FILE and the TARGET similar?

SOURCE QUALIFIER TRANSFORMATION

10. Is the SQL override (if used) in the Source Qualifier valid?
11. Are the output ports and the fields selected in the SQL override query (if used) specified in the same order?
12. Are there proper LINKS between all the required ports from the SOURCE to the Source Qualifier?
13. If multiple Source Qualifiers are used to connect to multiple targets, is the Target Load Order designated as required?

SEQUENCE GENERATORS

14. Are the NEXTVAL and CURRVAL output ports linked correctly (if used)?
15. Are the Start Value (= 0), Current Value (= max number of records in target + 1), and Increment By fields in the Properties tab set to the required values? (A sketch of this seeding logic follows this section.)
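Item 15's seeding rule exists so that newly generated surrogate keys continue after the keys already loaded in the target. A minimal Python sketch of the same idea, with a hypothetical dim_customer table and customer_sk key column:

    import itertools
    import sqlite3

    def make_key_generator(conn, table="dim_customer", key_col="customer_sk"):
        # Seed the counter past the keys already in the target, mirroring the
        # Sequence Generator's Current Value; step mirrors Increment By.
        (max_key,) = conn.execute(
            f"SELECT COALESCE(MAX({key_col}), 0) FROM {table}").fetchone()
        return itertools.count(start=max_key + 1, step=1)

    # next(keys) then plays the role of NEXTVAL for each new row.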

LOOKUP TRANSFORMATION

16. Check the LOCATION INFORMATION (Properties) in the LOOKUP transformations.
17. Are the INPUT and OUTPUT ports of the LOOKUP transformations properly linked?
18. For connected lookups, are the LOOKUP conditions for the LOOKUP transformations given properly?
19. Is the Lookup OVERRIDE (if used) in the Lookup transformation valid?
20. For unconnected lookups, is one of the ports designated as the RETURN port?
21. If a dynamic cache is specified, is the NewLookupRow port used correctly? Data should be flagged for insert/update/do nothing based on NewLookupRow (see the sketch after this section).
22. For dynamic lookups, is each lookup port that is not part of the lookup condition associated with an input port or sequence ID?
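To make item 21 concrete, here is a hedged Python sketch of the per-row decision a dynamic lookup cache makes; the key/value data is invented, and the flag values mirror Informatica's NewLookupRow convention (0 = no change, 1 = inserted into the cache, 2 = updated in the cache).

    def new_lookup_row(cache, key, values):
        # Returns the NewLookupRow-style flag for one incoming row.
        if key not in cache:
            cache[key] = values
            return 1  # row was inserted into the cache -> flag for INSERT
        if cache[key] != values:
            cache[key] = values
            return 2  # row updated the cache -> flag for UPDATE
        return 0      # cache unchanged -> do nothing

    cache = {}
    for key, values in [("C1", "Alice"), ("C1", "Alice"), ("C1", "Alicia")]:
        print(key, new_lookup_row(cache, key, values))  # prints 1, then 0, then 2

Downstream, an Update Strategy would route rows flagged 1 as inserts and rows flagged 2 as updates, dropping the rest.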

AGGREGATOR TRANSFORMATION

23. Are the GROUP BY ports for the AGGREGATOR transformations specified appropriately (if used)?
24. If the SORTED INPUT option is specified, is the data entering the Aggregator sorted on the group-by ports, in the order they appear in the Aggregator transformation? (See the sketch after this section.)
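A short Python illustration of why item 24 matters, with invented sample rows: streaming aggregation of the kind Sorted Input enables emits a group as soon as the group key changes, exactly like itertools.groupby, so unsorted input silently produces wrong totals.

    from itertools import groupby
    from operator import itemgetter

    rows = [("east", 10), ("east", 5), ("west", 7)]  # already sorted by region
    for region, group in groupby(rows, key=itemgetter(0)):
        print(region, sum(amount for _, amount in group))  # east 15, west 7

    # Unsorted input, e.g. [("east", 10), ("west", 7), ("east", 5)], would
    # emit "east" twice with partial totals -- the same failure mode as a
    # Sorted Input Aggregator fed unsorted data.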

STORED PROCEDURE TRANSFORMATIONS

25. Is the CONNECTION information specified correctly in the Stored Procedure transformation (if used)?
26. For connected Stored Procedure transformations, are the input and output PORTS mapped correctly?
27. For unconnected Stored Procedure transformations, is it configured to run pre- or post-session, or to be called from another Expression transformation?

JOINER TRANSFORMATIONS

28. Are the join CONDITIONS specified for the Joiner transformation (if used)?
29. Is the TYPE of join (Normal, Master Outer, Detail Outer, or Full Outer) specified correctly in the Joiner transformation?

FILTER TRANSFORMATIONS

30. Are the FILTER conditions for the FILTER transformations parsed successfully (if used)?

EXPRESSION TRANSFORMATIONS

31. Are the data types and field lengths of all the PORTS specified correctly (if used)?
32. Are the expressions in the Expression transformation parsed successfully?

UPDATE STRATEGY

33. Is the condition for the UPDATE strategy specified (if used)?
34. Are the ports in the UPDATE strategy and the TARGET table linked in the correct order?

TARGET

35. Check whether the source of all the input ports of the TARGET is specified.

SESSION & WORKFLOW

 

For each item below, record COMMENTS and mark YES/NO/NA.

SESSION

1. Are the $Source and $Target connection values in the Properties tab pointing correctly to the SOURCE and TARGET databases, respectively?
2. If mapping parameters are used, is the PARAMETER file, with its full path, specified in the Properties tab? (A sketch of the parameter-file shape follows this list.)
3. Is the Save Session Log By option set to Session Timestamp in the Log Options of the Config Object tab?
4. In the Sources tab, are the CONNECTION values for the source(s) correctly selected?
5. In the Targets tab, are the CONNECTION values for the target(s) correctly selected?
6. For Oracle sources/targets, if an OWNER prefix is required to access the table, is it specified in the Table Name Prefix property?
7. For Oracle targets, is the Target Load Type set to NORMAL in the target properties?
8. For flat file sources, are the source file directory, source file name, and file properties correctly specified?
9. For flat file targets, are the target file directory, target file name, and file properties correctly specified?
10. In the Transformations tab, is the LOCATION information for all the lookups specified correctly?
11. In the Transformations tab, is the CONNECTION information for all the stored procedure transformations specified correctly?
12. If UNIX shell scripts are called in pre-session or post-session commands, are the script name and the path in which the script exists specified correctly?
13. If partitioning is used, are all the properties in the Partition tab specified correctly?
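For item 2, a hedged Python sketch of reading a parameter file; the [Folder.WF:workflow.ST:session] header followed by $$Name=value lines reflects the usual PowerCenter parameter-file shape, but the file path and parameter names here are invented.

    def read_param_file(path):
        # Returns {section_header: {parameter: value}} from a parameter file.
        params, section = {}, None
        with open(path) as fh:
            for raw in fh:
                line = raw.strip()
                if not line:
                    continue
                if line.startswith("[") and line.endswith("]"):
                    section = line[1:-1]  # e.g. Folder.WF:wf_load.ST:s_m_load
                    params[section] = {}
                elif "=" in line and section is not None:
                    name, value = line.split("=", 1)
                    params[section][name.strip()] = value.strip()
        return params

    # Example (hypothetical path and names):
    # read_param_file("/infa/params/wf_load.par")["Folder.WF:wf_load.ST:s_m_load"]["$$RunDate"]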

WORKFLOW

14. Is the Informatica server correctly selected in the workflow?
