ETL jobs run every day to pull data from transactional (OLTP) database servers and load it into analytical (OLAP) warehouse databases. When these jobs take longer than expected, the following tips can help improve ETL performance.
Following are the daily running ETLs along with their timings before applying the improvement tips:
| ETL Name | Time Taken (Minutes) Before Optimization |
| --- | --- |
| ETL 1 | 132 (avg. of latest 15 ETL runs) |
| ETL 2 | 462 (avg. of latest 15 ETL runs) |
| ETL 3 | 450-500 (avg. of latest 15 ETL runs) |
Solution 1 - View Definition Optimization: There were three main views used to get the data from the main source tables. Their existing definitions were:
- SELECT ACT.* FROM S_EVT_ACT ACT (NOLOCK) JOIN S_SRV_REQ SR (NOLOCK) ON ACT.SRA_SR_ID = SR.ROW_ID
- SELECT ACT.* FROM S_EVT_ACT ACT (NOLOCK) JOIN S_DOC_AGREE ITR (NOLOCK) ON ITR.ROW_ID = ACT.AGREEMENT_ID WHERE ITR.X_QUICK_TICKET_ROW_ID IS NOT NULL
- SELECT ACT.* FROM S_EVT_ACT ACT (NOLOCK) JOIN S_PROD_DEFECT IR (NOLOCK) ON ACT.SRA_DEFECT_ID = IR.ROW_ID
Here S_EVT_ACT, S_SRV_REQ, S_DOC_AGREE and S_PROD_DEFECT are the main source tables in the transactional database and hold huge volumes of data. Pulling data through these views was taking almost one hour per view (each of these views is used in a different one of the three ETLs).
The views were slow because of the join conditions with the other main tables. To optimize them, the joins were removed so that view 1 no longer queries S_SRV_REQ, view 2 no longer queries S_DOC_AGREE, and view 3 no longer queries S_PROD_DEFECT. The view definitions were changed to the following:
- SELECT ACT.* FROM S_EVT_ACT ACT (NOLOCK) WHERE ACT.SRA_SR_ID IS NOT NULL
- SELECT ACT.* FROM S_EVT_ACT ACT (NOLOCK) WHERE ACT.AGREEMENT_ID IS NOT NULL
- SELECT ACT.* FROM S_EVT_ACT ACT (NOLOCK) WHERE ACT.SRA_DEFECT_ID IS NOT NULL
The new definitions filter on columns of S_EVT_ACT itself instead of joining to the other main tables.
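For illustration, here is a minimal sketch of applying this kind of change to one of the views, assuming a hypothetical view name V_SR_ACTIVITIES (the actual view names are not given in this post):

```sql
-- V_SR_ACTIVITIES is a hypothetical name used for illustration only.
-- Before: the view joined two large tables just to check that the
-- activity row points to an existing service request.
ALTER VIEW dbo.V_SR_ACTIVITIES
AS
SELECT ACT.*
FROM S_EVT_ACT ACT (NOLOCK)
JOIN S_SRV_REQ SR (NOLOCK) ON ACT.SRA_SR_ID = SR.ROW_ID;
GO

-- After: the join is replaced with a NOT NULL filter on the
-- foreign-key column, so only S_EVT_ACT is read.
ALTER VIEW dbo.V_SR_ACTIVITIES
AS
SELECT ACT.*
FROM S_EVT_ACT ACT (NOLOCK)
WHERE ACT.SRA_SR_ID IS NOT NULL;
GO
```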
Results: After changing the view definitions as shown above, performance improved significantly. The data-pull task now takes just 2-3 minutes instead of the earlier hour, a saving of almost 50 minutes for each ETL.
- Solution 2 - SSIS Lookup Task (NULL Column Handling): One lookup task in an SSIS package was taking almost 4 hours every day, i.e. almost 4 hours just to process each incremental day's records, which was abnormal and was badly affecting reporting. Analysis of the task revealed the following issues:
- One column in the source-destination mapping, COUNTRY in this example, was not mapped to the source, so it contained no data.
- The lookup SQL was as follows:
SELECT ROW_WID, PERSONNELNUMBER, COUNTRY FROM WC_PERBIGAREA_D (NOLOCK)
When a column used in the lookup SQL, such as COUNTRY here, contains NULL values, the lookup returns the same number of records on every incremental run: a comparison with NULL never evaluates to true, so the lookup keeps iterating over every record of the incremental pull against the NULL COUNTRY records already in the table and effectively returns the incremental rows multiplied by the NULL-value rows. That is why each incremental run was taking roughly the same 4 hours.
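The behaviour comes from standard SQL NULL semantics; the following standalone snippet (not part of the original package) shows why a NULL value can never satisfy a lookup match:

```sql
-- NULL never compares equal to anything, not even another NULL,
-- so a lookup join on a NULL column can never find a match.
SELECT
    CASE WHEN NULL = NULL THEN 'match' ELSE 'no match' END AS null_vs_null,
    CASE WHEN 'IN' = NULL THEN 'match' ELSE 'no match' END AS value_vs_null;
-- Both columns return 'no match'.
```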
Solution: The following changes were made to improve the lookup task's performance; essentially, the NULL values were handled:
- Update Column (NULL): Updated the COUNTRY column of the main WC_PERBIGAREA_D table from NULL to the relevant value by joining to the tables that hold the correct country (a sketch follows this list).
- Modified Lookup Mapping: Modified the lookup target mapping to include the missing COUNTRY column, so that from this point onwards COUNTRY values no longer arrive as NULL and no longer hurt performance.
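A minimal sketch of the one-time backfill described in the first bullet, assuming a hypothetical source table WC_PERSON_D that holds the correct country per personnel number (the actual join used is not shown in the post):

```sql
-- One-time backfill of NULL COUNTRY values in the lookup table.
-- WC_PERSON_D and its columns are assumptions for illustration only.
UPDATE D
SET    D.COUNTRY = P.COUNTRY
FROM   WC_PERBIGAREA_D AS D
JOIN   WC_PERSON_D     AS P
       ON P.PERSONNELNUMBER = D.PERSONNELNUMBER
WHERE  D.COUNTRY IS NULL
  AND  P.COUNTRY IS NOT NULL;
```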
Results: After the above changes, the task now completes in 10-15 minutes, again a saving of almost 4 hours of ETL execution time.
- Solution 3 - Deadlock Prevention: "It is always better to prevent a deadlock from occurring in the first place than to let it occur and then recover from it." In one SSIS package task, the main destination table was truncated and then reloaded with the full data set on every ETL run. Analysis showed that, because truncation requires an exclusive lock, the ETL would end up in a deadlocked/hung state whenever a user was querying the table at truncation time, and it stayed blocked until the blocking was cleared manually. As a result, the ETL sometimes hung for 2-3 hours or even a whole day, which severely impacted daily ETL performance.
Solution:
- Created a new staging table with the same schema definition as the main destination table that was being truncated on each ETL run.
- On each incremental run, only the staging table is truncated and loaded, so the deadlock/blocking is prevented because the main table is never truncated.
- Data is then moved from the staging table to the main table using an UPDATE/INSERT strategy (sketched below). The ETL no longer hangs because the main table is never exclusively locked, user queries are not affected, and the ETL runs faster and more consistently now that the blocking point is gone.
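A minimal sketch of this staging-table pattern, assuming hypothetical names DEST_FACT (main destination table), DEST_FACT_STG (staging table), SOURCE_VIEW (full-load source) and a key column ROW_WID; none of these names appear in the original package:

```sql
-- 1. Reload only the staging table; no lock is taken on the main table.
TRUNCATE TABLE dbo.DEST_FACT_STG;

INSERT INTO dbo.DEST_FACT_STG (ROW_WID, COL1, COL2)
SELECT ROW_WID, COL1, COL2
FROM dbo.SOURCE_VIEW;

-- 2. UPDATE rows that already exist in the main table.
UPDATE D
SET    D.COL1 = S.COL1,
       D.COL2 = S.COL2
FROM   dbo.DEST_FACT     AS D
JOIN   dbo.DEST_FACT_STG AS S
       ON S.ROW_WID = D.ROW_WID;

-- 3. INSERT rows that are new in this load.
INSERT INTO dbo.DEST_FACT (ROW_WID, COL1, COL2)
SELECT S.ROW_WID, S.COL1, S.COL2
FROM   dbo.DEST_FACT_STG AS S
WHERE  NOT EXISTS (SELECT 1 FROM dbo.DEST_FACT AS D WHERE D.ROW_WID = S.ROW_WID);
```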
Results:
- ETL performance improved because there are no blocking points.
- User queries are not affected.
- ETL performance is consistent.
Following are the timings of the daily ETLs after applying the improvement tips:
| ETL Name | Time Taken (Minutes) After Optimization | Solution Applied |
| --- | --- | --- |
| ETL 1 | 68 (avg. of 15 ETL runs) | Solution 1 |
| ETL 2 | 151 (avg. of 15 ETL runs) | Solutions 1 & 3 |
| ETL 3 | 200 (avg. of 15 ETL runs) | Solutions 1 & 2 |
ETL 1 Performance Graph after Applying Solution 1
ETL 2 Performance Graph after Applying Solution 1 & 3
ETL 3 Performance Graph after Applying Solution 1 & 2