Saturday 11 June 2011

How do you perform incremental logic or Delta or CDC?


Incremental means suppose today  we processed 100 records ,for tomorrow  run u need to extract  whatever the records inserted newly and  updated after previous run  based on last updated timestamp (Yesterday run) this process called as incremental or delta.

Approach_1: Using set max var ()
1)First need to create mapping var ($$Pre_sess_max_upd)and assign initial value as old date (01/01/1940).
2)Then override source qualifier query to fetch only LAT_UPD_DATE >=$$Pre_sess_max_upd  (Mapping var)
3)In the expression assign max last_upd_date value to $$Pre_sess_max_upd(mapping var) using set max var
4)Because its var  so it stores the max  last upd_date value in the repository, in the next run  our source qualifier query will fetch only the records updated or inseted after previous run.

Approach_2: Using  parameter file
First need to create mapping  parameter ($$Pre_sess_start_tmst )and assign initial value as old date (01/01/1940) in the parameterfile.
2 Then override source qualifier query to fetch only LAT_UPD_DATE >=$$Pre_sess_start_tmst (Mapping var)
3 Update mapping parameter($$Pre_sess_start_tmst) values in the parameter file using shell script or another mapping after first session get completed successfully
4 Because its mapping parameter   so every time we need to update the  value in the  parameter file after comptetion of main session.

Approach_3: Using oracle Control tables
1     First we need to create  two control tables cont_tbl_1 and cont_tbl_1 with structure of session_st_time,wf_name
2   Then insert  one record  in each table with session_st_time=1/1/1940 and workflow_name
3       create two store procedures one for update cont_tbl_1 with session st_time, set  property of  store procedure type as Source_pre_load  .
4 In  2nd store procedure  set property of store procedure type as Target _Post_load.this proc will update the session _st_time  in Cont_tbl_2 from cnt_tbl_1.
5 Then override source qualifier query to fetch only LAT_UPD_DATE >=(Select  session_st_time from cont_tbl_2 where workflow name=’Actual work flow name’.
SCD Type-II Effective-Date Approach
·        We have one of the dimension in current project called resource dimension. Here we are maintaining the history to keep track of SCD changes.
·        To maintain the history in slowly changing dimension or resource dimension. We followed SCD Type-II Effective-Date approach.
·        My resource dimension structure would be eff-start-date, eff-end-date, s.k and source columns.
·        Whenever I do a insert into dimension I would populate eff-start-date with sysdate, eff-end-date with future date and s.k as a sequence number.
·        If the record already present in my dimension but there is change in the source data. In that case what I need to do is
Update the previous record eff-end-date with sysdate and insert as a new record with source data.

3 comments:

  1. hi i liked your blog can i get in touch with u by email etldevprav@gmail.com please provide me ur contact details

    ReplyDelete
  2. Really nice blog post.provided a helpful information.I hope that you will post more updates like this Informatica Online Training

    ReplyDelete
  3. Thank you so much for this nice information. Hope so many people will get aware of this and useful as well. And please keep update like this.

    Big Data Consulting Services

    Data Lake Solutions

    Advanced Analytics Services

    Full Stack Development Solutions

    ReplyDelete