Sunday 12 June 2011

Workflow Execution Process


Pictorial Representation of Workflow execution:




  1. A PowerCenter Client request IS to start workflow
  2. IS starts ISP
  3. ISP consults LB to select node
  4. ISP starts DTM in node selected by LB
Integration Service (IS)
The key functions of IS are
  • Interpretation of the workflow and mapping metadata from the repository.
  • Execution of the instructions in the metadata
  • Manages the data from source system to target system within the memory and disk
The main three components of Integration Service which enable data movement are,
  • Integration Service Process
  • Load Balancer
  • Data Transformation Manager 
Integration Service Process (ISP) 
The Integration Service starts one or more Integration Service processes to run and monitor workflows. When we run a workflow, the ISP starts and locks the workflow, runs the workflow tasks, and starts the process to run sessions. The functions of the Integration Service Process are,
  • Locks and reads the workflow
  • Manages workflow scheduling, ie, maintains session dependency
  • Reads the workflow parameter file
  • Creates the workflow log
  • Runs workflow tasks and evaluates the conditional links
  • Starts the DTM process to run the session
  • Writes historical run information to the repository
  • Sends post-session emails
Load Balancer
The Load Balancer dispatches tasks to achieve optimal performance. It dispatches tasks to a single node or across the nodes in a grid after performing a sequence of steps. Before understanding these steps we have to know about Resources, Resource Provision Thresholds, Dispatch mode and Service levels
  • Resources – we can configure the Integration Service to check the resources available on each node and match them with the resources required to run the task. For example, if a session uses an SAP source, the Load Balancer dispatches the session only to nodes where the SAP client is installed
  • Three Resource Provision Thresholds, The maximum number of runnable threads waiting for CPU resources on the node called Maximum CPU Run Queue Length. The maximum percentage of virtual memory allocated on the node relative to the total physical memory size called Maximum Memory %. The maximum number of running Session and Command tasks allowed for each Integration Service process running on the node called Maximum Processes
  • Three Dispatch mode’s – Round-Robin: The Load Balancer dispatches tasks to available nodes in a round-robin fashion after checking the “Maximum Process” threshold. Metric-based: Checks all the three resource provision thresholds and dispatches tasks in round robin fashion. Adaptive: Checks all the three resource provision thresholds and also ranks nodes according to current CPU availability
  • Service Levels establishes priority among tasks that are waiting to be dispatched, the three components of service levels are Name, Dispatch Priority and Maximum dispatch wait time. “Maximum dispatch wait time” is the amount of time a task can wait in queue and this ensures no task waits forever
 A .Dispatching Tasks on a node
  1. The Load Balancer checks different resource provision thresholds on the node depending on the Dispatch mode set. If dispatching the task causes any threshold to be exceeded, the Load Balancer places the task in the dispatch queue, and it dispatches the task later
  2. The Load Balancer dispatches all tasks to the node that runs the master Integration Service process
B. Dispatching Tasks on a grid,
  1. The Load Balancer verifies which nodes are currently running and enabled
  2. The Load Balancer identifies nodes that have the PowerCenter resources required by the tasks in the workflow
  3. The Load Balancer verifies that the resource provision thresholds on each candidate node are not exceeded. If dispatching the task causes a threshold to be exceeded, the Load Balancer places the task in the dispatch queue, and it dispatches the task later
  4. The Load Balancer selects a node based on the dispatch mode 
Data Transformation Manager (DTM) Process
When the workflow reaches a session, the Integration Service Process starts the DTM process. The DTM is the process associated with the session task. The DTM process performs the following tasks:
  • Retrieves and validates session information from the repository.
  • Validates source and target code pages.
  • Verifies connection object permissions.
  • Performs pushdown optimization when the session is configured for pushdown optimization.
  • Adds partitions to the session when the session is configured for dynamic partitioning.
  • Expands the service process variables, session parameters, and mapping variables and parameters.
  • Creates the session log.
  • Runs pre-session shell commands, stored procedures, and SQL.
  • Sends a request to start worker DTM processes on other nodes when the session is configured to run on a grid.
  • Creates and runs mapping, reader, writer, and transformation threads to extract, transform, and load data
  • Runs post-session stored procedures, SQL, and shell commands and sends post-session email

After the session is complete, reports execution result to ISP

2 comments:

  1. If it had examples it would have been much more easier to understand. Thank you!!!

    ReplyDelete
  2. If it had examples it would have been much more easier to understand. Thank you!!!

    ReplyDelete