A staging area, or landing zone, is an intermediate storage area used for data processing during the extract, transform and load (ETL) process. The data staging area sits between the data source(s) and the data target(s), which are often data warehouses, data marts, or other data repositories.

Data staging areas are often transient in nature, with their contents being erased prior to running an ETL process or immediately following its successful completion. Such a staging area is sometimes called a transient staging area (TSA). There are staging area architectures, however, which are designed to hold data for extended periods of time for archival or troubleshooting purposes. A persistent staging area (PSA) is a type of staging area in a data warehouse which tracks the whole change history of a source table or query.

Staging areas can be implemented in the form of tables in relational databases, text-based flat files (or XML files) stored in file systems, or proprietary formatted binary files stored in file systems. Staging area architectures range in complexity from a set of simple relational tables in a target database to self-contained database instances or file systems. Though the source systems and target systems supported by ETL processes are often relational databases, the staging areas that sit between data sources and targets need not also be relational databases.

Staging areas can be designed to provide many benefits, but the primary motivations for their use are to increase the efficiency of ETL processes, ensure data integrity and support data quality operations. The functions of the staging area include the following:

One of the primary functions performed by a staging area is consolidation of data from multiple source systems. In performing this function the staging area acts as a large "bucket" in which data from multiple source systems can be temporarily placed for further processing. It is common to tag data in the staging area with additional metadata indicating the source of origin and timestamps indicating when the data was placed in the staging area.

Aligning data includes standardization of reference data across multiple source systems and validation of relationships between records and data elements from different sources. Data alignment in the staging area is a function closely related to, and acting in support of, master data management capabilities.

The staging area and the ETL processes it supports are often designed with the goal of minimizing contention within source systems. Copying the required data from source systems to the staging area in one shot is often more efficient than retrieving individual records (or small sets of records) on a one-off basis. The bulk-copy approach takes advantage of technical efficiencies such as data streaming technologies, reduced overhead from not having to break and re-establish connections to source systems, and optimized concurrency lock management on multi-user source systems. By copying the source data from the source systems and waiting to perform intensive processing and transformation in the staging area, the ETL process exercises a great degree of control over concurrency issues during processing.

The staging area can also host data that is to be processed on independent schedules, and data that is meant to be directed to multiple targets.
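The consolidation, metadata tagging and one-shot copying described above can be illustrated with a small sketch. It is only an illustration under stated assumptions: the in-memory SQLite databases stand in for real source systems, and the table names (`customer`, `stg_customer`), the source names (`crm`, `billing`) and all helper functions are hypothetical, not part of any particular ETL tool.

```python
import sqlite3
from datetime import datetime, timezone


def make_source(rows):
    """Create a stand-in source system (hypothetical schema) holding some rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (customer_id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", rows)
    conn.commit()
    return conn


def reset_staging(staging):
    """Transient staging: create the table if needed, then erase old contents."""
    staging.execute(
        "CREATE TABLE IF NOT EXISTS stg_customer ("
        "source_system TEXT, loaded_at TEXT, customer_id INTEGER, name TEXT)"
    )
    staging.execute("DELETE FROM stg_customer")
    staging.commit()


def stage_source(staging, source_name, source):
    """Copy one source into staging in a single pass, tagging every row."""
    # One bulk read, so the source system is queried only once.
    rows = source.execute("SELECT customer_id, name FROM customer").fetchall()
    # Metadata tags: where the row came from and when it was staged.
    loaded_at = datetime.now(timezone.utc).isoformat()
    staging.executemany(
        "INSERT INTO stg_customer VALUES (?, ?, ?, ?)",
        [(source_name, loaded_at, cid, name) for cid, name in rows],
    )
    staging.commit()
    return len(rows)


def demo():
    """Consolidate two hypothetical sources into one staging table."""
    staging = sqlite3.connect(":memory:")
    reset_staging(staging)
    sources = {
        "crm": make_source([(1, "Ada"), (2, "Grace")]),
        "billing": make_source([(2, "Grace H."), (3, "Edsger")]),
    }
    for name, conn in sources.items():
        stage_source(staging, name, conn)
        conn.close()  # the source is not needed during later transformation
    return staging.execute(
        "SELECT source_system, customer_id, name FROM stg_customer "
        "ORDER BY source_system, customer_id"
    ).fetchall()
```

Calling `demo()` returns the four staged rows, each labelled with its source system. Customer 2 appears once per source with a different spelling of the name, which is the kind of discrepancy a later alignment step would have to reconcile. Heavy transformation can then run against `stg_customer` alone, without holding connections or locks on the source systems.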