What is a Data Warehouse? | - A data warehouse is a repository of data in a form that
can be easily accessed to create reports and answer
queries.
- used for OLAP, data mining, reporting |
Advantage of data warehouses | - provide a large store of data
- doesn't slow down OLTP systems
- consistent version of the truth
- allows ad-hoc reporting ( trend analysis, comparisons) |
disadvantage of data warehouses | - time consuming and expensive to implement and maintain |
Characteristics of DWs | Subject oriented
-Data are organized by detailed subject or area of interest,
such as sales, products, or customers, containing only
information relevant for decision support.
Integrated
-Data warehouses must place data from different sources
into a consistent format
time variant (time series)
-maintains historical data
nonvolatile
- previous data isn't erased |
data warehousing process | - collect the data ---> store data -----> access the data |
what is a data mart ? | - A DM is a subset of a data warehouse, typically
consisting of a single subject area (e.g., marketing,
operations).
ex. departments |
what is Extraction, transformation, and Loading (ETL)? | ETL is the process of populating the data warehouse from one
or multiple sources.
– Copy data from its source(s) to the DW.
Extraction: is the process of identifying data sources and
copying the data required for analysis.
– Data copied to the staging area
Transformation: is the process of mapping and harmonizing the
data that is, making certain the data are consistent,
cleansed, and reliable—from their sources to the targets.
– Convert the data to the required form
Loading: is moving data from the staging area to the DW |
what is a star schema? | The star schema is the most commonly used and the simplest
style of dimensional modeling |
STAR SCHEMA VS SNOWFLAKE | -a snowflake schema is an extension of a star schema
-Snow flake is more in-depth |