What are data lakes? | Created to store “big data”: data with high volume, high velocity, and high variety
• Store a larger quantity of data than a data warehouse (DW)
• Pros: easily store a lot of data
• Cons: the business can become overwhelmed by the data if it is not properly organized |
What are the 3 Vs that describe big data? | • Volume
– Gigabytes, terabytes, petabytes, and zettabytes
• Variety
– Structured: numeric, character
– Unstructured: text, email, photos, voice, video
• Velocity
– How fast the data is generated and processed |
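To make the volume scale concrete, here is a minimal sketch that expresses a raw byte count in the units listed above. It assumes decimal (SI) units, where each step is a factor of 1,000 (binary units such as KiB use 1,024 instead), and it includes exabytes (EB), which fall between petabytes and zettabytes. The function name `human_readable` is hypothetical.

```python
# Decimal (SI) byte units, smallest to largest; each step is a factor of 1,000.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]

def human_readable(num_bytes: float) -> str:
    """Express a byte count in the largest unit that keeps the value below 1,000."""
    value = float(num_bytes)
    for unit in UNITS[:-1]:
        if value < 1000:
            return f"{value:.1f} {unit}"
        value /= 1000  # move up one unit
    return f"{value:.1f} {UNITS[-1]}"  # anything this large is reported in ZB

print(human_readable(2_500_000_000))  # 2.5 GB
print(human_readable(3 * 10**15))     # 3.0 PB
print(human_readable(10**21))         # 1.0 ZB
```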
What is the definition of data mining? | The process of analyzing large amounts of data to discover patterns, relationships, and trends in order to gain insights. |
What are some examples of data mining? | • Customer Relationship Management
– Identify customer preferences and buying patterns
– Identify the most profitable customers
• Fraud Detection
– Identify unauthorized use of credit cards
• Advertising
– Stream targeted ads to online users based on their
browsing history and social media activity
• Retailing
– Accurately predict sales volumes at different locations |
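Fraud detection is often framed as outlier detection: a transaction that is far from a cardholder's usual behavior gets flagged for review. The sketch below assumes a single feature (transaction amount) and a simple z-score rule with a hypothetical threshold of 2.0; the data and the name `flag_unusual` are illustrative only, and production systems combine many more signals.

```python
from statistics import mean, pstdev

def flag_unusual(amounts: list[float], threshold: float = 2.0) -> list[int]:
    """Return indices of amounts whose z-score (distance from the mean in std devs) exceeds threshold."""
    mu = mean(amounts)
    sigma = pstdev(amounts)  # population standard deviation
    if sigma == 0:
        return []  # all amounts identical: nothing stands out
    return [i for i, a in enumerate(amounts) if abs(a - mu) / sigma > threshold]

# Hypothetical card history: mostly small purchases plus one large charge.
history = [25.0, 40.0, 32.5, 18.0, 29.0, 35.0, 22.0, 950.0]
print(flag_unusual(history))  # [7]
```

One design caveat: a large outlier also inflates the standard deviation, so with few transactions a robust measure (for example, the median absolute deviation) may flag anomalies more reliably.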
How does data mining work? | Data mining uses data to build models that discover patterns:
1. Associations: find correlations among items that occur together in groups
2. Predictions: forecast future values or outcomes
3. Clusters: find natural groupings of things
4. Sequential relationships: find time-ordered events (e.g., a customer who has a checking account will most likely open a savings account) |
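The checking-then-savings example can be measured with two figures commonly used for association and sequential rules: support (share of all customers whose history contains the pattern) and confidence (share of customers with the first event who later show the second). This is a minimal sketch; the customer histories and the names `follows` and `sequential_rule` are hypothetical.

```python
# Hypothetical account-opening histories, each listed in time order.
histories = {
    "c1": ["checking", "savings", "credit_card"],
    "c2": ["checking", "savings"],
    "c3": ["checking", "credit_card"],
    "c4": ["savings", "checking"],  # savings came first, so it does not match
    "c5": ["checking", "mortgage", "savings"],
    "c6": ["savings", "credit_card"],
}

def follows(events: list[str], first: str, then: str) -> bool:
    """True if `then` occurs after the first occurrence of `first`."""
    if first not in events:
        return False
    return then in events[events.index(first) + 1:]

def sequential_rule(histories: dict[str, list[str]], first: str, then: str) -> tuple[float, float]:
    """Support and confidence of the rule: `first`, later followed by `then`."""
    with_first = [h for h in histories.values() if first in h]
    matches = [h for h in with_first if follows(h, first, then)]
    support = len(matches) / len(histories)
    confidence = len(matches) / len(with_first) if with_first else 0.0
    return support, confidence

print(sequential_rule(histories, "checking", "savings"))  # (0.5, 0.6)
```

Here 3 of 6 customers match the pattern (support 0.5), and 3 of the 5 checking-account holders later opened savings (confidence 0.6).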
What are some data mining techniques? | Descriptive models: describe trends, patterns, and relationships without making predictions.
– Exploratory in nature
– Clustering
– Association Rule Mining
– Outlier Detection
Predictive models: predict future outcomes.
– Regression Analysis
– Time Series |
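To contrast the two families, here is a minimal sketch assuming a single numeric feature. Clustering (a descriptive model, shown as Lloyd's k-means algorithm) groups customers by spend without predicting anything, while ordinary least-squares regression (a predictive model) fits a trend line to past sales and extrapolates it. Fitting a line against the month number is one of the simplest ways to forecast time-ordered data. Function names, starting centers, and all data are hypothetical.

```python
def kmeans_1d(values: list[float], centers: list[float], iterations: int = 10) -> list[float]:
    """Lloyd's k-means on one feature: assign values to the nearest center, then move centers to cluster means."""
    centers = list(centers)
    for _ in range(iterations):
        clusters: list[list[float]] = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep a center where it is if no value was assigned to it.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers

def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = a + b*x; returns (intercept a, slope b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Descriptive: group hypothetical annual spend per customer into 2 clusters.
spend = [120.0, 135.0, 150.0, 900.0, 950.0, 1010.0]
print([round(c, 1) for c in kmeans_1d(spend, centers=[100.0, 1000.0])])  # [135.0, 953.3]

# Predictive: fit hypothetical monthly sales, then forecast month 5.
months = [1.0, 2.0, 3.0, 4.0]
sales = [10.0, 13.0, 14.0, 17.0]
intercept, slope = fit_line(months, sales)
print(round(intercept + slope * 5, 2))  # 19.0
```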
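What is the data mining process (CRISP-DM)? | Cross-Industry Standard Process for Data Mining (CRISP-DM): a six-phase process of business understanding, data understanding, data preparation, modeling, evaluation, and deployment |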