Data Ingestion and Storage

Types of data

Properties of data

Data Warehouse & Lake

| Aspect | Data warehouse | Data lake |
| --- | --- | --- |
| Definition | Centralized repository optimized for analytics, where data from various sources is stored in a structured format | Storage repository that holds vast amounts of raw data in its native formats (various structures) |
| Schema | Schema on write | Schema on read |
| Data types | Primarily structured data | Various data structures |
| Agility | Less agile due to the predefined schema | More agile |
| Processing | ETL | ELT, or just load for storage purposes |
| Cost | More expensive due to optimizations for complex queries | Cost-effective for storage, but costs rise as volume increases |
| Use when | You have structured data sources and require fast, complex queries | You have a mix of data structures |
| | Data integration from different sources is essential | You need a scalable, cost-effective solution |
| | BI and analytics are the goal | Future needs are uncertain |
| | | Advanced analytics, ML, or data discovery are goals |
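
The difference between schema on write and schema on read can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ORDERS_SCHEMA` columns, the in-memory lists standing in for a warehouse table and a lake, and all function names are hypothetical and do not correspond to any particular product.

```python
import json

# Hypothetical schema for a warehouse table: column name -> required type.
ORDERS_SCHEMA = {"order_id": int, "customer": str, "amount": float}


def write_to_warehouse(table, record):
    """Schema on write: validate against the schema before storing."""
    if set(record) != set(ORDERS_SCHEMA):
        raise ValueError(f"unexpected columns: {sorted(record)}")
    for column, expected_type in ORDERS_SCHEMA.items():
        if not isinstance(record[column], expected_type):
            raise ValueError(f"{column} must be {expected_type.__name__}")
    table.append(record)


def write_to_lake(lake, raw):
    """Schema on read: store the raw payload as-is, with no validation."""
    lake.append(raw)


def read_amounts_from_lake(lake):
    """Apply a schema only at query time, skipping records that do not fit."""
    amounts = []
    for raw in lake:
        try:
            record = json.loads(raw)
            amounts.append(float(record["amount"]))
        except (ValueError, KeyError, TypeError):
            continue  # malformed or differently shaped record
    return amounts


# In-memory stand-ins for the two stores (assumption for illustration).
warehouse_table = []
data_lake = []

good = {"order_id": 1, "customer": "acme", "amount": 19.5}
bad = {"order_id": "2", "customer": "acme", "amount": "n/a"}

write_to_warehouse(warehouse_table, good)
try:
    write_to_warehouse(warehouse_table, bad)
except ValueError as err:
    print("rejected on write:", err)

# The lake accepts everything, including payloads that are not valid JSON.
for payload in (json.dumps(good), json.dumps(bad), "not json at all"):
    write_to_lake(data_lake, payload)

print(len(warehouse_table), len(data_lake))
print(read_amounts_from_lake(data_lake))
```

The warehouse path rejects the bad record up front, which is where the "less agile" trade-off comes from, while the lake path accepts every payload and defers the cost of interpreting it to each reader.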

Data Mesh

ETL