Please read this article. It is a gem for anyone working in Data Warehousing, Business Intelligence or Big Data.
At TodoBI we like to say that BI and DW projects are like an iceberg: the hidden part, which is the largest and most important, is the ETL.
An excerpt from the article:
"ETL was born when numerous applications started to be used in the enterprise, roughly at the same time that ERP started being adopted at scale in the late 1980s and early 1990s"
Companies needed to combine the data from all of these applications into one repository (the data warehouse) through a process of Extraction, Transformation, and Loading. That’s the origin of ETL.
So, since these early days, ETL has essentially gotten out of control. It is not uncommon for a modest-sized business to have a million lines of ETL code.
ETL jobs can be written in a programming language like Java, in Oracle’s PL/SQL or Teradata’s SQL, or using platforms like Informatica, Talend, Pentaho, RedPoint, Ab Initio or dozens of others.
With respect to mastery of ETL, there are two kinds of companies:
- The ETL Masters, who have a well-developed, documented, coherent approach to their ETL jobs.
- The ETL Prisoners, who are scared of the huge piles of ETL code that are crucial to running the business but that everyone is terrified to change.
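For readers new to the acronym, the Extraction, Transformation and Loading process described in the excerpt can be sketched in a few lines. This is a minimal illustration under stated assumptions, not code from the article: the two source "applications" (`ERP_ORDERS`, `CRM_CUSTOMERS`) are hypothetical in-memory lists, the `fact_orders` table is an invented name, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical source systems. In a real company these would be separate
# applications (ERP, CRM, ...); here they are plain lists for illustration.
ERP_ORDERS = [
    {"order_id": 1, "customer": "ACME ", "amount_eur": "120.50"},
    {"order_id": 2, "customer": "Globex", "amount_eur": "80"},
]
CRM_CUSTOMERS = [
    {"name": "acme", "country": "ES"},
    {"name": "globex", "country": "FR"},
]


def extract():
    """Extraction: read the raw records from each source application."""
    return ERP_ORDERS, CRM_CUSTOMERS


def transform(orders, customers):
    """Transformation: normalise keys, convert types and join both sources."""
    country_by_name = {c["name"].strip().lower(): c["country"] for c in customers}
    rows = []
    for o in orders:
        key = o["customer"].strip().lower()  # "ACME " in the ERP matches "acme" in the CRM
        rows.append((o["order_id"], key, country_by_name.get(key), float(o["amount_eur"])))
    return rows


def load(rows, conn):
    """Loading: write the conformed rows into one warehouse table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, country TEXT, amount_eur REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run the three steps in order against the given warehouse connection."""
    orders, customers = extract()
    load(transform(orders, customers), conn)


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")  # stand-in for the data warehouse
    run_etl(warehouse)
    for row in warehouse.execute("SELECT * FROM fact_orders ORDER BY order_id"):
        print(row)
```

Real ETL jobs also need incremental loads, error handling, scheduling and documentation, which is one reason such code bases can grow so large and become hard to change.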