We introduce a very interesting open source tool from the developers at Lyft (the American BlaBlaCar): Amundsen.io. Here is what it can do.
Discover trusted data
Search for data within your organization with a simple text search. A search algorithm inspired by PageRank recommends results based on names, descriptions, tags, and query/view activity on the table or dashboard.
View automated metadata
Build trust in data using automated metadata: table and column descriptions, other frequent users, when the table was last updated, statistics, a preview of the data if permitted, and so on. Ease triage by linking the ETL job and the code that generated the data.
Share context with coworkers
Update tables and columns with descriptions, and reduce unnecessary back and forth about which table to use and what a column contains.
Learn from others
See which data your coworkers frequently use, own, or have bookmarked. Learn what the most common queries for a table look like by viewing the dashboards built on that table.
Download and More Info
Getting Started
Please visit the Amundsen installation documentation for a quick start to bootstrap a default version of Amundsen with dummy data.
Architecture Overview
Please visit Architecture for an overview of the Amundsen architecture.
Supported Entities
- Tables (from Databases)
- People (from HR systems)
- Dashboards
Supported Integrations
Table Connectors
- Amazon Athena
- Amazon Glue and anything built over it (like Databricks Delta - which is a work in progress).
- Amazon Redshift
- Apache Cassandra
- Apache Druid
- Apache Hive
- CSV
- Delta Lake
- Google BigQuery
- IBM DB2
- Microsoft SQL Server
- MySQL
- Oracle (through dbapi or sql_alchemy)
- PostgreSQL
- Presto
- Snowflake
Amundsen can also connect to any database that provides a dbapi or sql_alchemy interface (which most DBs provide).
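To illustrate why a dbapi interface is enough to read table metadata, here is a minimal sketch that uses only calls defined by the Python DB-API (PEP 249). It is not Amundsen's own extractor code: the `list_columns` helper and the `rides` table are hypothetical, and the standard-library `sqlite3` driver stands in for whichever database driver you actually use.

```python
import sqlite3


def list_columns(conn, table_name):
    """Return the column names of table_name using only PEP 249 calls.

    Hypothetical helper for illustration; not part of Amundsen.
    table_name must come from a trusted source, since identifiers
    cannot be passed as bound parameters.
    """
    cursor = conn.cursor()
    try:
        # "WHERE 1 = 0" returns no rows, but the driver still fills in
        # cursor.description with one entry per result column.
        cursor.execute(f'SELECT * FROM "{table_name}" WHERE 1 = 0')
        # Each description entry is a 7-item sequence; item 0 is the name.
        return [column[0] for column in cursor.description]
    finally:
        cursor.close()


# sqlite3 is used here only because it ships with Python.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (ride_id INTEGER, city TEXT, fare REAL)")

columns = list_columns(conn, "rides")
print(columns)
```

Because the helper relies only on `cursor()`, `execute()`, and `cursor.description`, the same approach should carry over to other DB-API drivers, although identifier quoting rules vary between databases.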
Dashboard Connectors
ETL Orchestration
BI Viz Tool
Installation
Please visit the Installation guide for instructions on how to install Amundsen.
Roadmap
Please visit Roadmap if you are interested in Amundsen's upcoming roadmap items.
Blog Posts and Interviews
- Amundsen - Lyft's data discovery & metadata engine (April 2019)
- Software Engineering Daily podcast on Amundsen (April 2019)
- How Lyft Drives Data Discovery (July 2019)
- Data Engineering podcast on Solving Data Discovery At Lyft (Aug 2019)
- Open Sourcing Amundsen: A Data Discovery And Metadata Platform (Oct 2019)
- Adding Data Quality into Amundsen with Programmatic Descriptions by Sam Shuster from Edmunds.com (May 2020)
- Facilitating Data discovery with Apache Atlas and Amundsen by Mariusz Górski from ING (June 2020)
- Using Amundsen to Support User Privacy via Metadata Collection at Square by Alyssa Ransbury from Square (July 14, 2020)
- Amundsen Joins LF AI as New Incubation Project (Aug 11, 2020)
- Amundsen: one year later (Oct 6, 2020)
Talks
- Disrupting Data Discovery {slides, recording} (Strata SF, March 2019)
- Amundsen: A Data Discovery Platform from Lyft {slides} (Data Council SF, April 2019)
- Disrupting Data Discovery {slides} (Strata London, May 2019)
- ING Data Analytics Platform (Amundsen is mentioned) {slides, recording} (KubeCon Barcelona, May 2019)
- Disrupting Data Discovery {slides, recording} (Making Big Data Easy SF, May 2019)
- Disrupting Data Discovery {slides, recording} (Neo4j Graph Tour Santa Monica, September 2019)
- Disrupting Data Discovery {slides} (IDEAS SoCal AI & Data Science Conference, Oct 2019)
- Data Discovery with Amundsen by Gerard Toonstra from Coolblue {slides} and {talk} (BigData Vilnius 2019)
- Towards Enterprise Grade Data Discovery and Data Lineage with Apache Atlas and Amundsen by Verdan Mahmood and Marek Wiewiorka from ING {slides, talk} (Big Data Technology Warsaw Summit 2020)
- Airflow @ Lyft (which covers how we integrate Airflow and Amundsen) by Tao Feng {slides and website} (Airflow Summit 2020)
- Data DAGs with lineage for fun and for profit by Bolke de Bruin {website} (Airflow Summit 2020)
Related Articles
- How LinkedIn, Uber, Lyft, Airbnb and Netflix are Solving Data Management and Discovery for Machine Learning Solutions
- Data Discovery in 2020
- 4 Data Trends to Watch in 2020
- Work-Bench Snapshot: The Evolution of Data Discovery & Catalog
- Future of Data Engineering
- Governance and Discovery
- A Data Engineer’s Perspective On Data Democratization
- Graph Technology Landscape 2020
- In-house Data Discovery platforms
- Linux Foundation AI Foundation Landscape
- Lyft’s Amundsen: Data-Discovery with Built-In Trust
- How to find and organize your data from the command-line
- Data Discovery Platform at Bagelcode
- Cataloging Tools for Data Teams
- An Overview of Data Discovery Platforms and Open Source Solutions
- Hacking Data Discovery in AWS with Amundsen at SEEK
Community meetings
Community meetings are held on the first Thursday of every month at 9 AM Pacific, Noon Eastern, 6 PM Central European Time. Link to join
Upcoming meetings & notes
You can find the exact date for the next meeting and the agenda a few weeks before the meeting in this doc.
Notes from all past meetings are available here.
Who uses Amundsen?
Here is the list of organizations that are using Amundsen today. If your organization uses Amundsen, please file a PR and update this list.
Currently officially using Amundsen:
- Asana
- Bagelcode
- Bang & Olufsen
- Brex
- Cameo
- Cimpress Technology
- Coles Group
- Convoy
- Data Sprints
- Dcard
- Devoted Health
- DHI Group
- Edmunds
- Everfi
- Gusto
- Hurb
- ING
- Instacart
- iRobot
- LMC
- Lyft
- Merlin
- PicPay
- PUBG
- Rapido
- REA Group
- Remitly
- Square
- WeTransfer
- Workday