From Big Data to Fast Data

Admin, Oct. 11, 2019

A very good article by Raul Estrada. Main points:
1. Data acquisition: pipeline for performance
In this step, data enters the system from diverse sources. The key focus of this stage is performance, because this step determines how much data the whole system can receive at any given point in time.

Technologies
For this stage you should consider streaming APIs and messaging solutions like:
- Apache Kafka - open-source stream processing platform
- Akka Streams - open-source stream processing based on Akka
- Amazon Kinesis - Amazon data stream processing solution
- ActiveMQ - open-source message broker written in Java, with a JMS client
- RabbitMQ - open-source AMQP message broker written in Erlang
- JBoss AMQ - lightweight message-oriented middleware (MOM) developed by JBoss
- Oracle Tuxedo - middleware message platform by Oracle
- Sonic MQ - messaging system platform by Sonic

For meeting most of these key principles of data acquisition, the winner is Apache Kafka, because it is open source, is designed for high throughput and low latency, and handles real-time data feeds.
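To make the performance point concrete, here is a minimal sketch of an acquisition buffer using only the Python standard library. It is not Kafka and does not use any Kafka API: the `run_pipeline` function, the `buffer_size` parameter, and the sample events are all hypothetical names chosen for illustration. It shows the general idea that a bounded buffer between the producers and the rest of the system sets how much data can be in flight at once.

```python
import queue
import threading


def run_pipeline(events, buffer_size=100):
    """Push events through a bounded buffer; return what the consumer received.

    Illustrative stand-in for a message broker, not a real broker client.
    """
    buffer = queue.Queue(maxsize=buffer_size)  # bounded on purpose
    received = []
    sentinel = object()  # marks the end of the stream

    def producer():
        for event in events:
            buffer.put(event)  # blocks while the buffer is full (backpressure)
        buffer.put(sentinel)

    def consumer():
        while True:
            event = buffer.get()
            if event is sentinel:
                break
            received.append(event)

    threads = [
        threading.Thread(target=producer),
        threading.Thread(target=consumer),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return received


if __name__ == "__main__":
    sample = [{"source": "sensor-1", "value": i} for i in range(1000)]
    print(len(run_pipeline(sample, buffer_size=10)))
```

With one producer and one consumer, events come out in the order they went in. A smaller `buffer_size` makes the producer wait more often, which is the trade-off this stage has to tune.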
2. Data storage: flexible experimentation leads to solutions
There are many points of view on how to design this layer, but all of them should consider two perspectives: logical (i.e., the model) and physical data storage. The key focus of this stage is "experimentation" and flexibility.

Technologies
For this stage consider distributed database storage solutions like:
- Apache Cassandra - distributed NoSQL DBMS
- Couchbase - NoSQL document-oriented database
- Amazon DynamoDB - fully managed proprietary NoSQL database
- Apache Hive - data warehouse built on Apache Hadoop
- Redis - distributed in-memory key-value store
- Riak - distributed NoSQL key-value data store
- Neo4j - graph database management system
- MariaDB - MySQL-based RDBMS that, together with Galera, forms a replication cluster
- MongoDB - cross-platform document-oriented database
- MemSQL - distributed in-memory SQL RDBMS

For meeting most of the key principles of data storage just explained, the most balanced option is Apache Cassandra. It is open source, distributed, NoSQL, and designed to handle large volumes of data across many commodity servers with no single point of failure.
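The "no single point of failure" idea can be sketched with a small hash ring, written here in plain Python. This is a simplified illustration of the general technique (hash each key to a position on a ring and store it on the next few nodes), not Cassandra's actual implementation or API. The `Ring` class, the node names, and the `replication_factor` value are assumptions made for the example.

```python
import bisect
import hashlib


def _token(value):
    """Map a string to a ring position (MD5 is used only for its even spread)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy hash ring with one token per node (hypothetical, for illustration)."""

    def __init__(self, nodes, replication_factor=3):
        if replication_factor > len(nodes):
            raise ValueError("replication_factor cannot exceed the node count")
        self.replication_factor = replication_factor
        self._ring = sorted((_token(node), node) for node in nodes)
        self._tokens = [token for token, _ in self._ring]

    def replicas(self, key):
        """Return the nodes that store `key`, walking clockwise from its token."""
        start = bisect.bisect_right(self._tokens, _token(key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]

    def read_targets(self, key, down=()):
        """Return the replicas of `key` that are still reachable."""
        return [node for node in self.replicas(key) if node not in down]


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
    owners = ring.replicas("user:42")
    print(owners)
    print(ring.read_targets("user:42", down={owners[0]}))
```

Because every key lives on several nodes, losing any one of them still leaves replicas that can answer a read. That is the property the paragraph above describes.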
3. Data processing: combining tools and approaches
Years ago, there was discussion about whether big data systems should be (modern) stream processing or (traditional) batch processing. Today we know the correct answer for fast data is that most systems must be hybrid — both batch and stream at the same time. The type of processing is now defined by the process itself, not by the tool. The key focus of this stage is "combination."

Technologies
For this stage, you should consider data processing solutions like:
- Apache Spark - engine for large-scale data processing
- Apache Flink - open-source stream processing framework
- Apache Storm - open-source distributed realtime computation system
- Apache Beam - open-source, unified model for batch and streaming data
- TensorFlow - open-source library for machine intelligence

For meeting most of the key principles of data processing just explained, the result is a tie between Spark (micro-batching) and Flink (streaming).
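The claim that the type of processing is defined by the process, not the tool, can be illustrated with a short sketch in plain Python. It uses no Spark or Flink API. The function names (`count_by_key`, `micro_batches`, `process_batch`, `process_stream`) and the sample records are hypothetical. The same aggregation logic is applied once to a complete dataset (batch) and once to small chunks as they arrive (micro-batching).

```python
from collections import Counter
from itertools import islice


def count_by_key(records, totals=None):
    """Add each (key, amount) pair into `totals`; this is the shared business logic."""
    totals = Counter() if totals is None else totals
    for key, amount in records:
        totals[key] += amount
    return totals


def micro_batches(stream, size):
    """Cut any iterable into lists of at most `size` items, as they arrive."""
    iterator = iter(stream)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def process_batch(records):
    """Batch mode: the whole dataset is available up front."""
    return count_by_key(records)


def process_stream(stream, batch_size=2):
    """Micro-batch mode: update running totals one small chunk at a time."""
    totals = Counter()
    for chunk in micro_batches(stream, batch_size):
        count_by_key(chunk, totals)
    return totals


if __name__ == "__main__":
    records = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
    print(process_batch(records))
    print(process_stream(iter(records), batch_size=2))
```

Both calls produce the same totals. Only the way the data is fed in changes, which is why a hybrid system can reuse one piece of logic across both modes.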
4. Data visualization
Visualization communicates data or information by encoding it as visual objects in graphs, so that users receive it clearly and efficiently. This stage is not easy; it's both an art and a science.

Technologies
For this layer you should consider visualization solutions in these three categories:
- Notebook reports: Apache Zeppelin and Jupyter notebooks
- Charts, maps, and graphics: Tableau
- Customized charts, maps, and graphics: D3.js and Gephi
