TodoBI - Business Intelligence, Big Data, ML y AI

What's New in Pentaho 7.1

As we have been reporting on our Twitter account, Pentaho 7.1 was released this past week.
- Pentaho 7.1 on GitHub
- Pentaho 7.1 on SourceForge
Here is what's new, along with the most useful links:
- What's new in Pentaho 7.1
- Overview of the improvements by Pedro Alves
- Overview of the improvements by Diethard Steiner
- Overview of the improvements by Hemal Govind

Create Once, Execute on Any Engine, Starting with Spark

With adaptive execution on Spark in a visual environment, Pentaho 7.1 makes big data developers more productive and Spark more accessible to non-developers. Users can now create data integration logic one time, and then choose the most appropriate big data processing engine for each workload at run-time. This release starts with Spark, but can easily support other engines in the future.

  • Complete Spark Support:  Pentaho is the only vendor to support Spark with all data integration steps in a visual drag-and-drop environment. Other vendors require users to build Spark-specific data integration logic, which often calls for Java development skills. With Pentaho, you design your logic once, regardless of execution engine.
  • Adaptive Execution on Big Data:  Transitioning from one engine for big data processing to another often means users need to re-write and debug their data integration logic for each engine, which takes time. Pentaho’s adaptive execution allows users to match workloads with the most appropriate processing engine, without having to re-write any data integration logic.
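
The "design once, pick the engine at run-time" idea can be pictured with a small Python sketch. This is not Pentaho's API and it does not use Spark: the step functions, the two toy engines (`run_local`, `run_threaded`), and the `execute` helper are hypothetical names made up for illustration. The only point is that the pipeline definition stays the same while the engine is a run-time choice.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = dict
Step = Callable[[Row], Row]


# Hypothetical row-level steps; the pipeline is defined once, below.
def clean_name(row: Row) -> Row:
    return {**row, "name": row["name"].strip().title()}


def add_total(row: Row) -> Row:
    return {**row, "total": row["qty"] * row["price"]}


PIPELINE: List[Step] = [clean_name, add_total]


def apply_steps(row: Row, steps: List[Step]) -> Row:
    """Run every step, in order, on a single row."""
    for step in steps:
        row = step(row)
    return row


# Two toy "engines" (assumptions for this sketch) that run the same steps.
def run_local(rows: List[Row], steps: List[Step]) -> List[Row]:
    return [apply_steps(r, steps) for r in rows]


def run_threaded(rows: List[Row], steps: List[Step]) -> List[Row]:
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map returns results in input order
        return list(pool.map(lambda r: apply_steps(r, steps), rows))


ENGINES: Dict[str, Callable[[List[Row], List[Step]], List[Row]]] = {
    "local": run_local,
    "threaded": run_threaded,
}


def execute(rows: List[Row], steps: List[Step], engine: str = "local") -> List[Row]:
    """Select the engine by name at run-time; the steps are untouched."""
    return ENGINES[engine](rows, steps)


SAMPLE_ROWS = [
    {"name": "  ada ", "qty": 2, "price": 5},
    {"name": "grace", "qty": 1, "price": 7},
]

if __name__ == "__main__":
    print(execute(SAMPLE_ROWS, PIPELINE, engine="local"))
    print(execute(SAMPLE_ROWS, PIPELINE, engine="threaded"))
```

Both calls receive the same `PIPELINE` object; only the `engine` argument changes, which is the property the bullets above describe.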


Building on current cloud support for Amazon EMR, Pentaho 7.1 supports Microsoft Azure HDInsight, Azure SQL, and SQL Server in Azure VM, offering more options to store – and more importantly, process – big data in hybrid, on-premises, and public cloud environments.
  • Support for HDInsight:  Organizations using Microsoft Azure HDInsight can now use Pentaho to acquire, blend, cleanse and analyze diverse data at scale.
  • Process Data in the Cloud or On-Premises:  Most vendors only allow you to access data from cloud sources. With Pentaho 7.1, you can also choose to process data on-premises, in the cloud or using a hybrid approach.


Pentaho 7.1 speeds up time to insight by allowing users to access visualizations at every step of the data prep process. In addition, simplified integration of third-party visualizations drives improved analytics along the entire data pipeline.
  • Prepare Better Data, Faster:  More visualizations throughout the data prep process allow users to spot-check data for quality issues and prototype analytic data, without switching in and out of tools or waiting until the very end to discover data quality problems. Now, users can interact with heat grids, geo maps, and sunbursts, as well as drill down into data sets for further exploration.
  • Integrate 3rd Party Visualizations:  Use an easy-to-use, flexible, and fully documented API to integrate visualizations from third-party libraries such as D3 or FusionCharts.


Concerns over the lack of comprehensive security and authentication for big data environments are top of mind for IT organizations. Pentaho 7.1 gives customers more options by expanding on existing enterprise-level Hadoop security for Cloudera with a similar level of security for Hortonworks.
  • Kerberos Impersonation Support:  Address authentication vulnerabilities with Hortonworks deployments. Protect clusters from intrusion and reduce risk with enterprise-level security.
  • Apache Ranger Support:  Control role-based access to specific data sets and applications for Hortonworks deployments. Manage governance and risk with authorization.