Pentaho Announces Pentaho Data Integration 3.0

Pentaho Corp., creator of the world’s most popular open source business intelligence (BI) suite, today announced the availability of Pentaho Data Integration 3.0. The new release of Pentaho’s data integration product delivers significant performance enhancements, updated interfaces to improve ETL developer productivity, and a host of functional enhancements fueled by the company’s large and growing developer community.

Prior releases of Pentaho Data Integration have provided parallel execution to allow the product to address customer environments with very large data volumes. The new release adds support for dynamic cluster schemas designed for grid computing environments. These schemas allow large data loads to be deployed onto clusters of slave machines, with slaves easily added or removed based on load volumes.

For the increasing number of organizations that are using Pentaho Data Integration in mainframe and other non-relational environments, Pentaho Data Integration 3.0 includes optimized processing of flat files. New algorithms as well as parallel, non-blocking I/O can provide a five-fold increase in data loading performance from text and CSV files.

Also new is a series of upgrades that overcome the limitations of traditional “code generator” integration tools, which produce brittle, black-box scripts and code that can be difficult or impossible to troubleshoot interactively in a graphical environment. Pentaho Data Integration 3.0 uses a metadata-driven approach to solve the problem. It also adds an integrated debugger designed to improve ETL developer productivity by providing conditional breakpoints in transformation execution, the ability to pause and resume transformation execution, and the ability to specify the number of rows to be used in test executions. Multi-developer environments can now work with standard change management and source control systems through the new release’s integration with the Apache Commons Virtual File System (VFS).

“Pentaho Data Integration 3.0 demonstrates the continued maturation of enterprise open source data warehousing products,” said David Stodder, VP and Research Director at Ventana Research. “Open source data integration offerings like Pentaho’s provide compelling options for organizations seeking alternatives to traditional proprietary data integration tools.”

Pentaho Data Integration 3.0 delivers a new statistical transformation plug-in that allows organizations to enrich their data during the loading process with calculations such as arithmetic means, medians, standard deviations, and percentiles. It also adds new data sources including Sybase IQ, BMC Remedy AR System, LDAP directories, and Microsoft Access, plus new job steps including file copy and delete, FTP transfer to a remote destination, unzipping of packed files, XML document verification, and customizable logging. These and other features have been contributed by Pentaho’s community of developers around the world.
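
The kind of enrichment described above can be sketched in a few lines of Python. This is only an illustration of the calculations named in the release, not the plug-in itself: the helper `enrich_with_stats`, the `price` field, and the sample rows are all hypothetical, and the 90th percentile is just one possible choice of percentile.

```python
import statistics


def enrich_with_stats(rows, field):
    """Return copies of rows with batch-level statistics for `field` attached.

    Hypothetical helper for illustration; not part of Pentaho Data Integration.
    """
    values = [row[field] for row in rows]
    stats = {
        f"{field}_mean": statistics.mean(values),
        f"{field}_median": statistics.median(values),
        # Sample standard deviation (n - 1 in the denominator).
        f"{field}_stdev": statistics.stdev(values),
        # quantiles(n=100) returns 99 cut points; index 89 is the 90th percentile.
        f"{field}_p90": statistics.quantiles(values, n=100, method="inclusive")[89],
    }
    # Merge the computed statistics into a copy of each row.
    return [{**row, **stats} for row in rows]


# Made-up sample rows standing in for records being loaded.
rows = [{"id": i, "price": p} for i, p in enumerate([10, 20, 30, 40, 50])]
enriched = enrich_with_stats(rows, "price")
```

Each output row keeps its original fields and gains the computed values, so downstream steps in a load could filter or flag records against the batch statistics.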

“The new statistical transformations in Pentaho Data Integration 3.0 can allow us to efficiently derive greater insight and value from our information within our standard data warehouse maintenance processes,” said Salvatore Scalisi, Director of Business Intelligence at ZipRealty.

Integrated data cleansing, data profiling, and connectors to many popular packaged applications are available from Pentaho and select Pentaho Partners.

“We’ve put a lot of time and effort into the upgraded architecture of Pentaho Data Integration version 3,” said Matt Casters, chief architect for Data Integration at Pentaho. “The new architecture delivers dramatic performance improvements, and gives us a great platform to rapidly deliver other enhancements down the road.”

Subscription services for Pentaho Data Integration 3.0, including professional support, certified versions, and IP indemnification, are available immediately from Pentaho. Training courses and consulting services are also available from Pentaho and select Pentaho Certified Partners.

www.pentaho.com

Open Source / GPL