Tools of the Trade
Talend is an open-source software vendor that provides big data integration, master data management, and enterprise application integration solutions. Talend describes itself as the first integration platform built on Spark and claims up to 100X better performance than other platforms on the market.
Logstash is an open source, server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to your favorite “stash.”
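The ingest, transform, and stash stages described above can be pictured with a small sketch. This is plain Python, not Logstash configuration or any Logstash API; the function names, the sample log lines, and the in-memory list used as the "stash" are all hypothetical, and the sources are read one after another rather than simultaneously.

```python
import json

def ingest(*sources):
    """Read lines from several sources in turn (a real pipeline reads them concurrently)."""
    for source in sources:
        for line in source:
            yield line

def transform(lines):
    """Split each raw line into a structured event with a level and a message."""
    for line in lines:
        level, _, message = line.partition(" ")
        yield {"level": level, "message": message.strip()}

def stash(events, sink):
    """Serialize each event as JSON and append it to the destination."""
    for event in events:
        sink.append(json.dumps(event, sort_keys=True))

# Hypothetical inputs standing in for two log sources.
app_log = ["INFO service started", "ERROR disk full"]
web_log = ["WARN slow response"]

sink = []  # stand-in for the destination store
stash(transform(ingest(app_log, web_log)), sink)
```

Each stage is a generator, so events flow through one at a time instead of being buffered in full between stages.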
Apache Kafka® is a distributed streaming platform and is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.
Pivotal Greenplum is the world’s first fully-featured, multi-cloud, massively parallel processing (MPP) data platform based on the open source Greenplum Database. Pivotal Greenplum provides comprehensive and integrated analytics on multi-structured data. Powered by one of the world’s most advanced cost-based query optimizers, Pivotal Greenplum delivers unmatched analytical query performance on massive volumes of data.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.
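One of the "simple programming models" associated with Hadoop is MapReduce. The sketch below imitates its map, shuffle, and reduce steps in a single Python process to show the shape of the model. It does not use Hadoop or any of its APIs; the function names and the sample input are assumptions made for illustration.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group emitted values by key, as a framework would do between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

def word_count(documents):
    """Run map over every split, then shuffle and reduce the combined output."""
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle(pairs))

# Hypothetical input splits; on a cluster each could live on a different machine.
splits = ["big data big clusters", "data on clusters"]
counts = word_count(splits)
```

Because each call to map_phase depends only on its own split, the map work can be spread across machines, which is what lets the model scale out.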
PostgreSQL is an object-relational database management system. With over 15 years of constant development, it is known for its reliability and power and is the world's most advanced open source database. Its primary functions are to store data securely, follow best practices, and allow easy retrieval by other software applications.
Microsoft SQL Server
The foundation of Microsoft’s comprehensive data platform, SQL Server delivers breakthrough performance for mission-critical applications, using in-memory technologies, faster insights from any data to any user in familiar tools like Excel, and a resilient platform for building, deploying, and managing solutions that span on-premises and cloud.
Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected.
The R Project
R is an open source programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. Polls and surveys of data miners show that R's popularity has increased substantially in recent years.
Python is an interpreted high-level programming language for general-purpose programming. It lets you work more quickly and integrate your systems more effectively.
The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage.
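"Structure can be projected onto data already in storage" means the schema is applied when the data is read, not when it is written. The sketch below shows that idea in plain Python. It is not HiveQL and uses no Hive API; the raw text, the column names, and the helper function are hypothetical.

```python
import csv
import io

# Raw delimited text standing in for a file that already sits in storage.
RAW = "1,alice,34.5\n2,bob,12.0\n"

# The schema is declared separately and applied only at read time.
SCHEMA = [("id", int), ("name", str), ("amount", float)]

def read_with_schema(raw_text, schema):
    """Project column names and types onto untyped delimited records."""
    rows = []
    for record in csv.reader(io.StringIO(raw_text)):
        rows.append({name: cast(value) for (name, cast), value in zip(schema, record)})
    return rows

rows = read_with_schema(RAW, SCHEMA)

# With structure in place, the data can be queried, here as a simple aggregate.
total = sum(row["amount"] for row in rows)
```

The stored text is never modified, so a different schema could be projected onto the same data later.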
Apache Spark™ is a unified analytics engine for large-scale data processing. Apache Spark achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine.
Kibana is an open source analytics and visualization platform designed to work with Elasticsearch. You use Kibana to search, view, and interact with data stored in Elasticsearch indices. Kibana makes it easy to understand large volumes of data. Its simple, browser-based interface enables you to quickly create and share dynamic dashboards that display changes to Elasticsearch queries in real time.
Tableau Software helps people see and understand data. Tableau helps anyone quickly analyze, visualize, and share information. More than 21,000 customer accounts get rapid results with Tableau in the office and on the go, and tens of thousands of people use Tableau Public to share data on their blogs and websites.
Jaspersoft allows users to make data-driven decisions inside the apps and business programs they already use. It focuses on individual needs and provides an easy-to-use platform that scales economically and architecturally to reach a larger audience.