Data Phantom Platform

A comprehensive data processing and analytics platform for managing SQL workflows across multiple engines with AWS EMR integration


SQL Engines
AWS: Cloud Integration
DAG: Workflow Management
Scheduled Flow: Cron-based Execution
Adhoc Flow: On-demand Execution
Recovery Flow: Resume from Checkpoint

Powerful Features

Everything you need for enterprise-grade data processing and analytics

Multi-Engine SQL Support

Execute queries across Hive, Presto, Spark SQL, MySQL, and PySpark with seamless integration

Apache Hive for data warehousing
Presto for distributed queries
Spark SQL for big data processing
MySQL for direct database connections
PySpark for Python-based processing

Visual Workflow Management

Create and manage complex data pipelines with dependency tracking and visual DAG representation

Directed Acyclic Graph (DAG) support
Task dependency management
Parallel execution optimization
Visual workflow designer
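
As a rough illustration of how task dependencies translate into parallel execution, the sketch below groups tasks into "waves" whose members have no unmet dependencies and could therefore run concurrently. It relies on Python's standard `graphlib` module. The `execution_waves` helper and the task names are hypothetical and are not taken from Data Phantom's code base.

```python
from graphlib import TopologicalSorter


def execution_waves(dependencies):
    """Group tasks into waves; every task in a wave has all dependencies met.

    `dependencies` maps each task name to the set of tasks it depends on.
    Raises graphlib.CycleError if the graph contains a cycle (not a DAG).
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # validates that the graph is acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # unlocks tasks that depended on this wave
    return waves


# Hypothetical pipeline: two extracts feed a join, which feeds a report.
pipeline = {
    "extract_orders": set(),
    "extract_users": set(),
    "join": {"extract_orders", "extract_users"},
    "report": {"join"},
}
print(execution_waves(pipeline))
# [['extract_orders', 'extract_users'], ['join'], ['report']]
```

Both extract tasks land in the first wave because neither depends on the other, which is the property a scheduler can exploit to run them in parallel.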

Three Execution Flows

Comprehensive execution model with scheduled, on-demand, and fault-tolerant recovery

Scheduled Flow: Cron-based priority queue execution
Adhoc Flow: On-demand playground execution
Recovery Flow: Resume from checkpoint after failures
Playground updates are auto-discovered every 5 minutes
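
The scheduled and recovery flows can be pictured with a small sketch. It assumes that a lower number means higher priority, that ties are broken by submission order, and that a checkpoint is simply the set of task names that already finished. None of these details are stated in the source, and both function names are hypothetical.

```python
import heapq


def drain_by_priority(tasks):
    """Return task names in execution order.

    `tasks` is a list of (name, priority) pairs. Lower priority numbers run
    first (an assumption); ties keep their submission order.
    """
    heap = [(priority, seq, name) for seq, (name, priority) in enumerate(tasks)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, name = heapq.heappop(heap)
        order.append(name)
    return order


def resume_from_checkpoint(ordered_tasks, completed):
    """Return the tasks still to run, skipping those recorded as completed."""
    return [task for task in ordered_tasks if task not in completed]


# Hypothetical scheduled tasks and a checkpoint taken after the first one ran.
queue = [("nightly_rollup", 2), ("hourly_sync", 1), ("audit", 2)]
order = drain_by_priority(queue)
print(order)  # ['hourly_sync', 'nightly_rollup', 'audit']
print(resume_from_checkpoint(order, {"hourly_sync"}))  # ['nightly_rollup', 'audit']
```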

Data Reconciliation

Intelligent S3-based data validation with adaptive algorithms based on file size

Exact Match: Files < 1MB for precise comparison
Bloom Filter: Files > 1MB for probabilistic matching
S3 Output Reading: Direct comparison from task outputs
User-defined reconciliation mappings
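
A minimal sketch of the size-based strategy choice and of Bloom-filter matching follows. Several details are assumptions: 1 MB is taken as 2^20 bytes, files of exactly 1 MB are treated as exact-match (the list above leaves that case open), and the filter size and hash count are arbitrary. The `BloomFilter` class and helper names are illustrative, not the platform's implementation.

```python
import hashlib

ONE_MB = 1024 * 1024  # assumed meaning of "1MB"


def choose_strategy(size_bytes):
    """Pick a reconciliation strategy from the file size (boundary is assumed)."""
    return "exact_match" if size_bytes <= ONE_MB else "bloom_filter"


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def missing_rows(source_rows, target_rows):
    """Return source rows that are definitely absent from the target.

    Because of false positives, some genuinely missing rows may go unreported.
    """
    bloom = BloomFilter()
    for row in target_rows:
        bloom.add(row)
    return [row for row in source_rows if row not in bloom]
```

The trade-off is memory against certainty: the filter never reports a row as missing when it is present, but it can occasionally overlook a missing row, which is why exact comparison remains preferable for small files.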

AWS Integration

Native AWS EMR and S3 integration for scalable cloud-based data processing

EMR cluster management
S3 data storage integration
CloudFormation stack support
Automatic scaling

UDF Support

Upload and manage custom User-Defined Functions with JAR file support

Custom JAR upload
Task-specific UDF assignment
Runtime registration
Function library management
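
For engines such as Hive and Spark SQL, runtime registration of a JAR-based UDF is commonly expressed as an `ADD JAR` statement followed by `CREATE TEMPORARY FUNCTION ... AS '<class>'`. The sketch below only builds those two statements. Whether Data Phantom registers UDFs this way is an assumption, and the helper name, JAR path, and class name are hypothetical.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_CLASS_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$")


def udf_registration_sql(jar_path, function_name, class_name):
    """Build the statements that register a JAR-based UDF for one session.

    Inputs are validated so that user-supplied values cannot inject extra SQL.
    """
    if not _IDENTIFIER.match(function_name):
        raise ValueError(f"invalid function name: {function_name!r}")
    if not _CLASS_NAME.match(class_name):
        raise ValueError(f"invalid class name: {class_name!r}")
    if any(ch in jar_path for ch in ";'\" \n"):
        raise ValueError(f"unsafe JAR path: {jar_path!r}")
    return [
        f"ADD JAR {jar_path}",
        f"CREATE TEMPORARY FUNCTION {function_name} AS '{class_name}'",
    ]


# Hypothetical UDF stored in S3.
for statement in udf_registration_sql(
    "s3://example-bucket/udfs/text-utils.jar", "normalize_text", "com.example.udf.NormalizeText"
):
    print(statement)
```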

Modern Tech Stack

Built with industry-leading technologies for performance and reliability

Java 11: Backend Development
Dropwizard 4.0.2: REST API Framework
React 18.3.1: Frontend Dashboard
MariaDB/MySQL: Database
AWS EMR & S3: Cloud Services
Maven: Build Tool

Quick Start

Get up and running with Data Phantom in minutes

Install Dependencies

Install and start MariaDB (shown here with Homebrew)

brew install mariadb
brew services start mariadb


Configure Database

Create the database, then run the DDL script against it

mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS data_phantom;"
mysql -u root -p data_phantom < database.ddl

Build & Run

Build the application and start the server

mvn clean install
java -jar target/annihilator-data-phantom-1.0-SNAPSHOT.jar server config-dev.yml