Data Phantom Platform

A comprehensive data processing and analytics platform for managing SQL workflows across multiple engines with AWS EMR integration

  • 5+ SQL Engines
  • AWS Cloud Integration
  • DAG Workflow Management
  • Scheduled Flow: Cron-based Execution
  • Adhoc Flow: On-demand Execution
  • Recovery Flow: Resume from Checkpoint

Powerful Features

Everything you need for enterprise-grade data processing and analytics

Multi-Engine SQL Support

Run tasks on Hive, Presto, Spark SQL, MySQL, or PySpark from a single platform

  • Apache Hive for data warehousing
  • Presto for distributed queries
  • Spark SQL for big data processing
  • MySQL for direct database connections
  • PySpark for Python-based processing
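A minimal sketch of how a task might be routed to the right engine. The `Engine` enum and `EngineCommands` class are hypothetical and are not Data Phantom's actual API. The CLI flags used (`hive -e`, `presto --execute`, `spark-sql -e`, `mysql -e`, `spark-submit`) are the standard ones for each tool.

```java
import java.util.List;

// Hypothetical engine list; Data Phantom's real task model may differ.
enum Engine { HIVE, PRESTO, SPARK_SQL, MYSQL, PYSPARK }

final class EngineCommands {
    private EngineCommands() {}

    /**
     * Builds a command-line invocation for one task.
     * For the SQL engines, payload is the query text; for PySpark it is a script path.
     */
    static List<String> build(Engine engine, String payload) {
        switch (engine) {
            case HIVE:      return List.of("hive", "-e", payload);
            case PRESTO:    return List.of("presto", "--execute", payload);
            case SPARK_SQL: return List.of("spark-sql", "-e", payload);
            case MYSQL:     return List.of("mysql", "-e", payload);
            case PYSPARK:   return List.of("spark-submit", payload);
            default:        throw new IllegalArgumentException("Unsupported engine: " + engine);
        }
    }
}
```

The resulting list can be handed to `ProcessBuilder`, for example `new ProcessBuilder(EngineCommands.build(Engine.HIVE, "SELECT 1")).inheritIO().start()`. A real deployment would also pass connection options (Presto server, MySQL host and database), which are omitted here.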

Visual Workflow Management

Create and manage complex data pipelines with dependency tracking and visual DAG representation

  • Directed Acyclic Graph (DAG) support
  • Task dependency management
  • Parallel execution optimization
  • Visual workflow designer
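Dependency tracking and parallel execution on a DAG can be sketched with Kahn's topological sort, grouping tasks into levels where every task depends only on earlier levels, so each level can run in parallel. `DagPlanner` and its input shape (task name mapped to the names of its upstream tasks) are assumptions for illustration, not the platform's own classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DagPlanner {
    private DagPlanner() {}

    /**
     * @param dependsOn task name -> names of the tasks it depends on
     * @return tasks grouped into levels; tasks within one level are independent
     * @throws IllegalArgumentException on an unknown dependency or a cycle
     */
    static List<List<String>> levels(Map<String, List<String>> dependsOn) {
        Map<String, Integer> inDegree = new HashMap<>();
        Map<String, List<String>> downstream = new HashMap<>();
        for (Map.Entry<String, List<String>> e : dependsOn.entrySet()) {
            inDegree.put(e.getKey(), e.getValue().size());
            for (String upstream : e.getValue()) {
                if (!dependsOn.containsKey(upstream)) {
                    throw new IllegalArgumentException("Unknown dependency: " + upstream);
                }
                downstream.computeIfAbsent(upstream, k -> new ArrayList<>()).add(e.getKey());
            }
        }

        // Start with tasks that have no unmet dependencies.
        List<String> ready = new ArrayList<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }

        List<List<String>> levels = new ArrayList<>();
        int scheduled = 0;
        while (!ready.isEmpty()) {
            Collections.sort(ready); // deterministic order within a level
            levels.add(ready);
            scheduled += ready.size();
            List<String> next = new ArrayList<>();
            for (String task : ready) {
                for (String child : downstream.getOrDefault(task, List.of())) {
                    // A child becomes ready once all of its upstream tasks are scheduled.
                    if (inDegree.merge(child, -1, Integer::sum) == 0) next.add(child);
                }
            }
            ready = next;
        }
        if (scheduled != dependsOn.size()) {
            throw new IllegalArgumentException("Cycle detected: graph is not a DAG");
        }
        return levels;
    }
}
```

For a flow where `clean` and `agg` both depend on `extract`, and `report` depends on both, this yields three levels: `[extract]`, `[agg, clean]`, `[report]`, with the middle level eligible for parallel execution.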

Three Execution Flows

Scheduled, on-demand, and checkpoint-based recovery execution for fault-tolerant pipelines

  • Scheduled Flow: Cron-based priority queue execution
  • Adhoc Flow: On-demand playground execution
  • Recovery Flow: Resume from checkpoint after failures
  • Playground updates auto-discovered every 5 minutes
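A minimal sketch of the scheduled and recovery flows, under stated assumptions: each run's next fire time is assumed to be precomputed from its cron expression (the Java standard library has no cron parser), the queue orders runs by fire time with ties broken by priority (lower number first), and recovery simply skips tasks already checkpointed as successful. `ScheduledRun` and `FlowScheduler` are hypothetical names.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

// Hypothetical scheduled run; the real schema is not documented here.
final class ScheduledRun {
    final String flowId;
    final Instant nextRunAt; // assumed precomputed from the cron expression
    final int priority;      // assumption: lower number = more urgent

    ScheduledRun(String flowId, Instant nextRunAt, int priority) {
        this.flowId = flowId;
        this.nextRunAt = nextRunAt;
        this.priority = priority;
    }
}

final class FlowScheduler {
    // Earliest fire time first; ties broken by priority.
    private final PriorityQueue<ScheduledRun> queue = new PriorityQueue<>(
            Comparator.comparing((ScheduledRun r) -> r.nextRunAt)
                      .thenComparingInt(r -> r.priority));

    void submit(ScheduledRun run) {
        queue.add(run);
    }

    /** Removes and returns every run due at or before now, in queue order. */
    List<ScheduledRun> pollDue(Instant now) {
        List<ScheduledRun> due = new ArrayList<>();
        while (!queue.isEmpty() && !queue.peek().nextRunAt.isAfter(now)) {
            due.add(queue.poll());
        }
        return due;
    }

    /** Recovery: keep the original task order but skip tasks already checkpointed as done. */
    static List<String> tasksToResume(List<String> orderedTasks, Set<String> completed) {
        List<String> remaining = new ArrayList<>();
        for (String task : orderedTasks) {
            if (!completed.contains(task)) remaining.add(task);
        }
        return remaining;
    }
}
```

Keeping recovery as a filter over the original task order means a resumed run never repeats work that already succeeded; a fuller version would also re-check upstream outputs before trusting a checkpoint.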

Data Reconciliation

S3-based data validation that selects the comparison algorithm by file size

  • Exact Match: Files < 1MB for precise comparison
  • Bloom Filter: Files > 1MB for probabilistic matching
  • S3 Output Reading: Direct comparison from task outputs
  • User-defined reconciliation mappings
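The size-based choice between exact matching and Bloom-filter matching can be sketched as follows. This is an illustration, not the platform's implementation: rows are compared as sets of strings, the filter size and hash count are hypothetical, and files of exactly 1 MB are treated as "large" by assumption since the rules above leave that boundary open.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;

final class Reconciler {
    private Reconciler() {}

    // 1 MB threshold from the feature list; exactly 1 MB goes to the Bloom filter (assumption).
    static final long EXACT_MATCH_LIMIT_BYTES = 1L << 20;
    private static final int FILTER_BITS = 1 << 20; // hypothetical filter size
    private static final int HASHES = 5;            // hypothetical number of hash functions

    /** Chooses the comparison strategy by file size. */
    static boolean rowsMatch(List<String> source, List<String> target, long sizeBytes) {
        return sizeBytes < EXACT_MATCH_LIMIT_BYTES ? exactMatch(source, target) : bloomMatch(source, target);
    }

    /** Exact: both sides contain the same distinct rows (duplicate counts are ignored). */
    static boolean exactMatch(List<String> source, List<String> target) {
        return new HashSet<>(source).equals(new HashSet<>(target));
    }

    /**
     * Probabilistic: each side is checked against a Bloom filter built from the other.
     * Bloom filters have no false negatives, so a reported mismatch is real; false
     * positives mean a small fraction of genuine mismatches can go undetected.
     */
    static boolean bloomMatch(List<String> source, List<String> target) {
        return allProbablyIn(target, buildFilter(source)) && allProbablyIn(source, buildFilter(target));
    }

    private static BitSet buildFilter(List<String> rows) {
        BitSet filter = new BitSet(FILTER_BITS);
        for (String row : rows) {
            for (int i = 0; i < HASHES; i++) {
                filter.set(index(row, i));
            }
        }
        return filter;
    }

    private static boolean allProbablyIn(List<String> rows, BitSet filter) {
        for (String row : rows) {
            for (int i = 0; i < HASHES; i++) {
                if (!filter.get(index(row, i))) {
                    return false; // definitely absent from the other side
                }
            }
        }
        return true;
    }

    // Double hashing: the i-th bit position is h1 + i * h2, reduced modulo the filter size.
    private static int index(String row, int i) {
        int h1 = row.hashCode();
        int h2 = fnv1a(row.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(h1 + i * h2, FILTER_BITS);
    }

    // 32-bit FNV-1a hash over the row's UTF-8 bytes.
    private static int fnv1a(byte[] bytes) {
        int hash = 0x811C9DC5;
        for (byte b : bytes) {
            hash ^= (b & 0xFF);
            hash *= 0x01000193;
        }
        return hash;
    }
}
```

The trade-off is memory: the exact path holds every distinct row in a `HashSet`, while the Bloom path holds a fixed-size bit array per side regardless of file size.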

AWS Integration

Native AWS EMR and S3 integration for scalable cloud-based data processing

  • EMR cluster management
  • S3 data storage integration
  • CloudFormation stack support
  • Automatic scaling
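A minimal sketch of preparing an EMR step for a PySpark task. On EMR, steps commonly run through `command-runner.jar`, whose arguments are the command to execute (here `spark-submit` in cluster deploy mode with an S3 script). `EmrSteps` and its methods are hypothetical helpers, not part of any SDK.

```java
import java.util.ArrayList;
import java.util.List;

final class EmrSteps {
    private EmrSteps() {}

    private static final String S3_PREFIX = "s3://";

    /** Arguments for a command-runner.jar step that runs a PySpark script stored in S3. */
    static List<String> sparkSubmitArgs(String scriptS3Uri, List<String> scriptArgs) {
        if (!scriptS3Uri.startsWith(S3_PREFIX)) {
            throw new IllegalArgumentException("Expected an s3:// URI: " + scriptS3Uri);
        }
        List<String> args = new ArrayList<>(List.of("spark-submit", "--deploy-mode", "cluster", scriptS3Uri));
        args.addAll(scriptArgs); // forwarded to the script itself
        return args;
    }

    /** Splits s3://bucket/key into {bucket, key}; the key is empty for a bare bucket. */
    static String[] splitS3Uri(String uri) {
        if (!uri.startsWith(S3_PREFIX)) {
            throw new IllegalArgumentException("Expected an s3:// URI: " + uri);
        }
        String rest = uri.substring(S3_PREFIX.length());
        int slash = rest.indexOf('/');
        return slash < 0
                ? new String[] {rest, ""}
                : new String[] {rest.substring(0, slash), rest.substring(slash + 1)};
    }
}
```

With the AWS SDK for Java v2, such a list would typically be passed to `HadoopJarStepConfig.builder().jar("command-runner.jar").args(...)` and submitted in an `AddJobFlowStepsRequest`; that wiring is left out so the sketch stays dependency-free.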

UDF Support

Upload and manage custom User-Defined Functions with JAR file support

  • Custom JAR upload
  • Task-specific UDF assignment
  • Runtime registration
  • Function library management
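Runtime registration of an uploaded UDF can be sketched as generating session statements before a task's own SQL. `ADD JAR` and `CREATE TEMPORARY FUNCTION ... AS '<class>'` are standard HiveQL, also accepted by Spark SQL; the `UdfRegistration` helper and its validation rules are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;

final class UdfRegistration {
    private UdfRegistration() {}

    // Conservative whitelists so user-supplied values cannot inject extra SQL.
    private static final Pattern FUNCTION_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");
    private static final Pattern CLASS_NAME =
            Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*(\\.[A-Za-z_$][A-Za-z0-9_$]*)*");
    private static final Pattern JAR_PATH = Pattern.compile("[^\\s;'\"]+\\.jar");

    /** HiveQL statements that make a UDF from an uploaded JAR available to the current session. */
    static List<String> hiveStatements(String jarPath, String functionName, String className) {
        require(JAR_PATH, jarPath, "JAR path");
        require(FUNCTION_NAME, functionName, "function name");
        require(CLASS_NAME, className, "class name");
        return List.of(
                "ADD JAR " + jarPath,
                "CREATE TEMPORARY FUNCTION " + functionName + " AS '" + className + "'");
    }

    private static void require(Pattern pattern, String value, String what) {
        if (value == null || !pattern.matcher(value).matches()) {
            throw new IllegalArgumentException("Invalid " + what + ": " + value);
        }
    }
}
```

Temporary functions last only for the session, which fits task-specific assignment: each task registers just the UDFs mapped to it.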

Modern Tech Stack

Built with industry-leading technologies for performance and reliability

  • Java 11: Backend development
  • Dropwizard 4.0.2: REST API framework
  • React 18.3.1: Frontend dashboard
  • MariaDB/MySQL: Database
  • AWS EMR & S3: Cloud services
  • Maven: Build tool

Quick Start

Get up and running with Data Phantom in minutes

1. Install Dependencies

Install and start MariaDB (Homebrew on macOS shown). Building the project also requires Java 11 and Maven.

    brew install mariadb
    brew services start mariadb

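2. Configure Database

Create the data_phantom database, then load the schema from the DDL script:

    mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS data_phantom"
    mysql -u root -p data_phantom < database.ddl
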
3. Build & Run

Build the application with Maven and start the Dropwizard server with the development configuration:

    mvn clean install
    java -jar target/annihilator-data-phantom-1.0-SNAPSHOT.jar server config-dev.yml