Part 2.
Support schemaless property graph model and openCypher query language
Graph-native, structure-aware columnar storage
Fast bulk loading with modern hardware capabilities such as SIMD
Vectorized, schemaless query execution engine
Top-down openCypher query optimizer based on the cascades optimization framework
Efficiently supports both graph and relational workloads
Efficient learning based on Intermediate Data Caching
Task-separated multi-process learning scheduler
Fast, low-overhead GDLL sampler that maximizes GPU utilization
Effective feature caching for distributed environment
Learning engine tightly connected to GDBMS
Supports graph analysis of data in the GDBMS
Supports various data migration methods between RDBMS and GDBMS, online or offline
Accelerated data migration based on multi-threading
Supports automatic transformation of relational data model to graph data model
We provide scripts and Docker files to help developers set up and install S62.
We provide a Docker installation script so that you do not need to manually install the system dependencies required to build S62. Use the following commands:
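A typical Docker-based setup might look like the sketch below; the image name and Dockerfile path are assumptions for illustration only, since the actual scripts are shipped in the docker/ directory:

```shell
# Hypothetical image name and Dockerfile path -- check the docker/ directory
# for the scripts actually shipped with S62.
docker build -t s62-dev -f docker/Dockerfile .
# Start a development container with the source tree mounted.
docker run -it --rm -v "$(pwd)":/workspace s62-dev /bin/bash
```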
The project consists of several modules built on various platforms. We mainly use CMake as the project builder and Ninja for fast incremental compilation.
You can see the system components and their dependencies using the following command.
To build a specific target:
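Assuming a standard CMake + Ninja layout (the target name below is a placeholder), the usual command sequence is:

```shell
# Configure an out-of-source build with the Ninja generator.
cmake -S . -B build -G Ninja
# Emit a Graphviz file of components and their dependencies.
cmake -S . -B build --graphviz=build/deps.dot
# List all Ninja targets, then build one of them.
ninja -C build -t targets
ninja -C build <target-name>
```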
The list below describes the directories in the project root:
github/ : GitHub automation
bin/ : binary files and shell scripts for executing the project
build/ : build configuration files
conf/ : system configuration file templates
data/ : data files for running toy examples
docker/ : docker related files
docs/ : design documents and user documents
examples/ : getting started examples
k8s/ : kubernetes files
licenses/ : license files
tools/ : useful tools
test/ : test cases for system tests
tbgpp-*/ : source code and test cases for each module
src : source code (.cxx) for our native implementations (native namespace)
include : headers (.hxx) for our native implementations (native namespace)
libABC : heavily modified external libraries we depend on
third_party : directory for barely modified external libraries we depend on
MiT (Migration Toolkit) is a tool that automatically transforms a relational DB schema into a graph DB model and then, based on the transformed model, converts the relational data into vertices and edges suitable for a graph DBMS.
MiT was developed based on the CUBRID Migration Toolkit version 11.0.0.0002 (https://github.com/cubrid/cubrid-migration).
Connection to and testing of the graph DBMS
Automatic model transformation and modification based on metadata
Various data migration methods based on the transformed model (online, Cypher file, CSV file)
Migration report generation for data migration
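As an illustration of the model transformation (the table, column, and label names below are hypothetical, not taken from MiT's actual output), a row of an `employee` table with a foreign key to `department` could be migrated into openCypher statements like these:

```cypher
// Each table row becomes a vertex; the foreign-key column becomes an edge.
CREATE (:Employee {id: 1, name: 'Alice'});
CREATE (:Department {id: 10, name: 'R&D'});
MATCH (e:Employee {id: 1}), (d:Department {id: 10})
CREATE (e)-[:WORKS_IN]->(d);
```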
MiT can be built in a Linux environment by downloading the source code and using the build script.
For instructions on how to use MiT, refer to the user manual (https://hwany7seo.github.io/mit_manual/start.html).
The build of MiT is currently supported only in a Linux environment. CentOS 7 is recommended.
Since the Java and Eclipse versions required for the build are included in com.cubrid.cubridmigration.build, no additional environment setup is needed. The required versions are as follows:
- Java 1.7
- Eclipse IDE Helios R
CUBRID CMT
(https://github.com/CUBRID/cubrid-migration.git)
User Manual
(https://hwany7seo.github.io/mit_manual)
MiT is subject to the same license as CUBRID CMT.
Apache License 2.0
http://jira.iitp.cubrid.org/secure/Dashboard.jspa
For bugs, improvements, or questions, you can receive support by leaving details on the Jira above.
Apache License 2.0
LICENSE/ : A directory containing txt files of licenses for libraries, frameworks, etc., used in MiT
MiT_Manual/ : A directory where the manual on how to use MiT is written in rst files
MiT_docs/ : A directory containing documents related to the design of MiT
com.cubrid.common.configuration/ : Manages the execution, termination, class loading, etc.
com.cubrid.common.update.feature/ : Information about libraries and plugins to be updated
com.cubrid.common.update/ : Manages updates and update checks for MiT
com.cubrid.cubridmigration.app.feature/ : Configures the app's features
com.cubrid.cubridmigration.app.update.site/ : Stores URLs for fetching web information displayed upon MiT execution
com.cubrid.cubridmigration.app/ : The first application screen displayed when the program is run
com.cubrid.cubridmigration.build/ : Information and shell scripts for building the program
com.cubrid.cubridmigration.command/ : Project responsible for script migration
com.cubrid.cubridmigration.core.testfragment/ : Test codes for the core project
com.cubrid.cubridmigration.core/ : Handles the core features of MiT, such as migration and page navigation
com.cubrid.cubridmigration.plugin.feature/ : Configures the features of the plugin project
com.cubrid.cubridmigration.plugin.update.site/ : Saves URLs to connect to during plugin updates
com.cubrid.cubridmigration.plugin/ : Project for setting up the MiT plugins
com.cubrid.cubridmigration.ui.testfragment/ : Test codes for the UI project
com.cubrid.cubridmigration.ui/ : Project responsible for the UI of MiT
ViT (Visual Tool) is a program that lets you query a graph DBMS in an interactive environment and visually render and analyze the results.
ViT was developed based on DBeaver v21.2.2 (https://github.com/dbeaver/dbeaver), an open-source tool that supports various DBMSs, together with additional open-source libraries such as Gephi to visualize graph DBMS query results.
Connection part: Graph DBMS connection and connection testing
Navigation part: Displays vertex and edge information of connected graph DB
Query window: Compose and run queries against the connected graph DB
Visualization window
Visualize query results as a graph (displaying vertices, edges, and labels)
Graph editing (moving, highlighting, changing properties, etc.)
Mini map
Change layout
Analysis functions (shortest path, etc.)
Development-related documents, etc. are managed in the ViT_docs folder.
ViT can be built through a build script after downloading the source code in a Linux environment.
For information on how to use ViT, please refer to the user manual (https://hwany7seo.github.io/vit_manual/start.html).
The build of ViT is currently supported only in a Linux environment. The programs required for the build are as follows:
- JDK 11
- Apache Maven 3.8.6+
- Git
- Internet Connection
Dbeaver (https://github.com/dbeaver/dbeaver.git)
User Manual (https://hwany7seo.github.io/vit_manual)
Apache License 2.0
http://jira.iitp.cubrid.org/secure/Dashboard.jspa
For bugs, improvements, or questions, you can receive support by leaving details on the Jira above.
ViT_Manual/: ViT manual in rst format
ViT_docs/ : ViT design documents
bundles/ : basic plugins
docs/ : Original DBeaver documentation
features/: Used to structure the program’s plugins and dependencies
gephi-toolkit/ : Visualization library used to display graphs
plugins/: Original source, see DBeaver wiki for details (https://github.com/dbeaver/dbeaver/wiki/Develop-in-Eclipse)
product/: final program settings
test/ : Original DBeaver test code
Most graph datasets carry a large number of node features. Graph neural network models are typically trained on GPUs for faster computation and require these input features. The features are stored in CPU memory and must be loaded onto the GPU during model computation, which is time-consuming. Feature Caching on GPU caches the required features on the GPU and uses them during model training instead of loading them from the CPU each time, minimizing training time.
1. Input the target graph dataset with features in the mentioned file and run the python program for GNN model training.
2. After running the program, it will cache the features on GPU and expedite the GNN training.
1. Install a Python interpreter.
2. Install prerequisite libraries for compiling the program.
3. Input the target graph dataset into the desired folder.
4. Run the program from the command line.
5. Get the expedited GNN training result.
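The caching scheme described above can be sketched as follows. This is a minimal PyTorch illustration with hypothetical class and variable names (the real implementation may differ): features of "hot" nodes are copied to the GPU once, so mini-batch lookups only cross the CPU-GPU bus for cache misses.

```python
import torch

class FeatureCache:
    """Keep features of frequently accessed ("hot") nodes on the GPU.

    The full feature matrix stays in CPU memory; hot rows are copied to
    the GPU once, so per-batch lookups only touch the CPU for misses.
    """

    def __init__(self, cpu_features, hot_ids, device):
        self.cpu_features = cpu_features
        self.device = device
        num_nodes = cpu_features.shape[0]
        # slot[i] = row of node i in the GPU cache, or -1 if not cached
        self.slot = torch.full((num_nodes,), -1, dtype=torch.long, device=device)
        self.slot[hot_ids.to(device)] = torch.arange(len(hot_ids), device=device)
        self.gpu_features = cpu_features[hot_ids].to(device)

    def fetch(self, node_ids):
        """Gather a mini-batch of features, mixing cached and CPU rows."""
        node_ids = node_ids.to(self.device)
        out = torch.empty((len(node_ids), self.cpu_features.shape[1]),
                          dtype=self.cpu_features.dtype, device=self.device)
        slots = self.slot[node_ids]
        hit = slots >= 0
        out[hit] = self.gpu_features[slots[hit]]
        out[~hit] = self.cpu_features[node_ids[~hit].cpu()].to(self.device)
        return out

# Falls back to CPU when no GPU is present, so the sketch runs anywhere.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
feats = torch.randn(1000, 64)                    # toy feature matrix on CPU
hot = torch.arange(100)                          # pretend nodes 0..99 are hot
cache = FeatureCache(feats, hot, device)
batch = cache.fetch(torch.tensor([5, 500, 42]))  # 2 hits, 1 miss
```

In practice the hot set is usually chosen by node degree or by profiled access frequency, since high-degree nodes appear in many sampled mini-batches.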
The GPU Accelerator speeds up GNN training by using GPUs both to generate layered dependent samples and to train the GNN.
It consists of two modules: a GPU sampler and a pipeline for GNN training.
Layered Dependent Importance Sampler
Hybrid CPU-GPU training Pipeline
Multi-processing feature for CPU and GPU training
Multi-threaded gradient calculation, gradient accumulation, and model updates
GPU_Accelerator: This is the main script that provides CPU-GPU training Pipeline for GNN.
Sample_Generator: Generates samples from graph for input nodes.
Layered_dependent_samples: Generates layered dependent samples for given depth
Sample_Consumer: Consumes samples by training GNN over GPUs.
Gradient_Accumulation: Accumulates gradients over the world size (number of GPUs).
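A minimal sketch of the gradient-accumulation idea, simulated sequentially in one process (in the actual module the shards live on different GPUs and the averaged gradients are synchronized across processes; the function and variable names here are illustrative):

```python
import torch

def accumulated_step(model, optimizer, shards, world_size):
    """One optimizer step using gradients averaged over `world_size` shards.

    Each shard stands in for the mini-batch one GPU would process;
    scaling each loss by 1/world_size makes backward() accumulate the
    average gradient, matching an all-reduce mean across GPUs.
    """
    optimizer.zero_grad()
    for x, y in shards:
        loss = torch.nn.functional.mse_loss(model(x), y)
        (loss / world_size).backward()  # gradients sum across calls
    optimizer.step()

# Toy usage: two shards simulate world_size = 2 GPUs.
torch.manual_seed(0)
model = torch.nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
shards = [(torch.randn(8, 3), torch.randn(8, 1)) for _ in range(2)]
accumulated_step(model, optimizer, shards, world_size=2)
```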
GPU training pipeline:
1. Provide a graph name, number of epochs, number of GPUs, and a batch size
2. Run using the command “python GPU_Accelerator.py”
Layered Dependent Samples:
1. Provide a graph name, fanout, and the number of GNN layers
2. Run using the command “python layered_dependent_sampler.py”
This project is supported by an IITP grant funded by the Korea government (MSIT) (No. 2021-0-00859, Graph DBMS for intelligent processing of big graphs).