The first module is the deep learning model graph generator, which offers two expression languages: a $sequentialGraph$ API designed to automate the hyperparameter search, and a $customGraph$ API that supports TensorFlow expression code for sophisticated neural networks. The second module is the data manager, composed of three classes designed for splitting, batching and multi-task processing of any dataset across diverse computational architectures. The third is the platform execution module, created to automate the different parallel methods for training neural networks on those platforms. The fourth module extends the enerGyPU monitor for workload characterization; it consists of a runtime data capture that collects the convergence tracking logs and the computing factor metrics, and a dashboard for analyzing the experimental results.
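To make the data manager's splitting and batching responsibilities concrete, here is a minimal sketch in plain Python. It is not the DiagnoseNET API: the function names `split_dataset` and `iter_batches`, the split fractions and the fixed seed are all assumptions chosen for illustration.

```python
import random


def split_dataset(records, train_frac=0.8, valid_frac=0.1, seed=0):
    """Shuffle the records and split them into train, validation and test lists.

    Hypothetical helper: the fractions and the seed are illustrative defaults.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains
    return train, valid, test


def iter_batches(records, batch_size):
    """Yield consecutive mini-batches; the last one may be smaller."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]
```

For example, splitting 100 records with the default fractions gives 80 training records, and `iter_batches(train, 32)` then yields batches of 32, 32 and 16 records.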
DiagnoseNET is organized into independent and interchangeable modules to exploit the computational resources on two levels of parallel and distributed processing. The first level manages and synchronizes the data parallelism for mini-batch learning between the workers, while the second level adjusts the task granularity (model dimension and batch partition) according to the characteristics of the computational platform (memory capacity, number of CPUs and GPUs, GPU micro-architecture, clock frequency, among others), as shown in the next schema.
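The first level, synchronous data parallelism over a mini-batch, can be illustrated with a small sketch. It assumes a toy one-parameter model `y = weight * x` with a squared-error loss, and simulates the workers sequentially in one process. The helper names (`shard_batch`, `local_gradient`, `synchronous_step`) are hypothetical and do not come from DiagnoseNET.

```python
def shard_batch(batch, num_workers):
    """Split one mini-batch into near-equal contiguous shards, one per worker."""
    base, extra = divmod(len(batch), num_workers)
    shards, start = [], 0
    for worker in range(num_workers):
        size = base + (1 if worker < extra else 0)
        shards.append(batch[start:start + size])
        start += size
    return shards


def local_gradient(weight, shard):
    """Sum of squared-error gradients for y = weight * x over one shard."""
    return sum(2.0 * (weight * x - y) * x for x, y in shard)


def synchronous_step(weight, batch, num_workers, learning_rate):
    """One synchronized update: every worker contributes before the weight changes."""
    shards = shard_batch(batch, num_workers)
    # On a real platform each shard would be processed by a separate worker.
    grads = [local_gradient(weight, shard) for shard in shards]
    mean_grad = sum(grads) / len(batch)  # average over the whole mini-batch
    return weight - learning_rate * mean_grad
```

Because the per-shard gradients are summed and divided by the full batch size, the update matches the single-worker update on the same mini-batch up to floating-point rounding. The batch partition only changes how the work is distributed, which is the knob the second level tunes to fit each platform.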
The cross-platform library contains a task-based programming interface module for building the DNN model graphs, in which developers design and parameterize a pre-built neural network family such as fully-connected or stacked encoder-decoder, among others. The second module, called platform execution modes, selects the computational platform for training the DNN model graph and hides the complexity posed by the heterogeneity of the computing platforms. Another module integrates a data and resource manager for training the DNN model graph on CPU-GPU desktop machines, on multi-GPU nodes or on the embedded computation cluster of Jetson TX2. The last module integrates an energy-monitoring tool called enerGyPU for workload characterization, which collects the energy consumption metrics while the DNN model executes on the target platform.