/*
    Beatmup image and signal processing library
    Copyright (C) 2020, lnstadrum

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program. If not, see <http://www.gnu.org/licenses/>.
*/

#pragma once

#include "operation.h"
#include "storage.h"
#include "../bitmap/internal_bitmap.h"
#include "../gpu/program_bank.h"
#include "../gpu/linear_mapping.h"
#include "../context.h"
#include "../utils/progress_tracking.h"
#include "../utils/profiler.h"
#include "../utils/chunkfile.h"
#include "../utils/listing.h"
#include <vector>
#include <initializer_list>

namespace Beatmup {
    /**
        \page NNetsModuleOverview NNets module overview
        %Beatmup provides a way to run inference of user-defined neural networks on GPU using OpenGL.

        The neural network (a NNets::Model instance) can be built in one of two ways:
        - layer-by-layer in the user code, by adding instances of NNets::AbstractOperation,
        - by loading a model with NNets::DeserializedModel from a YAML-like text description (see \subpage NNetsModelSerialization).

        The model data (e.g., convolution filter values) is stored in a ChunkCollection as plain single precision floating point arrays,
        indexed by the operation names.
        The model instance and the input/output containers are supplied to an NNets::InferenceTask, which can be run in a thread pool of a
        Context just like any other AbstractTask.
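
        A minimal sketch of this workflow is given below. The operation instances and the file name are hypothetical placeholders, and the
        InferenceTask calls are for illustration only; refer to the InferenceTask documentation for the exact interface.
        \code
        Context context;

        // operations built beforehand (e.g., NNets::Conv2D, NNets::Dense instances), given in the execution order
        NNets::Model model(context, { &conv1, &conv2, &dense });

        // trained model data: a collection of chunks indexed by the operation names
        ChunkFile modelData("model.chunks");

        // bind the model and its data to an inference task, feed an input image and run it as a regular task
        NNets::InferenceTask inference(model, modelData);
        inference.connect(inputImage, model.getFirstOperation());
        context.performTask(inference);
        \endcode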

        Under the hood, the network is converted into a set of OpenGL ES 2.0-compliant GLSL shaders. The data is stored in textures in GPU
        memory. %Beatmup takes care of building and executing the shader programs.

        In this way %Beatmup enables hardware-accelerated inference on any decent GPU, keeping the CPU available for other tasks. It makes
        it easy to deploy the same model on various hardware, including inexpensive single-board computers, %Android GPUs, and integrated
        and discrete desktop GPUs from any vendor.

        However, the NNets module is still quite young and comes with a set of limitations.
        - The set of implemented features is limited. So far it is oriented exclusively to image classification and feature extraction. See
          NNets::AbstractOperation subclasses for the list of implemented neural network operations.
        - Not every model can be transformed into a %Beatmup-compliant model. Most likely, a model needs to be designed and trained from
          scratch to be deployed with %Beatmup. See the NNets::Conv2D, NNets::Pooling2D and NNets::Dense operation descriptions for their
          corresponding constraints.
        - OpenGL may introduce a significant overhead. The inference throughput achievable with %Beatmup on powerful desktop GPUs is likely
          limited compared to what can be achieved with vendor-specific proprietary technologies widely used for training and inference.
        - There are constraints related to the OpenGL ES 2.0 backend.
          - The activations of almost all operations are stored as 8-bit integers. This may require the training to be aware of the
            activation quantization, otherwise with increasing depth the quantization error may cause performance degradation.
            However, the weights of the network are usually not quantized:
            - Conv2D filters and biases are stored in a floating point format. Quantization may apply if a given GPU does not support
              single precision floating point computations.
            - Dense layer matrices and bias vectors are stored in a floating point format if the GPU is OpenGL ES 3.1-compliant. Otherwise,
              a 16-bit fixed point representation is used.
          - The 8-bit sampled activations cover the [0, 1] range. This strongly limits the activation functions that can be used in the
            model.
          - OpenGL may be inefficient when sampling many feature channels at the same time, or may have a hardware or driver-defined hard
            limit on the number of samples per output value (the latter is the case for Raspberry Pi). This constrains the width of the
            network. To overcome this, group convolutions and \ref NNetsShufflingExplained "channel shuffling" are suggested. The latter
            allows channels to be shuffled between layers literally for free, which helps to increase the connectivity across the width of
            the network, for group convolutions in particular.
        - The batch size is fundamentally and unconditionally equal to 1, i.e., the inference is run for one given input image at a time.
    */

    /**
        Neural nets inference on GPU using OpenGL.
    */
    namespace NNets {

        /**
            Neural net model.
            Contains a list of operations and programmatically defined interconnections between them using addConnection().
            Enables access to the model memory at any point in the model through addOutput() and getOutputData().
            The memory needed to store internal data during the inference is allocated automatically; storages are reused when possible.
            The inference of a Model is performed by InferenceTask.
        */
        class Model : public GL::ProgramBank {
        private:
            /**
                Connection descriptor.
                For a given source operation, describes a connection with another operation.
            */
            typedef struct {
                AbstractOperation* dest;    //!< destination operation
                int output;                 //!< output index
                int input;                  //!< input index
                int shuffle;                //!< shuffling step (details \ref NNetsShufflingExplained "here")
            } Connection;

            /**
                A user-defined output descriptor.
            */
            typedef struct {
                int index;                  //!< operation output index to fetch data from
                std::vector<float> data;    //!< container to store the data
            } UserOutput;

            std::multimap<const AbstractOperation*, Connection> connections;   //!< source operation => connection descriptor mapping
            std::multimap<const AbstractOperation*, UserOutput> userOutputs;   //!< operation => user output mapping

            std::vector<Storage*> storages;         //!< allocated storages used during the inference
            std::vector<GL::Vector*> vectors;       //!< allocated vectors used during the inference
            std::vector<InternalBitmap*> textures;  //!< allocated images used during the inference
            Profiler* profiler;                     //!< pointer to a Profiler attached to the model

        protected:
            std::vector<AbstractOperation*> ops;    //!< model operations
            ProgressTracking preparingProgress;     //!< model preparation progress
            ProgressTracking inferenceProgress;     //!< inference progress
            bool ready;                             //!< if `true`, ops are connected to each other and storages are allocated

            /**
                Frees all allocated storages.
            */
            void freeMemory();

            /**
                Allocates a new storage. Its views may be used as operation inputs and outputs.
                The storage is destroyed together with the model.
                \param[in,out] gpu              A graphic pipeline instance
                \param[in] size                 The storage size (width, height, number of channels)
                \param[in] forGpu               Allocate for the use on GPU
                \param[in] forCpu               Allocate for the use on CPU
                \param[in] pad                  Storage padding: number of pixels added on both sides along width and height of every channel
                \param[in] reservedChannels     Number of additional channels that may be sampled together with the storage.
                                                This does not change the storage size, but impacts the way the channels are packed into the
                                                textures. It allows the storage to be sampled together with other storages of a specific
                                                total depth in the same shader, if reservedChannels is greater than or equal to that total
                                                depth.
                \return newly allocated storage.
            */
            Storage& allocateStorage(GraphicPipeline& gpu, const Size size, bool forGpu = true, bool forCpu = false, const int pad = 0, const int reservedChannels = 0);

            /**
                Allocates a new flat storage. Its views can be used as operation inputs and outputs.
                Flat storages can be inputs of Dense layers.
                The storage is destroyed together with the model.
                \param[in,out] gpu      A graphic pipeline instance
                \param[in] size         Number of samples in the storage
                \return newly allocated storage.
            */
            Storage& allocateFlatStorage(GraphicPipeline& gpu, const int size);

            /**
                Allocates a vector that can be used as operation input or output.
                Unlike flat storages, vectors store floating point data (GL ES 3.1 and higher) or 16-bit signed fixed point values with an
                8-bit fractional part (GL ES 2.0).
                \param[in,out] gpu      A graphic pipeline instance
                \param[in] size         Number of samples in the vector
            */
            GL::Vector& allocateVector(GraphicPipeline& gpu, const int size);

            /**
                Allocates a texture that can be used as operation input or output.
                \param[in,out] gpu      A graphic pipeline instance
                \param[in] size         Image size. The depth can be 1, 3 or 4 channels.
            */
            InternalBitmap& allocateTexture(GraphicPipeline& gpu, const Size size);

            /**
                Checks whether an operation goes before another operation in the model according to the execution order of the ops.
                \param[in] first        The first operation (expected to be executed earlier)
                \param[in] second       The second operation (expected to be executed later)
                \return `true` if both operations are in the model and the first one is executed before the second one, `false` otherwise.
            */
            bool isPreceding(const AbstractOperation& first, const AbstractOperation& second) const;

            AbstractOperation* operator[](const std::string& operationName);
            const AbstractOperation* operator[](const std::string& operationName) const;

            void addConnection(AbstractOperation& source, AbstractOperation& dest, int output = 0, int input = 0, int shuffle = 0);

        public:
            /**
                Instantiates a model from a list of operations, interconnecting them in a feedforward fashion.
                The first output of every operation is connected to the first input of its successor.
                Optional connections may be added after model creation.
                \param[in,out] context      A context instance
                \param[in] ops              Operations given in the execution order. The Model does not take ownership of them.
            */
            Model(Context& context, std::initializer_list<AbstractOperation*> ops);

            /**
                Instantiates an empty model.
                \param[in,out] context      A context instance used to store internal resources needed for inference
            */
            Model(Context& context);

            ~Model();

            /**
                Adds a new operation to the model.
                The operation is added to the end of the operations list. The execution order corresponds to the addition order.
                The Model does not take ownership of the passed pointer.
                \param[in] newOp        The new operation
                \param[in] connect      If `true`, the main input (#0) of the new operation is connected to the main output (#0) of the last operation
            */
            void append(AbstractOperation* newOp, bool connect = false);

            /**
                Adds new operations to the model.
                The operations are added to the end of the operations list. The execution order corresponds to the addition order.
                The Model does not take ownership of the passed pointers.
                \param[in] newOps       The new operations
                \param[in] connect      If `true`, the main input (#0) of every operation is connected to the main output (#0)
                                        of the preceding operation
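
                For example (with hypothetical operation instances built elsewhere):
                \code
                model.append({ &conv1, &conv2, &dense }, true);     // chains conv1 -> conv2 -> dense in a feedforward fashion
                \endcode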
            */
            void append(std::initializer_list<AbstractOperation*> newOps, bool connect = false);

            /**
                Adds a new operation to the model before another operation in the execution order.
                The Model does not take ownership of the passed pointer. The new operation is not automatically connected to other operations.
                \param[in] opName       Name of the operation the new operation is inserted before
                \param[in] newOp        The new operation
            */
            void addOperation(const std::string& opName, AbstractOperation* newOp);
            void addOperation(const AbstractOperation& operation, AbstractOperation* newOp);

            /**
                Adds a connection between two given ops.
                \param[in] sourceOpName     Name of the operation emitting the data
                \param[in] destOpName       Name of the operation receiving the data
                \param[in] output           Output number of the source operation
                \param[in] input            Input number of the destination operation
                \param[in] shuffle          If greater than zero, the storage is shuffled.
                                            For shuffle = `n`, the output channels are sent to the destination operation in the following order:
                                            0, 1, 2, 3, 4n, 4n+1, 4n+2, 4n+3, 8n, 8n+1, 8n+2, 8n+3, ..., 4, 5, 6, 7, 4n+4, 4n+5, 4n+6, 4n+7, 8n+4, ...
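                                            For example, for shuffle = 2 and 16 channels the destination operation receives the channels in
                                            the order 0, 1, 2, 3, 8, 9, 10, 11, 4, 5, 6, 7, 12, 13, 14, 15: the channels are traversed in
                                            groups of four with a step of `n` groups, wrapping around.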
                                            \anchor NNetsShufflingExplained
            */
            void addConnection(const std::string& sourceOpName, const std::string& destOpName, int output = 0, int input = 0, int shuffle = 0);

            /**
                Enables reading output data from the model memory through getOutputData().
                A given operation output is connected to a storage that might be accessed by the application after the run.
                \param[in] operation    Name of the operation or the operation itself to get data from
                \param[in] output       The operation output index
            */
            void addOutput(const std::string& operation, int output = 0);
            void addOutput(const AbstractOperation& operation, int output = 0);

            /**
                Reads data from the model memory.
                addOutput() needs to be called first in order to enable reading the data. Otherwise null is returned.
                \param[out] numSamples  Returns the number of samples in the pointed data buffer
                \param[in] operation    Name of the operation or the operation itself to get data from
                \param[in] output       The operation output index
                \return pointer to the data stored as a 3D array of (height, width, channels) layout, or null.
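
                For example, to fetch the output of a hypothetical operation named "prob":
                \code
                model.addOutput("prob");        // to be called before running the inference
                // ... run the inference ...
                size_t numSamples;
                const float* data = model.getOutputData(numSamples, "prob");
                \endcode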
            */
            const float* getOutputData(size_t& numSamples, const std::string& operation, int output = 0) const;
            const float* getOutputData(size_t& numSamples, const AbstractOperation& operation, int output = 0) const;

            /**
                Prepares all operations: reads the model data from chunks and builds the GPU programs.
                The inputs of the model need to be provided beforehand.
                The preparation progress is tracked by a ProgressTracking instance (see getPreparingProgress()).
                \param[in,out] gpu      A graphic pipeline instance
                \param[in] data         ChunkCollection containing the model data
            */
            virtual void prepare(GraphicPipeline& gpu, ChunkCollection& data);

            /**
                \return `true` if the model is ready to be used for inference (prepare() has been called).
            */
            inline bool isReady() const { return ready; }

            /**
                Runs the inference.
                \param[in,out] thread   Task thread instance
                \param[in,out] gpu      A graphic pipeline
            */
            void execute(TaskThread& thread, GraphicPipeline* gpu);

            /**
                Checks if a specific operation is part of the model.
                \return `true` if the operation is in the model.
            */
            bool isOperationInModel(const AbstractOperation& operation) const;

            inline AbstractOperation& getFirstOperation() { return *ops.front(); }
            inline AbstractOperation& getLastOperation () { return *ops.back(); }
            inline const AbstractOperation& getFirstOperation() const { return *ops.front(); }
            inline const AbstractOperation& getLastOperation () const { return *ops.back(); }
            inline size_t getNumberOfOperations() const { return ops.size(); }

            /**
                Retrieves an operation by its name.
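                For example, assuming the model contains a Conv2D operation named "conv1" (the name is hypothetical):
                \code
                auto& conv = model.getOperation<Conv2D>("conv1");
                \endcode
                The operation class is not checked at runtime (a plain static_cast is used): the caller is responsible for requesting the
                right type.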
            */
            template<class OperationClass = AbstractOperation>
            inline OperationClass& getOperation(const std::string& operationName) {
                return *static_cast<OperationClass*>((*this)[operationName]);
            }

            /**
                Returns the model preparation progress tracking.
            */
            inline const ProgressTracking& getPreparingProgress() const { return preparingProgress; }

            /**
                Returns the inference progress tracking.
            */
            inline const ProgressTracking& getInferenceProgress() const { return inferenceProgress; }

            /**
                Provides an estimation of the number of multiply-adds characterizing the model complexity.
                Queries the number of multiply-adds of every operation of the model and sums them up.
            */
            unsigned long countMultiplyAdds() const;

            /**
                Provides an estimation of the total number of texels fetched by all the operations in the model per image.
            */
            unsigned long countTexelFetches() const;

            /**
                Returns the amount of texture memory in bytes currently allocated by the model to run the inference.
                When the model is ready to run, this represents the size of the memory needed to store internal data during the inference.
                The resulting value does not include the size of the GLSL shader binaries stored in GPU memory, which can be significant.
            */
            size_t getMemorySize() const;

            /**
                Returns a serialized representation of the model as a Listing.
            */
            Listing serialize() const;

            /**
                Returns a serialized representation of the model as a string.
            */
            std::string serializeToString() const;

            /**
                Attaches a profiler instance to meter the execution time per operation during the inference.
                This may slow down the inference.
                \param[in] profiler     A profiler instance or null pointer (to disable the profiling)
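
                A minimal usage sketch (assuming the inference is run elsewhere, e.g. through InferenceTask):
                \code
                Profiler profiler;
                model.setProfiler(&profiler);   // meter per-operation timings during the next runs
                // ... run the inference ...
                model.setProfiler(nullptr);     // detach the profiler when done
                \endcode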
            */
            inline void setProfiler(Profiler* profiler) { this->profiler = profiler; }
        };


        /**
            Wrapper for exceptions occurring during the model inference
        */
        class InferenceTimeError : public Exception {
        public:
            InferenceTimeError(const AbstractOperation& op, const std::exception& ex);
        };
    }
}