Beatmup
Beatmup::NNets::Conv2D Class Reference

2D convolution operation computed on GPU. More...

#include <conv2d.h>

Inheritance diagram for Beatmup::NNets::Conv2D:
Beatmup::NNets::AbstractOperation Beatmup::NNets::SpatialFilteringMixin Beatmup::NNets::ActivationFunctionMixin

Public Member Functions

 Conv2D (const std::string &name, const int kernelSize, const int numInputChannels, const int numOutputChannels, const int stride=1, const Size::Padding padding=Size::Padding::VALID, const bool useBias=true, const int numGroups=1, const ActivationFunction activation=ActivationFunction::DEFAULT)
 Instantiates a 2D convolution operation. More...
 
bool isBiasUsed () const
 
int getInputCount () const
 Returns number of operation inputs. More...
 
int getOutputCount () const
 Returns number of operation outputs. More...
 
bool acceptsStorageInput (int index=0) const
 Returns true if the operation can take a Storage::View at a specific input. More...
 
bool acceptsStorageOutput (int index=0) const
 Returns true if the operation can take a Storage::View at a specific output. More...
 
bool acceptsTextureInput (int index=0) const
 Returns true if the operation can take a GL::TextureHandler at a specific input. More...
 
Size getOutputSize (int outputIndex=0) const
 Returns full size of a specific operation output. More...
 
Storage::View getOutput (int index=0)
 Returns a storage view bound to a specific operation output. More...
 
void setInput (Storage::View &&storage, int inputIndex=0)
 
void setInput (GL::TextureHandler &image, int inputIndex=0)
 
void setOutput (Storage::View &&storage, int outputIndex=0)
 
std::map< std::string, std::string > serialize () const
 Returns a serialized representation of the operation. More...
 
void disconnect ()
 Assigns empty inputs and outputs. More...
 
void setResidualInput (Storage::View &&storage)
 Connects a tensor to a residual input. More...
 
unsigned long countMultiplyAdds () const
 Counts (approximate) number of multiply-adds used by this operation. More...
 
unsigned long countTexelFetches () const
 Counts the (approximate) number of texel fetches. More...
 
- Public Member Functions inherited from Beatmup::NNets::AbstractOperation
virtual ~AbstractOperation ()
 
virtual bool usesGpu () const
 Returns true if the operation is run on GPU. More...
 
virtual bool acceptsVectorInput (int index=0) const
 Returns true if the operation can take a GL::Vector at a specific input. More...
 
virtual bool acceptsVectorOutput (int index=0) const
 Returns true if the operation can take a GL::Vector at a specific output. More...
 
virtual bool acceptsTextureOutput (int index=0) const
 Returns true if the operation can take a GL::TextureHandler at a specific output. More...
 
virtual void getOutput (GL::Vector *&vector, int index=0)
 Returns a GL::Vector bound to a specific operation output. More...
 
virtual void getOutput (GL::TextureHandler *&vector, int index=0)
 Returns a GL::TextureHandler bound to a specific operation output. More...
 
virtual void setInput (GL::Vector &vector, int index=0)
 
virtual void setOutput (GL::Vector &vector, int index=0)
 
virtual void setOutput (GL::TextureHandler &image, int index=0)
 
std::string getName () const
 

Static Public Member Functions

static bool initDeserializer ()
 Sets up deserialization of the operation. More...
 

Static Public Attributes

static const char * FILTERS_CHUNK_SUFFIX = "/w"
 suffix added to the op name to get the filters chunk id in the model data More...
 
static const char * BIAS_CHUNK_SUFFIX = "/b"
 suffix added to the op name to get the bias chunk id in the model data More...
 

Private Member Functions

int getIdx (int output, int input, int x, int y) const
 Maps an (outputChannel, inputChannel, x, y) position to a linear coefficient index in the chunkfile. More...
 
void prepare (GraphicPipeline &gpu, ChunkCollection &data, GL::ProgramBank &bank)
 Compiles GLSL shaders. More...
 
void execute (TaskThread &thread, GraphicPipeline &gpu)
 Executes the operation. More...
 
int getInputPadding (int index=0) const
 Retrieves minimum required size of zero padding for a given input. More...
 
void getSampledChannels (int index, int &min, int &max) const
 Retrieves range of input features channels sampled at the same time for a specific input. More...
 

Private Attributes

const Size kernelSize
 
const int numOutputChannels
 number of output feature maps More...
 
const int numGroups
 number of convolution groups More...
 
const int stride
 
const Size::Padding padding
 
const bool useInputImage
 if true, input is the texture handler, not the view More...
 
const bool isDepthwise
 if true, the convolution is depthwise, otherwise regular More...
 
const bool useBias
 if true, the bias addition is enabled More...
 
bool ready
 
Storage::View input
 
Storage::View output
 
Storage::View residualInput
 optional tensor to be added to the output before activation More...
 
GL::TextureHandler * inputImage
 input texture handler to be used instead of the input view More...
 
std::vector< GL::RenderingProgram * > programs
 pointers to GLSL programs, one per group of four output channels More...
 
std::vector< std::array< float, 4 > > coeffs
 model data to pass to uniform variables, if used More...
 
std::vector< int > execOrder
 execution order of GLSL programs More...
 
std::vector< Storage::View > groupViews
 views per convolution group More...
 

Additional Inherited Members

- Protected Member Functions inherited from Beatmup::NNets::AbstractOperation
 AbstractOperation (const std::string &name)
 
virtual void execute (TaskThread &thread)
 Executes the operation within a specific CPU thread. More...
 
- Protected Member Functions inherited from Beatmup::NNets::SpatialFilteringMixin
 SpatialFilteringMixin (const int nbSizeX, const int nbSizeY)
 Initializes spatial filtering mixin. More...
 
 ~SpatialFilteringMixin ()
 
void writeHeader (StringBuilder &code, bool useUniformShift)
 Writes out the GLSL fragment shader header required for spatial neighborhood sampling. More...
 
void declare (StringBuilder &code, const char *datatype, bool inlineSampling=false)
 Declares GLSL fragment shader main(..) code part required for spatial neighborhood sampling. More...
 
void sample (StringBuilder &code, const char *inputName, const int inputIndex, const Point &shift, const bool isFirstSample=true, const char *suffix="")
 Samples a neighborhood of a given texture. More...
 
void sampleInline (StringBuilder &code, const char *inputName, const int inputIndex, const IntPoint &position, const Point &shift, const char *suffix="")
 
void setup (const int width, const int height)
 Prepares the spatial filtering operation execution. More...
 
void setUniformShift (GL::Program &program, const IntPoint &shift, const IntPoint &inputSize)
 Applies an offset to the sampling position at runtime. More...
 
void setupProgram (GL::Program &program)
 Prepares a given program for spatial filtering. More...
 
IntRectangle getSamplingArea (const IntPoint &size, const IntPoint &stride, const Size::Padding padding) const
 Implements common padding policies by computing a rectangular area of positions the sampling kernel takes in order to get the result with the required padding. More...
 
IntRectangle getSamplingArea (const Storage::View &storage, const int channel, const IntPoint &stride, const Size::Padding padding) const
 Computes area in pixels to sample a given storage according to specific stride and padding. More...
 
Rectangle getTextureCoordinates (const Storage::View &storage, const int channel, const IntPoint &stride, const Size::Padding padding, const IntPoint &outputSize) const
 Computes texture coordinates sampling a specific storage channel for given stride, padding and output size. More...
 
std::string getInputSamplingPos () const
 Retrieves input sampling point position for the current fragment. More...
 
bool isUniformShiftUsed () const
 
- Protected Member Functions inherited from Beatmup::NNets::ActivationFunctionMixin
 ActivationFunctionMixin (const ActivationFunction activationFunc)
 
void apply (StringBuilder &code, const char *inputVariable)
 Renders a GLSL code applying activation function to a specific variable and writing the result to gl_FragColor shader output variable. More...
 
- Protected Attributes inherited from Beatmup::NNets::ActivationFunctionMixin
const ActivationFunction activationFunc
 
- Static Protected Attributes inherited from Beatmup::NNets::SpatialFilteringMixin
static const char * SAMPLE_ID_PREFIX = "i"
 prefix of variables declaring a neighbor sample More...
 

Detailed Description

2D convolution operation computed on GPU.

Has 2 inputs: main and residual (detailed below), and a single output. Constraints:

  • Input and output contain values in [0, 1] range sampled over 8 bits.
  • Number of input channels is 3 (i.e., the input is an RGB image) or a multiple of 4.
  • Number of output feature maps is a multiple of 4.
  • For group convolutions, each group contains a multiple of 4 input channels and a multiple of 4 output channels, or exactly 1 input and 1 output channel (i.e., depthwise).
  • Kernels are of square shape.
  • Strides are equal along X and Y.
  • Dilations are equal to 1.
  • If an image is given on input (3 input feature maps), only valid padding is supported.
  • An activation function is always applied on output.
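
The channel-related constraints above can be summarized as a small predicate. This is only a sketch under stated assumptions: `isSupportedConfig` is a hypothetical helper, not part of Beatmup, and it covers just the channel, group and padding rules from the list, not kernel shape, stride or dilation.

```cpp
#include <cassert>

// Hypothetical helper (not part of Beatmup): returns true if a channel
// configuration satisfies the channel/group/padding constraints listed above.
bool isSupportedConfig(int numInputChannels, int numOutputChannels,
                       int numGroups, bool validPadding) {
    if (numInputChannels < 1 || numOutputChannels < 1 || numGroups < 1)
        return false;
    const bool imageInput = (numInputChannels == 3);
    const bool depthwise = (numInputChannels == numGroups &&
                            numOutputChannels == numGroups);
    if (imageInput)
        // RGB image on input: no groups, valid padding only, 4n outputs
        return numGroups == 1 && validPadding && numOutputChannels % 4 == 0;
    if (depthwise)
        // exactly 1 input and 1 output channel per group; channel count is 4n
        return numOutputChannels % 4 == 0;
    // regular or grouped convolution: each group holds 4k inputs and 4n outputs
    return numInputChannels % (4 * numGroups) == 0 &&
           numOutputChannels % (4 * numGroups) == 0;
}
```

For instance, 32 input and 64 output channels split into 4 groups pass the check (8 inputs and 16 outputs per group), while 16 channels split into 8 groups do not (2 channels per group).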

Raspberry Pi-related constraints:

  • Pi cannot sample more than 256 channels to compute a single output value. The practical limit is lower still: about 128 channels for pointwise convolutions and fewer than 100 channels for bigger kernels. When the limit is reached, the Pi OpenGL driver reports an out-of-memory error (0x505).

Features:

  • Bias addition integrated.
  • An optional residual input is available: a tensor of output shape added to the convolution result before applying the activation function.

Convolution filters and bias are looked up in chunks. The chunk names consist of the operation name followed by Conv2D::FILTERS_CHUNK_SUFFIX and Conv2D::BIAS_CHUNK_SUFFIX respectively. The chunk contents are single-precision floating-point arrays. The filter coefficients are taken in "OIHW" layout, i.e., there are 'O*I' contiguous packets of 'H*W' values each. "O" and "I" are the output and input channel numbers, "H" and "W" are the filter height and width.
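
As an illustration of the chunk naming and of the sizes that prepare() checks, here is a minimal standalone sketch. The helper names are hypothetical and not part of Beatmup; the suffix values are the ones documented for the static attributes, and the kernel depth is assumed to be the number of input channels per group.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Suffix values as documented for Conv2D::FILTERS_CHUNK_SUFFIX and
// Conv2D::BIAS_CHUNK_SUFFIX.
const char* const FILTERS_SUFFIX = "/w";
const char* const BIAS_SUFFIX = "/b";

// Hypothetical helpers building the chunk ids from the operation name.
std::string filtersChunkId(const std::string& opName) { return opName + FILTERS_SUFFIX; }
std::string biasChunkId(const std::string& opName) { return opName + BIAS_SUFFIX; }

// Expected filters chunk size in bytes: kernel volume times output channels.
// Assumption: kernel depth = numInputChannels / numGroups.
std::size_t expectedFiltersBytes(int kernelSize, int numInputChannels,
                                 int numOutputChannels, int numGroups) {
    const std::size_t k = static_cast<std::size_t>(kernelSize);
    const std::size_t depth = static_cast<std::size_t>(numInputChannels / numGroups);
    return k * k * depth * static_cast<std::size_t>(numOutputChannels) * sizeof(float);
}

// Expected bias chunk size in bytes: one float per output channel.
std::size_t expectedBiasBytes(int numOutputChannels) {
    return static_cast<std::size_t>(numOutputChannels) * sizeof(float);
}
```

An operation named "conv1" would thus look for the chunks "conv1/w" and "conv1/b" in the model data.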

Definition at line 62 of file conv2d.h.

Constructor & Destructor Documentation

◆ Conv2D()

Conv2D::Conv2D ( const std::string &  name,
const int  kernelSize,
const int  numInputChannels,
const int  numOutputChannels,
const int  stride = 1,
const Size::Padding  padding = Size::Padding::VALID,
const bool  useBias = true,
const int  numGroups = 1,
const ActivationFunction  activation = ActivationFunction::DEFAULT 
)

Instantiates a 2D convolution operation.

Parameters
[in] name: Operation name
[in] kernelSize: Convolution kernel size
[in] numInputChannels: Number of input feature map channels (input depth)
[in] numOutputChannels: Number of output feature map channels (output depth)
[in] stride: Convolution stride
[in] padding: Padding policy
[in] useBias: If true, the bias addition is enabled. The bias vector is searched in the model data.
[in] numGroups: Number of convolution groups to get a group/depthwise convolution
[in] activation: Activation function applied to the operation output

Definition at line 41 of file conv2d.cpp.

51  :
55  useInputImage(numInputChannels == 3),
56  isDepthwise(numInputChannels == numGroups && numOutputChannels == numGroups),
58  ready(false),
59  inputImage(nullptr)
60 {
61  if (useInputImage) {
62  InvalidArgument::check(numGroups == 1, "Cannot apply a group convolution to the input image");
63  InvalidArgument::check(padding == Size::Padding::VALID, "Only valid zero padding setting is supported when an image is used as input");
64  }
65  else
66  Storage::checkChannelNumber(numInputChannels);
68  OutOfRange::checkMin(stride, 1, "Positive convolution stride expected, %d got");
69  OutOfRange::checkMin(kernelSize, 1, "Positive convolution kernel size expected, %d got");
70  if (!useInputImage && !isDepthwise)
71  InvalidArgument::check(this->kernelSize.getDepth() % 4 == 0, "A multiple of 4 is expected as number of input channels in the convolution kernel.");
72  if (!isDepthwise && numGroups > 1)
73  OutOfRange::checkMin(this->kernelSize.getDepth(), 4, "Kernels having less than 4 input channels are not supported in grouped convolutions. Got %d channels.");
74 
75  // check groups alignment: each group must contain 4k inputs and outputs channels
76  if (!isDepthwise) {
77  if (!useInputImage)
78  InvalidArgument::check(numInputChannels % (4 * numGroups) == 0,
79  "Cannot split " +std::to_string(numInputChannels)+ " input channels on " +std::to_string(numGroups)+ " groups of 4*k channels each.");
80  InvalidArgument::check(numOutputChannels % (4 * numGroups) == 0,
81  "Cannot split " +std::to_string(numOutputChannels)+ " output channels on " +std::to_string(numGroups)+ " groups of 4*k channels each.");
82  }
83  programs.reserve(numOutputChannels / 4);
84  groupViews.reserve(numGroups);
85 }

Member Function Documentation

◆ getIdx()

int Beatmup::NNets::Conv2D::getIdx ( int  output,
int  input,
int  x,
int  y 
) const
inline private

Maps an (outputChannel, inputChannel, x, y) position to a linear coefficient index in the chunkfile.

Definition at line 87 of file conv2d.h.

87  {
88  return output + numOutputChannels * (input + kernelSize[2] * (x + kernelSize[0] * y));
89  }
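
To make the index arithmetic concrete, the formula can be reproduced as a free function. This is a sketch, not the library code: `coeffIndex` is a hypothetical name, and it assumes that `kernelSize[0]` is the kernel width and `kernelSize[2]` the kernel depth (input channels per group). In the formula as listed, the output channel index varies fastest and consecutive input channels are `numOutputChannels` entries apart.

```cpp
#include <cassert>

// Standalone copy of the getIdx() formula with the class members passed
// explicitly. Hypothetical helper, not part of Beatmup.
int coeffIndex(int output, int input, int x, int y,
               int numOutputChannels, int kernelWidth, int kernelDepth) {
    return output + numOutputChannels * (input + kernelDepth * (x + kernelWidth * y));
}
```

With 8 output channels and a 3x3 kernel of depth 4, moving to the next input channel advances the index by 8, moving one step along X by 32, and one step along Y by 96. The input channel stride of `numOutputChannels` corresponds to the `offset` array used in prepare().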

◆ prepare()

void Conv2D::prepare ( GraphicPipeline &  gpu,
ChunkCollection &  data,
GL::ProgramBank &  bank 
)
private virtual

Compiles GLSL shaders.

Parameters
[in,out] gpu: A graphic pipeline instance
[in,out] data: Chunkfile containing operation data (e.g. weights and biases)
[in,out] bank: A program bank with existing GLSL programs to be reused when possible. If a new program is built, it is added to the bank.

Implements Beatmup::NNets::AbstractOperation.

Definition at line 88 of file conv2d.cpp.

88  {
89  RuntimeError::check((useInputImage && inputImage) || (!useInputImage && input), "Input is not provided to Conv2D operation " + getName());
90  RuntimeError::check(output, "Output is not provided to Conv2D operation " + getName());
91 
92  // get coefficients
93  const Chunk kernel(data, getName() + FILTERS_CHUNK_SUFFIX);
94  if (kernel.size() != kernelSize.volume() * numOutputChannels * sizeof(float))
95  throw InconsistentModelData(this, "Weights size mismatch");
96 
97  const Chunk* biases = nullptr;
98  if (useBias) {
99  biases = new Chunk(data, getName() + BIAS_CHUNK_SUFFIX);
100  if (biases->size() != numOutputChannels * sizeof(float))
101  throw InconsistentModelData(this, "Biases size mismatch");
102  }
103 
104  // free old stuff
105  for (auto program : programs)
106  bank.release(gpu, program);
107  programs.clear();
108  coeffs.clear();
109 
110  // decide whether use uniforms or not
111  static const int MAX_ALLOWED_NUMBER_OF_PROGRAMS = 0; // discovered empirically that uniforms are faster on Pi, Nano and desktop
112  static const int NUM_RESERVED_UNFORM_VECTORS = 8 + std::max(kernelSize[0], kernelSize[1]) / 2; // number of uniform vectors to keep unused
113  const int numberOfPrograms = numOutputChannels / 4;
114  const int uniformsLength = kernelSize.volume() + 1; // number of uniform vectors per program
115  const bool useUniforms = !useInputImage && // if an image is given on input, the use of uniforms is not supported
116  numberOfPrograms > MAX_ALLOWED_NUMBER_OF_PROGRAMS && // if not too many programs, rather go with hardcoded model data
117  uniformsLength + NUM_RESERVED_UNFORM_VECTORS < gpu.getLimit(GraphicPipeline::Limit::FRAGMENT_UNIFORM_VECTORS);
118  if (useUniforms)
119  coeffs.reserve(numberOfPrograms * uniformsLength);
120 
121  const bool useUniformShift = useUniforms && kernelSize.getDepth() <= 4;
122  // use uniform shift if only one input texture is sampled, i.e., depthwise or grouped with groups of 4
123 
124  // init new programs
125  for (int outputChannel = 0; outputChannel < numOutputChannels; outputChannel += 4) {
126  const size_t coefStart = coeffs.size(); // index of the first coefficient in coeffs for the current program
127 
128  // compute indices delimiting the current group
129  const int groupIdx = outputChannel * numGroups / numOutputChannels;
130  const int firstInputChannel = groupIdx * kernelSize.getDepth();
131  const int lastInputChannel = firstInputChannel + (isDepthwise ? 4 : kernelSize.getDepth());
132 
133  // set up GLSL code
135 
136 #ifdef BEATMUP_DEBUG
137  if (!groupViews.empty())
138  DebugAssertion::check(groupViews.back().getNumberOfTextures() <= gpu.getLimit(GraphicPipeline::Limit::TEXTURE_IMAGE_UNITS),
139  "Cannot compute Conv2D operation " + getName() + " on the current GPU: too many textures per group");
140 #endif
141 
142  code.printf("uniform sampler2D %s[%d];", UNIFORM_INPUT, useInputImage || isDepthwise ? 1 : groupViews[groupIdx].getNumberOfTextures());
143  if (residualInput)
144  code.printf("uniform sampler2D %s[1];", UNIFORM_RESIDUAL_INPUT);
145  if (useUniforms)
146  code.printf("uniform highp vec4 %s[%d];", UNIFORM_COEFFICIENT, uniformsLength);
147 
149  code.line("void main() {");
150  code.line("highp vec4 sum;");
151 
152  // declare neighborhood: vec4 for storage, vec3 for image
153  SpatialFilteringMixin::declare(code, useInputImage ? "highp vec3" : "highp vec4", !useInputImage);
154 
155  // loop through input channels
156  for (int inputChannel = firstInputChannel; inputChannel < lastInputChannel; inputChannel += 4) {
157  const int channelInGroup = inputChannel - firstInputChannel;
158 
159  const Point shift = (useUniformShift || !input) ? Point::ZERO :
160  (Point(input.getChannelOrigin(inputChannel) - input.getChannelOrigin(firstInputChannel)) / input.getTextureSize());
161  // texture coordinates sample the first channel in the current group, so shift is relative to its origin
162 
163  // compute depthwise convolution: inline sampling used
164  if (isDepthwise) {
165  code("sum = ");
166  for (int y = 0; y < kernelSize[1]; ++y)
167  for (int x = 0; x < kernelSize[0]; ++x) {
168  if (x > 0 || y > 0) code(" + ");
169  const float* w = kernel.ptr<float>(getIdx(outputChannel, 0, x, y));
170  if (useUniforms) {
171  code.printf("%s[%d] * ", UNIFORM_COEFFICIENT, (int)(coeffs.size() - coefStart));
172  coeffs.emplace_back(std::array<float, 4>{ w[0], w[1], w[2], w[3] });
173  }
174  else
175  code.printf("vec4(" COEF_FMT "," COEF_FMT "," COEF_FMT "," COEF_FMT ") * ", w[0], w[1], w[2], w[3]);
177  }
178  code.line(";");
179  }
180 
181  // compute convolution with 3-channel input image using dot product; no inline sampling
182  else if (useInputImage) {
184  const int offset[4] = { 0, 1 * numOutputChannels, 2 * numOutputChannels, 3 * numOutputChannels };
185  for (int y = 0; y < kernelSize[1]; ++y)
186  for (int x = 0; x < kernelSize[0]; ++x) {
187  code((channelInGroup == 0 && x == 0 && y == 0) ? "sum = vec4(" : "sum += vec4(");
188  for (int c = 0; c < 4; ++c) {
189  if (c > 0) code(",");
190  const float* w = kernel.ptr<float>(getIdx(c + outputChannel, channelInGroup, x, y));
191  code.printf("dot(vec3(" COEF_FMT "," COEF_FMT "," COEF_FMT "), %s%d%d)",
192  w[0], w[offset[1]], w[offset[2]], SpatialFilteringMixin::SAMPLE_ID_PREFIX, x, y);
193  }
194  code.line(");");
195  }
196  }
197 
198  // compute 4m to 4n channels using vector by 4x4 matrix multiply: inline sampling used
199  else {
200  code.printf("sum %s", channelInGroup == 0 ? "=" : "+=");
201  const int offset[4] = { 0, 1 * numOutputChannels, 2 * numOutputChannels, 3 * numOutputChannels };
202  for (int y = 0; y < kernelSize[1]; ++y)
203  for (int x = 0; x < kernelSize[0]; ++x) {
204  if (x > 0 || y > 0) code(" + ");
205  SpatialFilteringMixin::sampleInline(code, UNIFORM_INPUT, groupViews[groupIdx].getChannelTextureNumber(channelInGroup), IntPoint(x, y), shift);
206  code.printf(" * mat4(");
207  for (int c = 0; c < 4; ++c) {
208  if (c > 0) code(",");
209  const float* w = kernel.ptr<float>(getIdx(c + outputChannel, channelInGroup, x, y));
210  if (useUniforms) {
211  code.printf("%s[%d]", UNIFORM_COEFFICIENT, (int)(coeffs.size() - coefStart));
212  coeffs.emplace_back(std::array<float, 4>{ w[0], w[offset[1]], w[offset[2]], w[offset[3]] });
213  }
214  else
215  code.printf(COEF_FMT "," COEF_FMT "," COEF_FMT "," COEF_FMT, w[0], w[offset[1]], w[offset[2]], w[offset[3]]);
216  }
217  code.printf(")");
218  }
219  code.line(";");
220  }
221  }
222 
223  // add residual input
224  if (residualInput) {
225  // get linear mapping of channel pixel positions to sample the residual input properly
226  const IntPoint mainOrigin = input.getChannelOrigin(useUniformShift ? outputChannel : firstInputChannel);
227  const IntPoint residualOrigin = residualInput.getChannelOrigin(outputChannel);
228  const Rectangle mainArea(mainOrigin, mainOrigin + input.getSpatialSize());
229  const Rectangle resArea(residualOrigin, residualOrigin + residualInput.getSpatialSize());
230  const Point mainTexSize(input.getTextureWidth(), input.getTextureHeight());
232  Point scale, offset;
233  (mainArea / mainTexSize).getMapping(resArea / resTexSize, scale, offset);
234  // sample, add to sum
235  code.printf("sum += texture2D(%s[0], %s * vec2(" COORD_FMT "," COORD_FMT ") + vec2(" COORD_FMT "," COORD_FMT "));\n",
236  UNIFORM_RESIDUAL_INPUT, getInputSamplingPos().c_str(), scale.x, scale.y, offset.x, offset.y);
237  }
238 
239  // add bias if enabled
240  if (useBias) {
241  const float* b = biases->ptr<float>(outputChannel);
242  if (useUniforms) {
243  code.printf("sum += %s[%d];", UNIFORM_COEFFICIENT, (int)(coeffs.size() - coefStart));
244  coeffs.emplace_back(std::array<float, 4>{ b[0], b[1], b[2], b[3] });
245  }
246  else
247  code.printf("sum += vec4(" COEF_FMT "," COEF_FMT "," COEF_FMT "," COEF_FMT ");\n", b[0], b[1], b[2], b[3]);
248  }
249 
250  // apply activation
251  ActivationFunctionMixin::apply(code, "sum");
252  code("}");
253 
254  // init program
255  programs.push_back(bank(gpu, code));
256  }
257 
258  // setup execution order: same programs writing to the same texture are next to each other
259  execOrder.resize(programs.size());
260  for (size_t i = 0; i < execOrder.size(); ++i)
261  execOrder[i] = (int)i;
262  std::sort(execOrder.begin(), execOrder.end(), [&](int i, int j) {
263  return programs[i] < programs[j] || (programs[i] == programs[j] &&
264  output.getChannelTextureNumber(4 * i) < output.getChannelTextureNumber(4 * j));
265  });
266 
267  delete biases;
268  ready = true;
269 }
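
The decision between uniform variables and hardcoded coefficients made near the top of prepare() can be isolated as follows. This is a sketch under stated assumptions: `wouldUseUniforms` is a hypothetical name, the kernel volume is assumed to be width x height x depth, and the uniform vector limit is passed in as a plain number instead of being queried from the GPU.

```cpp
#include <algorithm>
#include <cassert>

// Mirrors the useUniforms condition in the listing above.
// Hypothetical helper, not part of Beatmup.
bool wouldUseUniforms(bool useInputImage, int numOutputChannels,
                      int kernelWidth, int kernelHeight, int kernelDepth,
                      int maxFragmentUniformVectors) {
    const int maxAllowedPrograms = 0;  // same constant as in the listing
    // uniform vectors kept unused
    const int reserved = 8 + std::max(kernelWidth, kernelHeight) / 2;
    // one GLSL program per four output channels
    const int numberOfPrograms = numOutputChannels / 4;
    // one vec4 per kernel element plus one for the bias slot
    const int uniformsLength = kernelWidth * kernelHeight * kernelDepth + 1;
    return !useInputImage &&
           numberOfPrograms > maxAllowedPrograms &&
           uniformsLength + reserved < maxFragmentUniformVectors;
}
```

For example, assuming a limit of 224 uniform vectors, a 3x3 kernel of depth 4 needs 37 + 9 = 46 vectors and uses uniforms, while a 3x3 kernel of depth 32 needs 289 + 9 = 298 and falls back to coefficients hardcoded in the shader source.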

◆ execute()

void Conv2D::execute ( TaskThread &  thread,
GraphicPipeline &  gpu 
)
private virtual

Executes the operation.

The operation should be prepared.

Parameters
[in,out] thread: Calling CPU thread descriptor
[in,out] gpu: A graphic pipeline instance

Implements Beatmup::NNets::AbstractOperation.

Definition at line 272 of file conv2d.cpp.

272  {
273  if (!ready)
274  throw NotReady(this);
275 
276  RuntimeError::check((useInputImage && inputImage) || (!useInputImage && input), "Input is not provided to a Conv2D operation.");
277  RuntimeError::check(output, "Output is not provided to Conv2D operation " + getName());
279  throw RuntimeError("Residual input size does not match the output size");
280 
281 #ifdef BEATMUP_DEBUG
282  RuntimeError::check(output.getSize() == getOutputSize(), "Operation output storage size mismatch");
283 #endif
284 
285  // static program setup
289  );
290 
291  // compute tex coords
292  const IntPoint strides(stride, stride);
293  const IntPoint inputTextureSize = useInputImage ?
295  IntPoint(input.getTextureWidth(), input.getTextureHeight());
297  const IntRectangle samplingArea = useInputImage ?
299  getSamplingArea(input, 0, strides, padding);
300 
301  gpu.setTextureCoordinates(samplingArea, inputTextureSize, output.getSpatialSize());
302  }
303 
304  const int coeffsPerProgram = (int)(coeffs.size() / programs.size());
305  const bool uniformsAreUsed = coeffsPerProgram > 0;
306 
307  // for each output channel
308  Storage::Binder bind(gpu);
309  for (size_t i = 0; i < execOrder.size(); ++i) {
310  const int programNum = execOrder[i];
311  const int outputChannel = 4 * programNum;
312 
313  GL::RenderingProgram& program = *programs[programNum];
314 
315  if (isDepthwise) {
316  const int channel = outputChannel;
317 
318  // bind output to a program
319  const bool fast = bind.begin(program, output, outputChannel);
320 
321  if (!fast) {
322  // bind inputs
323  bind(input, UNIFORM_INPUT, outputChannel);
324  if (residualInput)
325  bind(residualInput, UNIFORM_RESIDUAL_INPUT, outputChannel);
327  }
328 
329  // setup the remaining stuff
330  if (isUniformShiftUsed())
332  else
333  gpu.setTextureCoordinates(getSamplingArea(input, channel, strides, padding), inputTextureSize, output.getSpatialSize());
334  }
335 
336  else {
337  // bind output to a program
338  const int groupIdx = outputChannel * numGroups / numOutputChannels;
339  const bool isSameGroup = i > 0 && 4 * execOrder[i - 1] * numGroups / numOutputChannels == groupIdx;
340  const bool fast = bind.begin(program, output, outputChannel) && isSameGroup;
341 
342  const int firstInputChannel = groupIdx * kernelSize.getDepth();
343  const int lastInputChannel = firstInputChannel + (isDepthwise ? 4 : kernelSize.getDepth());
344 
345  if (!fast) {
346  // bind inputs
347  if (useInputImage)
349  else {
350  const int firstInputChannel = groupIdx * kernelSize.getDepth();
351  bind(groupViews[groupIdx], UNIFORM_INPUT);
352 
353  if (residualInput)
354  bind(residualInput, UNIFORM_RESIDUAL_INPUT, outputChannel);
355 
356  if (isUniformShiftUsed())
358  else
359  gpu.setTextureCoordinates(getSamplingArea(input, firstInputChannel, strides, padding), inputTextureSize, output.getSpatialSize());
360  }
361 
362  // setup the remaining stuff
364  }
365  }
366 
367  // update uniforms if needed
368  if (uniformsAreUsed)
369  program.setVec4Array(UNIFORM_COEFFICIENT, coeffs[coeffsPerProgram * programNum].data(), coeffsPerProgram);
370 
371  // g-g-go
372  program.blend();
373  }
374 }
void setVec4Array(const std::string &name, const float *xyzw, const int length)
Definition: program.cpp:466
GLSL program to render images Makes use of default vertex attributes to pass the texture coordinates ...
Definition: program.h:240
void blend(bool onScreen)
Definition: program.cpp:548
virtual const int getHeight() const =0
Height of the texture in pixels.
virtual const int getWidth() const =0
Width of the texture in pixels.
void setTextureCoordinates(const Rectangle &coords)
Specifies texture coordinates for the next rendering pass.
Definition: pipeline.cpp:966
Size getOutputSize(int outputIndex=0) const
Returns full size of a specific operation output.
Definition: conv2d.cpp:397
void setup(const int width, const int height)
Prepares the spatial filtering operation execution.
Definition: operation.cpp:197
void setupProgram(GL::Program &program)
Prepares a given program for spatial filtering.
Definition: operation.cpp:217
IntRectangle getSamplingArea(const IntPoint &size, const IntPoint &stride, const Size::Padding padding) const
Implements common padding policies by computing a rectangular area of positions the sampling kernel t...
Definition: operation.cpp:223
void setUniformShift(GL::Program &program, const IntPoint &shift, const IntPoint &inputSize)
Applies an offset to the sampling position at runtime.
Definition: operation.cpp:209
Binding of different input/output storages/texture handlers to a GLSL program.
Definition: storage.h:419
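
The group lookup in the non-depthwise branch of the listing can be traced with plain integers. The following is a minimal sketch, not Beatmup code: `groupOfProgram` and `groupsOfAllPrograms` are hypothetical helpers that only reproduce the integer arithmetic of `groupIdx`, assuming each program writes four consecutive output channels as the listing suggests.

```cpp
#include <vector>

// Hypothetical helper (not part of Beatmup): mirrors the integer arithmetic
// used in the listing to find the convolution group of a GLSL program.
// Each program is assumed to write 4 consecutive output channels.
int groupOfProgram(int programNum, int numGroups, int numOutputChannels) {
    const int outputChannel = 4 * programNum;
    return outputChannel * numGroups / numOutputChannels;
}

// Lists the group index of every program for a given layer configuration.
std::vector<int> groupsOfAllPrograms(int numGroups, int numOutputChannels) {
    std::vector<int> groups;
    for (int programNum = 0; programNum < numOutputChannels / 4; ++programNum)
        groups.push_back(groupOfProgram(programNum, numGroups, numOutputChannels));
    return groups;
}
```

With 16 output channels split into 2 groups, programs 0 and 1 map to group 0 and programs 2 and 3 map to group 1. Consecutive programs of the same group are the case the `isSameGroup` flag in the listing detects, which lets the input binding be skipped when `fast` is true.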

◆ getInputPadding()

int Conv2D::getInputPadding ( int  index = 0) const
private virtual

Retrieves minimum required size of zero padding for a given input.

Operations that sample a neighborhood of a pixel may need the input to be padded with zeros if some of the neighboring samples fall outside the area containing data. In Beatmup, zero padding is handled by allocating a bigger input and putting zeros around the area that is actually filled with data.

Returns
number of zero columns and rows to be added to the input area.

Reimplemented from Beatmup::NNets::AbstractOperation.

Definition at line 377 of file conv2d.cpp.

377  {
378  return (index == 0 && padding == Size::Padding::SAME) ? std::max(kernelSize[0], kernelSize[1]) / 2 : 0;
379 }
@ SAME
operation output size matches its input size for unit strides
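
The padding rule in the listing can be checked in isolation. This is a minimal sketch under stated assumptions: the `Padding` enum and `inputPadding` function are hypothetical stand-ins for `Size::Padding` and `Conv2D::getInputPadding()`, with the kernel width and height passed explicitly.

```cpp
#include <algorithm>

// Hypothetical stand-in for Size::Padding, for illustration only.
enum class Padding { VALID, SAME };

// Mirrors the expression in the listing: only the main input (index 0) of a
// SAME-padded convolution needs a zero border, half the larger kernel side.
int inputPadding(int index, Padding padding, int kernelWidth, int kernelHeight) {
    return (index == 0 && padding == Padding::SAME)
        ? std::max(kernelWidth, kernelHeight) / 2
        : 0;
}
```

For example, a 3x3 kernel with SAME padding needs a border of 1 zero row and column, a 5x5 kernel needs 2, and VALID padding or the residual input (index 1) needs none.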

◆ getSampledChannels()

void Conv2D::getSampledChannels ( int  index,
int &  min,
int &  max 
) const
private virtual

Retrieves the range of input feature channels sampled at the same time for a specific input.

The operation would typically take the entire storage and sample it at once, if needed. If the number of textures in a storage exceeds the number of texture samplers that the GPU may use simultaneously, an exception occurs. This function provides the necessary information to limit the number of textures in the storage when allocating it. When the limit is reached, multiple channels are packed into a single texture in the storage.

Parameters
[in]  index  The input index. Expected to fall in the valid range, i.e. from zero to getInputCount() - 1 inclusive.
[out]  min  The minimum number of channels that can be sampled at once.
[out]  max  The maximum number of channels that can be sampled at once.

Implements Beatmup::NNets::AbstractOperation.

Definition at line 382 of file conv2d.cpp.

382  {
383  if (index == 0) {
384  // main input: sampling an entire group at once
385  min = useInputImage ? 3 : 4;
387  }
388  else if (index == 1) {
389  // residual input: sampling 1 texture at once
390  min = max = 4;
391  }
392  else
393  min = max = 0;
394 }

◆ isBiasUsed()

bool Beatmup::NNets::Conv2D::isBiasUsed ( ) const
inline

Definition at line 124 of file conv2d.h.

124 { return useBias; }

◆ getInputCount()

int Beatmup::NNets::Conv2D::getInputCount ( ) const
inline virtual

Returns number of operation inputs.

Inputs are then indexed from zero to the returned value minus one inclusive.

Reimplemented from Beatmup::NNets::AbstractOperation.

Definition at line 126 of file conv2d.h.

126 { return 2; }

◆ getOutputCount()

int Beatmup::NNets::Conv2D::getOutputCount ( ) const
inline virtual

Returns number of operation outputs.

Outputs are then indexed from zero to the returned value minus one inclusive.

Reimplemented from Beatmup::NNets::AbstractOperation.

Definition at line 127 of file conv2d.h.

127 { return 1; }

◆ acceptsStorageInput()

bool Beatmup::NNets::Conv2D::acceptsStorageInput ( int  index = 0) const
inline virtual

Returns true if the operation can take a Storage::View at a specific input.

Neural network operations may accept different kinds of data containers on inputs and outputs, namely Storage::View, GL::Vector and textures. This function is used to check whether a given operation accepts a storage view on input.

Parameters
[in]  index  The input index. Expected to fall in the valid range, i.e. from zero to getInputCount() - 1 inclusive.

Reimplemented from Beatmup::NNets::AbstractOperation.

Definition at line 129 of file conv2d.h.

129 { return (index == 0 && !useInputImage) || index == 1; }

◆ acceptsStorageOutput()

bool Beatmup::NNets::Conv2D::acceptsStorageOutput ( int  index = 0) const
inline virtual

Returns true if the operation can take a Storage::View at a specific output.

Neural network operations may accept different kinds of data containers on inputs and outputs, namely Storage::View, GL::Vector and textures. This function is used to check whether a given operation accepts a storage view on output.

Parameters
[in]  index  The output index. Expected to fall in the valid range, i.e. from zero to getOutputCount() - 1 inclusive.

Reimplemented from Beatmup::NNets::AbstractOperation.

Definition at line 130 of file conv2d.h.

130 { return index == 0; }

◆ acceptsTextureInput()

bool Beatmup::NNets::Conv2D::acceptsTextureInput ( int  index = 0) const
inline virtual

Returns true if the operation can take a GL::TextureHandler at a specific input.

Neural network operations may accept different kinds of data containers on inputs and outputs, namely Storage::View, GL::Vector and textures. This function is used to check whether a given operation accepts a texture on input.

Parameters
[in]  index  The input index. Expected to fall in the valid range, i.e. from zero to getInputCount() - 1 inclusive.

Reimplemented from Beatmup::NNets::AbstractOperation.

Definition at line 131 of file conv2d.h.

131 { return index == 0 && useInputImage; }

◆ getOutputSize()

Size Conv2D::getOutputSize ( int  outputIndex = 0) const
virtual

Returns full size of a specific operation output.

Reimplemented from Beatmup::NNets::AbstractOperation.

Definition at line 397 of file conv2d.cpp.

397  {
398  if (outputIndex == 0) {
400  "Input is not provided to Conv2D operation " + getName());
401  const Size inputSize = useInputImage ? Size(inputImage->getWidth(), inputImage->getHeight(), 3) : input.getSize();
402  const Size result = inputSize.transform(
403  kernelSize,
404  Size(stride, stride, 0),
405  padding,
407  );
408  RuntimeError::check(result.volume() > 0, "Invalid (zero or negative) output size got in " + getName());
409  return result;
410  }
411  return Size::EMPTY;
412 }
Operation 3D input/output size.
Definition: storage.h:37
static const Size EMPTY
Definition: storage.h:50
Size transform(Size kernel, Size stride, Padding padding, int depth=0) const
Computes operation output size in function of operation kernel, padding, stride and depth,...
Definition: storage.cpp:58
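
The spatial output size depends on the kernel, stride and padding policy. The sketch below is an assumption, not a copy of `Size::transform()`: it uses the widely used convention in which VALID keeps only kernel positions that fit inside the input and SAME preserves the input size for unit strides (consistent with the description of `SAME` in this reference). The `Padding` enum and `outputLength` function are hypothetical, and the actual implementation in storage.cpp may differ in edge cases.

```cpp
// Hypothetical stand-in for Size::Padding, for illustration only.
enum class Padding { VALID, SAME };

// Assumed convention (not taken from storage.cpp): VALID keeps only kernel
// positions that fit entirely inside the input; SAME pads so that a unit
// stride preserves the input size.
int outputLength(int inputLength, int kernelLength, int stride, Padding padding) {
    if (padding == Padding::SAME)
        return (inputLength + stride - 1) / stride;    // ceil(input / stride)
    return (inputLength - kernelLength) / stride + 1;  // requires input >= kernel
}
```

Under these assumptions, a 32-pixel-wide input with a 3-pixel kernel gives 30 output columns with VALID padding and stride 1, 15 with stride 2, and 32 or 16 respectively with SAME padding.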

◆ getOutput()

Storage::View Beatmup::NNets::Conv2D::getOutput ( int  index = 0)
inline virtual

Returns a storage view bound to a specific operation output.

If no view is bound, returns empty view.

Parameters
[in]  index  The output index. Expected to fall in the valid range, i.e. from zero to getOutputCount() - 1 inclusive.

Reimplemented from Beatmup::NNets::AbstractOperation.

Definition at line 135 of file conv2d.h.

135 { return output; }

◆ setInput() [1/2]

void Conv2D::setInput ( Storage::View &&  storage,
int  inputIndex = 0 
)
virtual

Reimplemented from Beatmup::NNets::AbstractOperation.

Definition at line 483 of file conv2d.cpp.

483  {
484  OutOfRange::check(inputIndex, 0, 1, "Input index out of range: %d");
485  RuntimeError::check(view.getStorage().getPadding() >= getInputPadding(inputIndex), "The storage has insufficient padding");
486  if (inputIndex == 0) {
487  if (view) {
488  RuntimeError::check(!useInputImage, "An image is expected on input, but a tensor is passed");
489  RuntimeError::check(view.getDepth() == kernelSize.getDepth() * numGroups, "Tensor depth does not match kernel depth");
490  // create group views
491  groupViews.clear();
492  if (!isDepthwise)
493  for (int groupIdx = 0; groupIdx < numGroups; ++groupIdx) {
494  const int firstInputChannel = groupIdx * kernelSize.getDepth();
495  const int lastInputChannel = firstInputChannel + (isDepthwise ? 4 : kernelSize.getDepth());
496  groupViews.emplace_back(std::move(view), firstInputChannel, lastInputChannel - firstInputChannel);
497  }
498  }
499  this->input = std::move(view);
500  this->inputImage = nullptr;
501  }
502  else {
503  if (view) {
504  RuntimeError::check(!useInputImage, "Cannot use the residual input when an image is used as the main input");
505  RuntimeError::check(view.getDepth() == numOutputChannels, "Residual input tensor depth does not match output depth");
506  }
507  this->residualInput = std::move(view);
508  }
509 }
int getInputPadding(int index=0) const
Retrieves minimum required size of zero padding for a given input.
Definition: conv2d.cpp:377
static void check(const datatype value, const datatype min, const datatype max, const char *message)
Definition: exception.h:86

◆ setInput() [2/2]

void Conv2D::setInput ( GL::TextureHandler &  image,
int  inputIndex = 0 
)
virtual

Reimplemented from Beatmup::NNets::AbstractOperation.

Definition at line 518 of file conv2d.cpp.

518  {
519  if (inputIndex == 0) {
520  RuntimeError::check(useInputImage, "Cannot use image as Conv2D input");
521  this->inputImage = &image;
522  }
523  else
524  AbstractOperation::setInput(image, inputIndex);
525 }
virtual void setInput(Storage::View &&storage, int index=0)
Definition: operation.cpp:52

◆ setOutput()

void Conv2D::setOutput ( Storage::View &&  storage,
int  outputIndex = 0 
)
virtual

Reimplemented from Beatmup::NNets::AbstractOperation.

Definition at line 512 of file conv2d.cpp.

512  {
513  OutOfRange::check(outputIndex, 0, 0, "Output index out of range: %d");
514  this->output = std::move(storage);
515 }

◆ serialize()

std::map< std::string, std::string > Conv2D::serialize ( ) const
virtual

Returns a serialized representation of the operation.

Implements Beatmup::NNets::AbstractOperation.

Definition at line 415 of file conv2d.cpp.

415  {
416  return {
417  { "_name", getName() },
418  { "_type", "conv2d" },
419  { "kernel_size", std::to_string(kernelSize[0]) },
420  { "input_channels", std::to_string(kernelSize.getDepth() * numGroups) },
421  { "output_channels", std::to_string(numOutputChannels) },
422  { "stride", std::to_string(stride) },
423  { "padding", std::to_string(padding) },
424  { "use_bias", useBias ? "true" : "false" },
425  { "groups", std::to_string(numGroups) },
426  { "activation", std::to_string(activationFunc) }
427  };
428 }
const ActivationFunction activationFunc
Definition: operation.h:417
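
To make the shape of the serialized map concrete, here is a minimal sketch that builds a map with the same keys as the listing. `sketchSerialize` is a hypothetical function, not part of Beatmup. The textual encodings of the padding and activation values are passed in as plain strings because the output of `std::to_string` for those types is not shown in this reference.

```cpp
#include <map>
#include <string>

// Hypothetical function: builds a map with the same keys as Conv2D::serialize().
// paddingText and activationText are placeholders for the library's own
// string encodings of Size::Padding and ActivationFunction (an assumption).
std::map<std::string, std::string> sketchSerialize(
    const std::string& name, int kernelSize, int kernelDepth, int numGroups,
    int numOutputChannels, int stride, bool useBias,
    const std::string& paddingText, const std::string& activationText)
{
    return {
        { "_name", name },
        { "_type", "conv2d" },
        { "kernel_size", std::to_string(kernelSize) },
        // the input channel count is the kernel depth times the number of groups
        { "input_channels", std::to_string(kernelDepth * numGroups) },
        { "output_channels", std::to_string(numOutputChannels) },
        { "stride", std::to_string(stride) },
        { "padding", paddingText },
        { "use_bias", useBias ? "true" : "false" },
        { "groups", std::to_string(numGroups) },
        { "activation", activationText }
    };
}
```

For a kernel depth of 8 and 2 groups, the `input_channels` entry is "16", since the listing stores the per-group kernel depth and multiplies it back by the group count.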

◆ disconnect()

void Conv2D::disconnect ( )
virtual

Assigns empty inputs and outputs.

Implements Beatmup::NNets::AbstractOperation.

Definition at line 474 of file conv2d.cpp.

474  {
475  inputImage = nullptr;
476  input = Storage::View();
478  output = Storage::View();
479  groupViews.clear();
480 }

◆ setResidualInput()

void Beatmup::NNets::Conv2D::setResidualInput ( Storage::View &&  storage)
inline

Connects a tensor to a residual input.

This input is optional. The tensor is added to the convolution result before the non-linear activation is applied. Its size must match the output size.

Parameters
[in]  storage  A storage view containing the residual input tensor.

Definition at line 151 of file conv2d.h.

151 { setInput(std::move(storage), 1); }
void setInput(Storage::View &&storage, int inputIndex=0)
Definition: conv2d.cpp:483

◆ countMultiplyAdds()

unsigned long Conv2D::countMultiplyAdds ( ) const
virtual

Counts (approximate) number of multiply-adds used by this operation.

A single multiply-add is one multiplication and one addition.

Reimplemented from Beatmup::NNets::AbstractOperation.

Definition at line 528 of file conv2d.cpp.

528  {
529  return getOutputSize(0).volume() * kernelSize.volume();
530 }
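
The estimate is the output volume multiplied by the kernel volume. A worked sketch, assuming that `volume()` is the product of the three dimensions of a `Size` (`multiplyAdds` is a hypothetical helper, not part of Beatmup):

```cpp
// Hypothetical helper mirroring the listing: every output value costs one
// multiply-add per kernel coefficient, so the total is
// output volume * kernel volume.
unsigned long multiplyAdds(unsigned long outWidth, unsigned long outHeight,
                           unsigned long outChannels, unsigned long kernelWidth,
                           unsigned long kernelHeight, unsigned long kernelDepth) {
    const unsigned long outputVolume = outWidth * outHeight * outChannels;
    const unsigned long kernelVolume = kernelWidth * kernelHeight * kernelDepth;
    return outputVolume * kernelVolume;
}
```

For a 16x16x32 output and a 3x3 kernel of depth 16, this gives 8192 * 144 = 1179648 multiply-adds. In a grouped convolution the kernel depth is the number of input channels per group, so more groups reduce the count proportionally.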

◆ countTexelFetches()

unsigned long Conv2D::countTexelFetches ( ) const
virtual

Counts (approximate) number of texel fetches.

Reimplemented from Beatmup::NNets::AbstractOperation.

Definition at line 533 of file conv2d.cpp.

533  {
534  unsigned long count = getOutputSize(0).volume() / 4 * kernelSize.volume() / (useInputImage ? 3 : 4);
535  if (residualInput)
536  count += getOutputSize(0).volume() / 4;
537  return count;
538 }
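
The same arithmetic can be reproduced outside the class. In this sketch, `texelFetches` is a hypothetical helper that takes the output and kernel volumes as plain numbers and keeps the left-to-right integer division of the listing:

```cpp
// Hypothetical helper mirroring the listing. One texel holds 4 channels
// (3 when the input is an image), so both volumes are divided accordingly.
unsigned long texelFetches(unsigned long outputVolume, unsigned long kernelVolume,
                           bool useInputImage, bool hasResidualInput) {
    unsigned long count = outputVolume / 4 * kernelVolume / (useInputImage ? 3 : 4);
    if (hasResidualInput)
        count += outputVolume / 4;  // one extra fetch per output texel
    return count;
}
```

With an output volume of 8192 and a kernel volume of 144, the count is 73728 fetches, or 75776 when a residual input is connected. With an image input and a 3x3x3 kernel (volume 27), the same output volume gives 18432 fetches.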

◆ initDeserializer()

static bool Beatmup::NNets::Conv2D::initDeserializer ( )
static

Sets up deserialization of the operation.

Member Data Documentation

◆ kernelSize

const Size Beatmup::NNets::Conv2D::kernelSize
private

Definition at line 66 of file conv2d.h.

◆ numOutputChannels

const int Beatmup::NNets::Conv2D::numOutputChannels
private

number of output feature maps

Definition at line 67 of file conv2d.h.

◆ numGroups

const int Beatmup::NNets::Conv2D::numGroups
private

number of convolution groups

Definition at line 68 of file conv2d.h.

◆ stride

const int Beatmup::NNets::Conv2D::stride
private

Definition at line 69 of file conv2d.h.

◆ padding

const Size::Padding Beatmup::NNets::Conv2D::padding
private

Definition at line 70 of file conv2d.h.

◆ useInputImage

const bool Beatmup::NNets::Conv2D::useInputImage
private

if true, input is the texture handler, not the view

Definition at line 71 of file conv2d.h.

◆ isDepthwise

const bool Beatmup::NNets::Conv2D::isDepthwise
private

if true, the convolution is depthwise, otherwise regular

Definition at line 72 of file conv2d.h.

◆ useBias

const bool Beatmup::NNets::Conv2D::useBias
private

if true, the bias addition is enabled

Definition at line 73 of file conv2d.h.

◆ ready

bool Beatmup::NNets::Conv2D::ready
private

Definition at line 74 of file conv2d.h.

◆ input

Storage::View Beatmup::NNets::Conv2D::input
private

Definition at line 76 of file conv2d.h.

◆ output

Storage::View Beatmup::NNets::Conv2D::output
private

Definition at line 76 of file conv2d.h.

◆ residualInput

Storage::View Beatmup::NNets::Conv2D::residualInput
private

optional tensor to be added to the output before activation

Definition at line 77 of file conv2d.h.

◆ inputImage

GL::TextureHandler* Beatmup::NNets::Conv2D::inputImage
private

input texture handler to be used instead of the input view

Definition at line 78 of file conv2d.h.

◆ programs

std::vector<GL::RenderingProgram*> Beatmup::NNets::Conv2D::programs
private

pointers to GLSL programs, one per group of four output channels

Definition at line 79 of file conv2d.h.

◆ coeffs

std::vector<std::array<float, 4> > Beatmup::NNets::Conv2D::coeffs
private

model data to pass to uniform variables, if used

Definition at line 80 of file conv2d.h.

◆ execOrder

std::vector<int> Beatmup::NNets::Conv2D::execOrder
private

execution order of GLSL programs

Definition at line 81 of file conv2d.h.

◆ groupViews

std::vector<Storage::View> Beatmup::NNets::Conv2D::groupViews
private

views per convolution group

Definition at line 82 of file conv2d.h.

◆ FILTERS_CHUNK_SUFFIX

const char * Conv2D::FILTERS_CHUNK_SUFFIX = "/w"
static

suffix added to the op name to get the filters chunk id in the model data

Definition at line 97 of file conv2d.h.

◆ BIAS_CHUNK_SUFFIX

const char * Conv2D::BIAS_CHUNK_SUFFIX = "/b"
static

suffix added to the op name to get the bias chunk id in the model data

Definition at line 98 of file conv2d.h.
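
Both suffixes are appended to the operation name to form chunk ids. A minimal sketch of that concatenation, where `filtersChunkId` and `biasChunkId` are hypothetical helpers and the constants are redeclared locally with the values documented above:

```cpp
#include <string>

// Suffix values as documented above, redeclared here for a standalone example.
const char* const FILTERS_CHUNK_SUFFIX = "/w";
const char* const BIAS_CHUNK_SUFFIX = "/b";

// Hypothetical helpers (not part of Beatmup): chunk id = operation name + suffix.
std::string filtersChunkId(const std::string& opName) {
    return opName + FILTERS_CHUNK_SUFFIX;
}

std::string biasChunkId(const std::string& opName) {
    return opName + BIAS_CHUNK_SUFFIX;
}
```

For an operation named "conv1", the filters would be looked up under "conv1/w" and the bias under "conv1/b".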


The documentation for this class was generated from the following files: