2D convolution operation computed on GPU.

#include <conv2d.h>

Public Member Functions
	Conv2D (const std::string &name, const int kernelSize, const int numInputChannels, const int numOutputChannels, const int stride=1, const Size::Padding padding=Size::Padding::VALID, const bool useBias=true, const int numGroups=1, const ActivationFunction activation=ActivationFunction::DEFAULT)
	Instantiates a 2D convolution operation.

bool	isBiasUsed () const

int	getInputCount () const
	Returns number of operation inputs.

int	getOutputCount () const
	Returns number of operation outputs.

bool	acceptsStorageInput (int index=0) const
	Returns `true` if the operation can take a Storage::View at a specific input.

bool	acceptsStorageOutput (int index=0) const
	Returns `true` if the operation can take a Storage::View at a specific output.

bool	acceptsTextureInput (int index=0) const
	Returns `true` if the operation can take a GL::TextureHandler at a specific input.

Size	getOutputSize (int outputIndex=0) const
	Returns full size of a specific operation output.

Storage::View	getOutput (int index=0)
	Returns a storage view bound to a specific operation output.

void	setInput (Storage::View &&storage, int inputIndex=0)

void	setInput (GL::TextureHandler &image, int inputIndex=0)

void	setOutput (Storage::View &&storage, int outputIndex=0)

std::map< std::string, std::string >	serialize () const
	Returns a serialized representation of the operation.

void	disconnect ()
	Assigns empty inputs and outputs.

void	setResidualInput (Storage::View &&storage)
	Connects a tensor to a residual input.

unsigned long	countMultiplyAdds () const
	Counts (approximate) number of multiply-adds used by this operation.

unsigned long	countTexelFetches () const
	Counts (approximate) number of texel fetches.

Public Member Functions inherited from Beatmup::NNets::AbstractOperation
virtual	~AbstractOperation ()

virtual bool	usesGpu () const
	Returns `true` if the operation is run on GPU.

virtual bool	acceptsVectorInput (int index=0) const
	Returns `true` if the operation can take a GL::Vector at a specific input.

virtual bool	acceptsVectorOutput (int index=0) const
	Returns `true` if the operation can take a GL::Vector at a specific output.

virtual bool	acceptsTextureOutput (int index=0) const
	Returns `true` if the operation can take a GL::TextureHandler at a specific output.

virtual void	getOutput (GL::Vector *&vector, int index=0)
	Returns a GL::Vector bound to a specific operation output.

virtual void	getOutput (GL::TextureHandler *&vector, int index=0)
	Returns a GL::TextureHandler bound to a specific operation output.

virtual void	setInput (GL::Vector &vector, int index=0)

virtual void	setOutput (GL::Vector &vector, int index=0)

virtual void	setOutput (GL::TextureHandler &image, int index=0)

std::string	getName () const

Static Public Member Functions
static bool	initDeserializer ()
	Sets up deserialization of the operation.

Static Public Attributes
static const char *	FILTERS_CHUNK_SUFFIX = "/w"
	suffix added to the op name to get the filters chunk id in the model data

static const char *	BIAS_CHUNK_SUFFIX = "/b"
	suffix added to the op name to get the bias chunk id in the model data

Private Member Functions
int	getIdx (int output, int input, int x, int y) const
	Maps an (outputChannel, inputChannel, x, y) position to a linear coefficient index in the chunkfile.

void	prepare (GraphicPipeline &gpu, ChunkCollection &data, GL::ProgramBank &bank)
	Compiles GLSL shaders.

void	execute (TaskThread &thread, GraphicPipeline &gpu)
	Executes the operation.

int	getInputPadding (int index=0) const
	Retrieves minimum required size of zero padding for a given input.

void	getSampledChannels (int index, int &min, int &max) const
	Retrieves range of input feature channels sampled at the same time for a specific input.

Private Attributes
const Size	kernelSize

const int	numOutputChannels
	number of output feature maps

const int	numGroups
	number of convolution groups

const int	stride

const Size::Padding	padding

const bool	useInputImage
	if `true`, input is the texture handler, not the view

const bool	isDepthwise
	if `true`, the convolution is depthwise, otherwise regular

const bool	useBias
	if `true`, the bias addition is enabled

bool	ready

Storage::View	input

Storage::View	output

Storage::View	residualInput
	optional tensor to be added to the output before activation

GL::TextureHandler *	inputImage
	input texture handler to be used instead of the input view

std::vector< GL::RenderingProgram * >	programs
	pointers to GLSL programs, one per quad of output channels

std::vector< std::array< float, 4 > >	coeffs
	model data to pass to uniform variables, if used

std::vector< int >	execOrder
	execution order of GLSL programs

std::vector< Storage::View >	groupViews
	views per convolution group

Additional Inherited Members
Protected Member Functions inherited from Beatmup::NNets::AbstractOperation
	AbstractOperation (const std::string &name)

virtual void	execute (TaskThread &thread)
	Executes the operation within a specific CPU thread.

Protected Member Functions inherited from Beatmup::NNets::SpatialFilteringMixin
	SpatialFilteringMixin (const int nbSizeX, const int nbSizeY)
	Initializes spatial filtering mixin.

	~SpatialFilteringMixin ()

void	writeHeader (StringBuilder &code, bool useUniformShift)
	Writes out the GLSL fragment shader header required for spatial neighborhood sampling.

void	declare (StringBuilder &code, const char *datatype, bool inlineSampling=false)
	Declares GLSL fragment shader main(..) code part required for spatial neighborhood sampling.

void	sample (StringBuilder &code, const char *inputName, const int inputIndex, const Point &shift, const bool isFirstSample=true, const char *suffix="")
	Samples a neighborhood of a given texture.

void	sampleInline (StringBuilder &code, const char *inputName, const int inputIndex, const IntPoint &position, const Point &shift, const char *suffix="")

void	setup (const int width, const int height)
	Prepares the spatial filtering operation execution.

void	setUniformShift (GL::Program &program, const IntPoint &shift, const IntPoint &inputSize)
	Applies an offset to the sampling position at runtime.

void	setupProgram (GL::Program &program)
	Prepares a given program for spatial filtering.

IntRectangle	getSamplingArea (const IntPoint &size, const IntPoint &stride, const Size::Padding padding) const
	Implements common padding policies by computing a rectangular area of positions the sampling kernel takes in order to get the result with the required padding.

IntRectangle	getSamplingArea (const Storage::View &storage, const int channel, const IntPoint &stride, const Size::Padding padding) const
	Computes area in pixels to sample a given storage according to specific stride and padding.

Rectangle	getTextureCoordinates (const Storage::View &storage, const int channel, const IntPoint &stride, const Size::Padding padding, const IntPoint &outputSize) const
	Computes texture coordinates sampling a specific storage channel for given stride, padding and output size.

std::string	getInputSamplingPos () const
	Retrieves input sampling point position for the current fragment.

bool	isUniformShiftUsed () const

Protected Member Functions inherited from Beatmup::NNets::ActivationFunctionMixin
	ActivationFunctionMixin (const ActivationFunction activationFunc)

void	apply (StringBuilder &code, const char *inputVariable)
	Renders GLSL code applying the activation function to a specific variable and writing the result to the gl_FragColor shader output variable.

Protected Attributes inherited from Beatmup::NNets::ActivationFunctionMixin
const ActivationFunction	activationFunc

Static Protected Attributes inherited from Beatmup::NNets::SpatialFilteringMixin
static const char *	SAMPLE_ID_PREFIX = "i"
	prefix of variables declaring a neighbor sample

Detailed Description

2D convolution operation computed on GPU.

Has 2 inputs: main and residual (detailed below), and a single output. Constraints:

Input and output contain values in [0, 1] range sampled over 8 bits.
Number of input channels is 3 (i.e., the input is an RGB image) or a multiple of 4.
Number of output feature maps is a multiple of 4.
For group convolutions, each group contains a multiple of 4 input channels and a multiple of 4 output channels, or exactly 1 input and 1 output channel (i.e., depthwise).
Kernels are of square shape.
Strides are equal along X and Y.
Dilations are equal to 1.
If an image is given on input (3 input feature maps), only valid padding is supported.
An activation function is always applied on output.

Raspberry Pi-related constraints:

Pi cannot sample more than 256 channels to compute a single output value. The practical limit is lower still: about 128 channels for pointwise convolutions and fewer than 100 channels for larger kernels. When the limit is reached, the Pi OpenGL driver reports an out-of-memory error (0x505).

Features:

Bias addition integrated.
An optional residual input is available: a tensor of output shape added to the convolution result before applying the activation function.
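
The order of operations listed above (convolution result, plus residual, plus bias, then activation) can be sketched for a single output value on the CPU. This is a minimal scalar illustration, not the GLSL that Conv2D generates; `combineOutput` and `clamp01` are hypothetical names, and the clamping activation is only an assumed stand-in for whichever ActivationFunction is configured.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>

// Hypothetical helper (not part of Beatmup): combines one convolution result
// with the optional residual value and the bias, then applies the activation.
// Both the residual and the bias are added BEFORE the activation function.
inline float combineOutput(float convSum, float residual, float bias,
                           const std::function<float(float)>& activation) {
    assert(activation);  // an activation function is always applied on output
    return activation(convSum + residual + bias);
}

// Illustrative activation only (an assumption of this sketch): clamps to
// [0, 1], the value range stated for the operation's inputs and outputs.
inline float clamp01(float x) {
    return std::min(1.0f, std::max(0.0f, x));
}
```

Calling `combineOutput(0.25f, 0.5f, 0.125f, clamp01)` adds the three terms first and clamps afterwards, so a residual can push the sum out of range before the activation brings it back.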

Convolution filters and biases are looked up in chunks of the model data. The chunk names consist of the operation name followed by Conv2D::FILTERS_CHUNK_SUFFIX and Conv2D::BIAS_CHUNK_SUFFIX, respectively. The chunk contents are single-precision floating-point arrays. The filter coefficients are taken in "OIHW" layout, i.e., there are 'O*I' contiguous packets of 'H*W' values each. "O" and "I" are output and input channel numbers, "H" and "W" are filter height and width.
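
The chunk naming rule can be illustrated with a short sketch. The suffix values are the ones listed under "Static Public Attributes"; the helper names (`filtersChunkId`, `biasChunkId`, `expectedFiltersBytes`, `expectedBiasBytes`) are hypothetical, and the size formulas mirror the consistency checks in prepare() under the assumption that the kernel volume equals width * height * input channels per group.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Suffix values as documented for Conv2D.
static const char* const FILTERS_CHUNK_SUFFIX = "/w";
static const char* const BIAS_CHUNK_SUFFIX = "/b";

// Hypothetical helpers: chunk ids are the operation name plus a suffix.
inline std::string filtersChunkId(const std::string& opName) {
    return opName + FILTERS_CHUNK_SUFFIX;
}

inline std::string biasChunkId(const std::string& opName) {
    return opName + BIAS_CHUNK_SUFFIX;
}

// Expected filters chunk size in bytes for a square kernel.
// Assumption: kernel volume = kernelSize * kernelSize * inputChannelsPerGroup.
inline std::size_t expectedFiltersBytes(int kernelSize, int inputChannelsPerGroup,
                                        int numOutputChannels) {
    assert(kernelSize > 0 && inputChannelsPerGroup > 0 && numOutputChannels > 0);
    return static_cast<std::size_t>(kernelSize) * kernelSize *
           inputChannelsPerGroup * numOutputChannels * sizeof(float);
}

// Expected bias chunk size in bytes: one float per output channel.
inline std::size_t expectedBiasBytes(int numOutputChannels) {
    assert(numOutputChannels > 0);
    return static_cast<std::size_t>(numOutputChannels) * sizeof(float);
}
```

For an operation named "conv1", the filters would be expected in the chunk "conv1/w" and the biases in "conv1/b".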

Definition at line 62 of file conv2d.h.

Constructor & Destructor Documentation

◆ Conv2D()

Conv2D::Conv2D	(	const std::string &	name,
		const int	kernelSize,
		const int	numInputChannels,
		const int	numOutputChannels,
		const int	stride = `1`,
		const Size::Padding	padding = `Size::Padding::VALID`,
		const bool	useBias = `true`,
		const int	numGroups = `1`,
		const ActivationFunction	activation = `ActivationFunction::DEFAULT`
	)

Instantiates a 2D convolution operation.

Parameters

[in]	name	Operation name
[in]	kernelSize	Convolution kernel size
[in]	numInputChannels	Number of input feature map channels (input depth)
[in]	numOutputChannels	Number of output feature map channels (output depth)
[in]	stride	Convolution stride
[in]	padding	Padding policy
[in]	useBias	If `true`, the bias addition is enabled. The bias vector is searched in the model data.
[in]	numGroups	Number of convolution groups to get a group/depthwise convolution
[in]	activation	Activation function applied to the operation output

Definition at line 41 of file conv2d.cpp.

  :
     AbstractOperation(name), SpatialFilteringMixin(kernelSize, kernelSize), ActivationFunctionMixin(activation),
     kernelSize(kernelSize, kernelSize, numInputChannels / numGroups), numOutputChannels(numOutputChannels), numGroups(numGroups),
     stride(stride), padding(padding),
     useInputImage(numInputChannels == 3),
     isDepthwise(numInputChannels == numGroups && numOutputChannels == numGroups),
     useBias(useBias),
     ready(false),
     inputImage(nullptr)
 {
     if (useInputImage) {
         InvalidArgument::check(numGroups == 1, "Cannot apply a group convolution to the input image");
         InvalidArgument::check(padding == Size::Padding::VALID, "Only valid zero padding setting is supported when an image is used as input");
     }
     else
         Storage::checkChannelNumber(numInputChannels);
     Storage::checkChannelNumber(numOutputChannels);
     OutOfRange::checkMin(stride, 1, "Positive convolution stride expected, %d got");
     OutOfRange::checkMin(kernelSize, 1, "Positive convolution kernel size expected, %d got");
     if (!useInputImage && !isDepthwise)
         InvalidArgument::check(this->kernelSize.getDepth() % 4 == 0, "A multiple of 4 is expected as number of input channels in the convolution kernel.");
     if (!isDepthwise && numGroups > 1)
         OutOfRange::checkMin(this->kernelSize.getDepth(), 4, "Kernels having less than 4 input channels are not supported in grouped convolutions. Got %d channels.");
  
     // check groups alignment: each group must contain 4k inputs and outputs channels
     if (!isDepthwise) {
         if (!useInputImage)
             InvalidArgument::check(numInputChannels % (4 * numGroups) == 0,
                 "Cannot split " +std::to_string(numInputChannels)+ " input channels on " +std::to_string(numGroups)+ " groups of 4*k channels each.");
         InvalidArgument::check(numOutputChannels % (4 * numGroups) == 0,
             "Cannot split " +std::to_string(numOutputChannels)+ " output channels on " +std::to_string(numGroups)+ " groups of 4*k channels each.");
     }
     programs.reserve(numOutputChannels / 4);
     groupViews.reserve(numGroups);
 }
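
The argument checks performed by the constructor can be condensed into a single predicate, which may help when validating a model description before building operations. This is a sketch only: `isSupportedConv2D` and the local `Padding` enum are hypothetical, the guards on non-positive counts are an addition of this example, and Storage::checkChannelNumber() is assumed to require a multiple of 4.

```cpp
#include <cassert>

// Hypothetical stand-in for Size::Padding.
enum class Padding { VALID, SAME };

// Returns true if the arguments pass checks mirroring the constructor listing.
// Assumption: Storage::checkChannelNumber() accepts only multiples of 4.
inline bool isSupportedConv2D(int kernelSize, int numIn, int numOut,
                              int stride, Padding padding, int numGroups) {
    // Extra guards, not in the original listing: avoid division by zero
    // and reject empty channel sets.
    if (numGroups < 1 || numIn < 1 || numOut < 1) return false;

    const bool useInputImage = (numIn == 3);
    const bool isDepthwise = (numIn == numGroups && numOut == numGroups);
    const int depth = numIn / numGroups;  // input channels per group

    if (useInputImage) {
        // An RGB image input allows neither groups nor non-valid padding.
        if (numGroups != 1 || padding != Padding::VALID) return false;
    } else if (numIn % 4 != 0) {
        return false;
    }
    if (numOut % 4 != 0) return false;
    if (stride < 1 || kernelSize < 1) return false;

    if (!isDepthwise) {
        if (!useInputImage && depth % 4 != 0) return false;
        if (numGroups > 1 && depth < 4) return false;
        // Each group must contain 4*k input and 4*k output channels.
        if (!useInputImage && numIn % (4 * numGroups) != 0) return false;
        if (numOut % (4 * numGroups) != 0) return false;
    }
    return true;
}
```

For instance, a 3x3 convolution from an RGB image to 16 feature maps passes with valid padding but not with same padding, and a 32-channel depthwise convolution (32 groups) passes with either padding.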

Member Function Documentation

◆ getIdx()

int Beatmup::NNets::Conv2D::getIdx	(	int	output,
		int	input,
		int	x,
		int	y
	)		const

inlineprivate

Maps an (outputChannel, inputChannel, x, y) position to a linear coefficient index in the chunkfile.

Definition at line 87 of file conv2d.h.

                                                                          {
                 return output + numOutputChannels * (input + kernelSize[2] * (x + kernelSize[0] * y));
             }
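
The index expression can be tried out in isolation with a free function. In this sketch `coefficientIndex` is a hypothetical name; `kernelWidth` stands for kernelSize[0] and `kernelDepth` for kernelSize[2] (input channels per group) in the member function above. As the expression is written, the output channel index varies fastest, followed by the input channel, then x, then y.

```cpp
#include <cassert>

// Hypothetical free-function version of the expression used in getIdx().
inline int coefficientIndex(int output, int input, int x, int y,
                            int numOutputChannels, int kernelDepth,
                            int kernelWidth) {
    assert(numOutputChannels > 0 && kernelDepth > 0 && kernelWidth > 0);
    return output + numOutputChannels * (input + kernelDepth * (x + kernelWidth * y));
}
```

With 8 output channels, 4 input channels per group and a 3x3 kernel, stepping the input channel by one moves the index by 8. This matches the way prepare() reads the weights of consecutive input channels at a stride of numOutputChannels.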

◆ prepare()

void Conv2D::prepare	(	GraphicPipeline &	gpu,
		ChunkCollection &	data,
		GL::ProgramBank &	bank
	)

privatevirtual

Compiles GLSL shaders.

Parameters

[in,out]	gpu	A graphic pipeline instance
[in,out]	data	Chunkfile containing operation data (e.g. weights and biases)
[in,out]	bank	A program bank with existing GLSL programs to be reused when possible. If a new program is built, it is added to the bank.

Implements Beatmup::NNets::AbstractOperation.

Definition at line 88 of file conv2d.cpp.

                                                                                      {
     RuntimeError::check((useInputImage && inputImage) || (!useInputImage && input), "Input is not provided to Conv2D operation " + getName());
     RuntimeError::check(output, "Output is not provided to Conv2D operation " + getName());
  
     // get coefficients
     const Chunk kernel(data, getName() + FILTERS_CHUNK_SUFFIX);
     if (kernel.size() != kernelSize.volume() * numOutputChannels * sizeof(float))
         throw InconsistentModelData(this, "Weights size mismatch");
  
     const Chunk* biases = nullptr;
     if (useBias) {
         biases = new Chunk(data, getName() + BIAS_CHUNK_SUFFIX);
         if (biases->size() != numOutputChannels * sizeof(float))
             throw InconsistentModelData(this, "Biases size mismatch");
     }
  
     // free old stuff
     for (auto program : programs)
         bank.release(gpu, program);
     programs.clear();
     coeffs.clear();
  
     // decide whether use uniforms or not
     static const int MAX_ALLOWED_NUMBER_OF_PROGRAMS = 0;    // discovered empirically that uniforms are faster on Pi, Nano and desktop
     static const int NUM_RESERVED_UNFORM_VECTORS = 8 + std::max(kernelSize[0], kernelSize[1]) / 2;     // number of uniform vectors to keep unused
     const int numberOfPrograms = numOutputChannels / 4;
     const int uniformsLength = kernelSize.volume() + 1;     // number of uniform vectors per program
     const bool useUniforms = !useInputImage &&                  // if an image is given on input, uniforms are not supported
         numberOfPrograms > MAX_ALLOWED_NUMBER_OF_PROGRAMS &&    // if not too many programs, rather go with hardcoded model data
         uniformsLength + NUM_RESERVED_UNFORM_VECTORS < gpu.getLimit(GraphicPipeline::Limit::FRAGMENT_UNIFORM_VECTORS);
     if (useUniforms)
         coeffs.reserve(numberOfPrograms * uniformsLength);
  
     const bool useUniformShift = useUniforms && kernelSize.getDepth() <= 4;
         // use uniform shift if only one input texture is sampled, i.e., depthwise or grouped with groups of 4
  
     // init new programs
     for (int outputChannel = 0; outputChannel < numOutputChannels; outputChannel += 4) {
         const size_t coefStart = coeffs.size();     // index of the first coefficient in coeffs for the current program
  
         // compute indices delimiting the current group
         const int groupIdx = outputChannel * numGroups / numOutputChannels;
         const int firstInputChannel = groupIdx * kernelSize.getDepth();
         const int lastInputChannel  = firstInputChannel + (isDepthwise ? 4 : kernelSize.getDepth());
  
         // set up GLSL code
         String code(GL::RenderingPrograms::DECLARE_TEXTURE_COORDINATES_IN_FRAG);
  
 #ifdef BEATMUP_DEBUG
         if (!groupViews.empty())
             DebugAssertion::check(groupViews.back().getNumberOfTextures() <= gpu.getLimit(GraphicPipeline::Limit::TEXTURE_IMAGE_UNITS),
                 "Cannot compute Conv2D operation " + getName() + " on the current GPU: too many textures per group");
 #endif
  
         code.printf("uniform sampler2D %s[%d];", UNIFORM_INPUT, useInputImage || isDepthwise ? 1 : groupViews[groupIdx].getNumberOfTextures());
         if (residualInput)
             code.printf("uniform sampler2D %s[1];", UNIFORM_RESIDUAL_INPUT);
         if (useUniforms)
             code.printf("uniform highp vec4 %s[%d];", UNIFORM_COEFFICIENT, uniformsLength);
  
         SpatialFilteringMixin::writeHeader(code, useUniformShift);
         code.line("void main() {");
         code.line("highp vec4 sum;");
  
         // declare neighborhood: vec4 for storage, vec3 for image
         SpatialFilteringMixin::declare(code, useInputImage ? "highp vec3" : "highp vec4", !useInputImage);
  
         // loop through input channels
         for (int inputChannel = firstInputChannel; inputChannel < lastInputChannel; inputChannel += 4) {
             const int channelInGroup = inputChannel - firstInputChannel;
  
             const Point shift = (useUniformShift || !input) ? Point::ZERO :
                 (Point(input.getChannelOrigin(inputChannel) - input.getChannelOrigin(firstInputChannel)) / input.getTextureSize());
                 // texture coordinates sample the first channel in the current group, so shift is relative to its origin
  
             // compute depthwise convolution: inline sampling used
             if (isDepthwise) {
                 code("sum = ");
                 for (int y = 0; y < kernelSize[1]; ++y)
                 for (int x = 0; x < kernelSize[0]; ++x) {
                     if (x > 0 || y > 0) code(" + ");
                     const float* w = kernel.ptr<float>(getIdx(outputChannel, 0, x, y));
                     if (useUniforms) {
                         code.printf("%s[%d] * ", UNIFORM_COEFFICIENT, (int)(coeffs.size() - coefStart));
                         coeffs.emplace_back(std::array<float, 4>{ w[0], w[1], w[2], w[3] });
                     }
                     else
                         code.printf("vec4(" COEF_FMT "," COEF_FMT "," COEF_FMT "," COEF_FMT ") * ", w[0], w[1], w[2], w[3]);
                     SpatialFilteringMixin::sampleInline(code, UNIFORM_INPUT, 0, IntPoint(x, y), shift);
                 }
                 code.line(";");
             }
  
             // compute convolution with 3-channel input image using dot product; no inline sampling
             else if (useInputImage) {
                 SpatialFilteringMixin::sample(code, UNIFORM_INPUT, 0, Point::ZERO, true, useInputImage ? ".rgb" : "");
                 const int offset[4] = { 0, 1 * numOutputChannels, 2 * numOutputChannels, 3 * numOutputChannels };
                 for (int y = 0; y < kernelSize[1]; ++y)
                 for (int x = 0; x < kernelSize[0]; ++x) {
                     code((channelInGroup == 0 && x == 0 && y == 0) ? "sum = vec4(" : "sum += vec4(");
                     for (int c = 0; c < 4; ++c) {
                         if (c > 0) code(",");
                         const float* w = kernel.ptr<float>(getIdx(c + outputChannel, channelInGroup, x, y));
                         code.printf("dot(vec3(" COEF_FMT "," COEF_FMT "," COEF_FMT "), %s%d%d)",
                                 w[0], w[offset[1]], w[offset[2]], SpatialFilteringMixin::SAMPLE_ID_PREFIX, x, y);
                     }
                     code.line(");");
                 }
             }
  
             // compute 4m to 4n channels using vector by 4x4 matrix multiply: inline sampling used
             else {
                 code.printf("sum %s", channelInGroup == 0 ? "=" : "+=");
                 const int offset[4] = { 0, 1 * numOutputChannels, 2 * numOutputChannels, 3 * numOutputChannels };
                 for (int y = 0; y < kernelSize[1]; ++y)
                 for (int x = 0; x < kernelSize[0]; ++x) {
                     if (x > 0 || y > 0) code(" + ");
                     SpatialFilteringMixin::sampleInline(code, UNIFORM_INPUT, groupViews[groupIdx].getChannelTextureNumber(channelInGroup), IntPoint(x, y), shift);
                     code.printf(" * mat4(");
                     for (int c = 0; c < 4; ++c) {
                         if (c > 0) code(",");
                         const float* w = kernel.ptr<float>(getIdx(c + outputChannel, channelInGroup, x, y));
                         if (useUniforms) {
                             code.printf("%s[%d]", UNIFORM_COEFFICIENT, (int)(coeffs.size() - coefStart));
                             coeffs.emplace_back(std::array<float, 4>{ w[0], w[offset[1]], w[offset[2]], w[offset[3]] });
                         }
                         else
                             code.printf(COEF_FMT "," COEF_FMT "," COEF_FMT "," COEF_FMT, w[0], w[offset[1]], w[offset[2]], w[offset[3]]);
                     }
                     code.printf(")");
                 }
                 code.line(";");
             }
         }
  
         // add residual input
         if (residualInput) {
             // get linear mapping of channel pixel positions to sample the residual input properly
             const IntPoint mainOrigin = input.getChannelOrigin(useUniformShift ? outputChannel : firstInputChannel);
             const IntPoint residualOrigin = residualInput.getChannelOrigin(outputChannel);
             const Rectangle mainArea(mainOrigin, mainOrigin + input.getSpatialSize());
             const Rectangle resArea(residualOrigin, residualOrigin + residualInput.getSpatialSize());
             const Point mainTexSize(input.getTextureWidth(), input.getTextureHeight());
             const Point resTexSize(residualInput.getTextureWidth(), residualInput.getTextureHeight());
             Point scale, offset;
             (mainArea / mainTexSize).getMapping(resArea / resTexSize, scale, offset);
             // sample, add to sum
             code.printf("sum += texture2D(%s[0], %s * vec2(" COORD_FMT "," COORD_FMT ") + vec2(" COORD_FMT "," COORD_FMT "));\n",
                 UNIFORM_RESIDUAL_INPUT, getInputSamplingPos().c_str(), scale.x, scale.y, offset.x, offset.y);
         }
  
         // add bias if enabled
         if (useBias) {
             const float* b = biases->ptr<float>(outputChannel);
             if (useUniforms) {
                 code.printf("sum += %s[%d];", UNIFORM_COEFFICIENT, (int)(coeffs.size() - coefStart));
                 coeffs.emplace_back(std::array<float, 4>{ b[0], b[1], b[2], b[3] });
             }
             else
                 code.printf("sum += vec4(" COEF_FMT "," COEF_FMT "," COEF_FMT "," COEF_FMT ");\n", b[0], b[1], b[2], b[3]);
         }
  
         // apply activation
         ActivationFunctionMixin::apply(code, "sum");
         code("}");
  
         // init program
         programs.push_back(bank(gpu, code));
     }
  
     // setup execution order: same programs writing to the same texture are next to each other
     execOrder.resize(programs.size());
     for (size_t i = 0; i < execOrder.size(); ++i)
         execOrder[i] = (int)i;
     std::sort(execOrder.begin(), execOrder.end(), [&](int i, int j) {
         return programs[i] < programs[j] || (programs[i] == programs[j] &&
             output.getChannelTextureNumber(4 * i) < output.getChannelTextureNumber(4 * j));
     });
  
     delete biases;
     ready = true;
 }
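
The decision between passing coefficients as uniforms and hardcoding them into the shader source can be isolated into a small function for experimentation. This is a sketch under stated assumptions: `shouldUseUniforms` is a hypothetical name, `maxFragmentUniformVectors` stands for the value returned by gpu.getLimit(GraphicPipeline::Limit::FRAGMENT_UNIFORM_VECTORS), and the kernel volume is assumed to be width * height * depth.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical free-function version of the useUniforms decision in prepare().
inline bool shouldUseUniforms(bool useInputImage, int kernelWidth, int kernelHeight,
                              int kernelDepth, int numOutputChannels,
                              int maxFragmentUniformVectors) {
    assert(kernelWidth > 0 && kernelHeight > 0 && kernelDepth > 0);
    const int maxAllowedPrograms = 0;  // same threshold as in the listing
    // Uniform vectors kept unused.
    const int reserved = 8 + std::max(kernelWidth, kernelHeight) / 2;
    // One program per four output channels.
    const int numberOfPrograms = numOutputChannels / 4;
    // Uniform vectors per program: all kernel coefficients plus one bias vector.
    const int uniformsLength = kernelWidth * kernelHeight * kernelDepth + 1;
    return !useInputImage &&
           numberOfPrograms > maxAllowedPrograms &&
           uniformsLength + reserved < maxFragmentUniformVectors;
}
```

With a limit of 256 uniform vectors, a 3x3 kernel over 4 input channels needs 37 + 9 vectors and qualifies, whereas a 3x3 kernel over 32 input channels needs 289 + 9 and falls back to hardcoded coefficients.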

◆ execute()

void Conv2D::execute	(	TaskThread &	thread,
		GraphicPipeline &	gpu
	)

privatevirtual

Executes the operation.

The operation must be prepared beforehand; otherwise, a NotReady exception is thrown.

Parameters

[in,out]	thread	Calling CPU thread descriptor
[in,out]	gpu	A graphic pipeline instance

Implements Beatmup::NNets::AbstractOperation.

Definition at line 272 of file conv2d.cpp.

                                                              {
     if (!ready)
         throw NotReady(this);
  
     RuntimeError::check((useInputImage && inputImage) || (!useInputImage && input), "Input is not provided to a Conv2D operation.");
     RuntimeError::check(output, "Output is not provided to Conv2D operation " + getName());
     if (residualInput && residualInput.getSize() != output.getSize())
         throw RuntimeError("Residual input size does not match the output size");
  
 #ifdef BEATMUP_DEBUG
     RuntimeError::check(output.getSize() == getOutputSize(), "Operation output storage size mismatch");
 #endif
  
     // static program setup
     SpatialFilteringMixin::setup(
         useInputImage ? inputImage->getWidth()  : input.getTextureWidth(),
         useInputImage ? inputImage->getHeight() : input.getTextureHeight()
     );
  
     // compute tex coords
     const IntPoint strides(stride, stride);
     const IntPoint inputTextureSize = useInputImage ?
         IntPoint(inputImage->getWidth(), inputImage->getHeight()) :
         IntPoint(input.getTextureWidth(), input.getTextureHeight());
     if (useInputImage || isUniformShiftUsed()) {
         const IntRectangle samplingArea = useInputImage ?
             getSamplingArea(IntPoint(inputImage->getWidth(), inputImage->getHeight()), strides, padding) :
             getSamplingArea(input, 0, strides, padding);
  
         gpu.setTextureCoordinates(samplingArea, inputTextureSize, output.getSpatialSize());
     }
  
     const int coeffsPerProgram = (int)(coeffs.size() / programs.size());
     const bool uniformsAreUsed = coeffsPerProgram > 0;
  
     // for each output channel
     Storage::Binder bind(gpu);
     for (size_t i = 0; i < execOrder.size(); ++i) {
         const int programNum = execOrder[i];
         const int outputChannel = 4 * programNum;
  
         GL::RenderingProgram& program = *programs[programNum];
  
         if (isDepthwise) {
             const int channel = outputChannel;
  
             // bind output to a program
             const bool fast = bind.begin(program, output, outputChannel);
  
             if (!fast) {
                 // bind inputs
                 bind(input, UNIFORM_INPUT, outputChannel);
                 if (residualInput)
                     bind(residualInput, UNIFORM_RESIDUAL_INPUT, outputChannel);
                 SpatialFilteringMixin::setupProgram(program);
             }
  
             // setup the remaining stuff
             if (isUniformShiftUsed())
                 SpatialFilteringMixin::setUniformShift(program, input.getChannelOrigin(channel) - input.getChannelOrigin(0), input.getTextureSize());
             else
                 gpu.setTextureCoordinates(getSamplingArea(input, channel, strides, padding), inputTextureSize, output.getSpatialSize());
         }
  
         else {
             // bind output to a program
             const int groupIdx = outputChannel * numGroups / numOutputChannels;
             const bool isSameGroup =  i > 0 && 4 * execOrder[i - 1] * numGroups / numOutputChannels == groupIdx;
             const bool fast = bind.begin(program, output, outputChannel) && isSameGroup;
  
             const int firstInputChannel = groupIdx * kernelSize.getDepth();
             const int lastInputChannel  = firstInputChannel + (isDepthwise ? 4 : kernelSize.getDepth());
  
             if (!fast) {
                 // bind inputs
                 if (useInputImage)
                     bind(*inputImage, UNIFORM_INPUT);
                 else {
                     const int firstInputChannel = groupIdx * kernelSize.getDepth();
                     bind(groupViews[groupIdx], UNIFORM_INPUT);
  
                     if (residualInput)
                         bind(residualInput, UNIFORM_RESIDUAL_INPUT, outputChannel);
  
                     if (isUniformShiftUsed())
                         SpatialFilteringMixin::setUniformShift(program, input.getChannelOrigin(firstInputChannel) - input.getChannelOrigin(0), input.getTextureSize());
                     else
                         gpu.setTextureCoordinates(getSamplingArea(input, firstInputChannel, strides, padding), inputTextureSize, output.getSpatialSize());
                 }
  
                 // setup the remaining stuff
                 SpatialFilteringMixin::setupProgram(program);
             }
         }
  
         // update uniforms if needed
         if (uniformsAreUsed)
             program.setVec4Array(UNIFORM_COEFFICIENT, coeffs[coeffsPerProgram * programNum].data(), coeffsPerProgram);
  
         // g-g-go
         program.blend();
     }
 }
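
Both prepare() and execute() map an output channel to its convolution group and to the range of input channels that group samples. The arithmetic can be sketched on its own as follows; `ChannelRange`, `groupIndex`, `groupInputRange` and `isSameGroup` are hypothetical names, and `kernelDepth` stands for kernelSize.getDepth(), i.e., the number of input channels per group.

```cpp
#include <cassert>

// Half-open range [first, last) of input channels.
struct ChannelRange {
    int first;
    int last;
};

// Group index of a given output channel, as computed in the listings.
inline int groupIndex(int outputChannel, int numGroups, int numOutputChannels) {
    assert(numOutputChannels > 0);
    return outputChannel * numGroups / numOutputChannels;
}

// Input channels sampled for the four output channels starting at outputChannel.
// A depthwise program covers 4 channels at once; otherwise the whole group is used.
inline ChannelRange groupInputRange(int outputChannel, int numGroups,
                                    int numOutputChannels, int kernelDepth,
                                    bool isDepthwise) {
    const int first = groupIndex(outputChannel, numGroups, numOutputChannels) * kernelDepth;
    return { first, first + (isDepthwise ? 4 : kernelDepth) };
}

// True if two programs (each producing 4 output channels) belong to the same
// group, in which case the input bindings may be reused.
inline bool isSameGroup(int previousProgram, int program, int numGroups,
                        int numOutputChannels) {
    return groupIndex(4 * previousProgram, numGroups, numOutputChannels) ==
           groupIndex(4 * program, numGroups, numOutputChannels);
}
```

With 32 output channels in 4 groups of 4 input channels each, output channels 0 and 4 fall into group 0 while output channel 8 starts group 1, which samples input channels 4 to 7.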

◆ getInputPadding()

int Conv2D::getInputPadding ( int index = 0 ) const

privatevirtual

Retrieves minimum required size of zero padding for a given input.

Operations that sample a neighborhood of a pixel may need the input to be padded with zeros if some of the neighboring samples fall outside the area containing data. In Beatmup, zero padding is handled by allocating a bigger input and putting zeros around the area that is actually filled with data.

Returns: number of zero columns and rows to be added to the input area.

Reimplemented from Beatmup::NNets::AbstractOperation.

Definition at line 377 of file conv2d.cpp.

                                            {
     return (index == 0 && padding == Size::Padding::SAME) ? std::max(kernelSize[0], kernelSize[1]) / 2 : 0;
 }
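
The padding rule can be checked with a standalone version of the same expression. In this sketch `inputPadding` and the local `Padding` enum are hypothetical stand-ins for the member function and Size::Padding.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for Size::Padding.
enum class Padding { VALID, SAME };

// Number of zero rows and columns required around the input area.
// Only the main input (index 0) with SAME padding needs any.
inline int inputPadding(int index, Padding padding, int kernelWidth, int kernelHeight) {
    assert(kernelWidth > 0 && kernelHeight > 0);
    return (index == 0 && padding == Padding::SAME)
        ? std::max(kernelWidth, kernelHeight) / 2
        : 0;
}
```

A 3x3 kernel with SAME padding therefore requires one zero row and column on each side, a 5x5 kernel requires two, and a pointwise (1x1) kernel requires none.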

◆ getSampledChannels()

void Conv2D::getSampledChannels	(	int	index,
		int &	min,
		int &	max
	)		const

privatevirtual

Retrieves range of input feature channels sampled at the same time for a specific input.

The operation would typically take the entire storage and sample it at once, if needed. If the number of textures in a storage exceeds the number of texture samplers that the GPU may use simultaneously, an exception occurs. This function provides the necessary information to limit the number of textures in the storage when allocating it. When the limit is reached, multiple channels are packed into a single texture in the storage.

Parameters

[in]	index	The input index. Expected to fall in the valid range, i.e. from zero to getInputCount() - 1 inclusive.
[out]	min	The minimum number of channels that can be sampled at once
[out]	max	The maximum number of channels that can be sampled at once

Implements Beatmup::NNets::AbstractOperation.

Definition at line 382 of file conv2d.cpp.

                                                                    {
     if (index == 0) {
         // main input: sampling an entire group at once
         min = useInputImage ? 3 : 4;
         max = useInputImage ? 3 : isDepthwise ? 4 : kernelSize.getDepth();
     }
     else if (index == 1) {
         // residual input: sampling 1 texture at once
         min = max = 4;
     }
     else
         min = max = 0;
 }
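
The channel ranges reported for each input can be reproduced with a value-returning sketch of the same logic. `SampledChannels` and `sampledChannels` are hypothetical names; `kernelDepth` stands for kernelSize.getDepth().

```cpp
#include <cassert>

// Minimum and maximum number of channels sampled at once.
struct SampledChannels {
    int min;
    int max;
};

// Hypothetical value-returning version of getSampledChannels().
inline SampledChannels sampledChannels(int index, bool useInputImage,
                                       bool isDepthwise, int kernelDepth) {
    if (index == 0) {
        // Main input: an entire group is sampled at once.
        const int min = useInputImage ? 3 : 4;
        const int max = useInputImage ? 3 : (isDepthwise ? 4 : kernelDepth);
        return { min, max };
    }
    if (index == 1) {
        // Residual input: a single 4-channel texture is sampled at once.
        return { 4, 4 };
    }
    return { 0, 0 };
}
```

For a regular convolution with 16 input channels per group, the main input reports a range of 4 to 16 channels, which lets the storage allocator decide how many channels to pack into a single texture.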

◆ isBiasUsed()

bool Beatmup::NNets::Conv2D::isBiasUsed ( ) const

inline

Definition at line 124 of file conv2d.h.

{ return useBias; }

◆ getInputCount()

int Beatmup::NNets::Conv2D::getInputCount ( ) const

inline virtual

Returns number of operation inputs.

Inputs are then indexed from zero to the returned value minus one inclusive.

Reimplemented from Beatmup::NNets::AbstractOperation.

Definition at line 126 of file conv2d.h.

126 { return 2; }

◆ getOutputCount()

int Beatmup::NNets::Conv2D::getOutputCount ( ) const

inline virtual

Returns number of operation outputs.

Outputs are then indexed from zero to the returned value minus one inclusive.

Reimplemented from Beatmup::NNets::AbstractOperation.

Definition at line 127 of file conv2d.h.

127 { return 1; }

◆ acceptsStorageInput()

bool Beatmup::NNets::Conv2D::acceptsStorageInput ( int index = 0 ) const

inline virtual

Returns true if the operation can take a Storage::View at a specific input.

Neural network operations may accept different kinds of data containers on inputs and outputs, namely Storage::View, GL::Vector and textures. This function is used to check whether a given operation accepts a storage view on input.

Parameters

[in] index The input index. Expected to fall in the valid range, i.e. from zero to getInputCount() - 1 inclusive.

Reimplemented from Beatmup::NNets::AbstractOperation.

Definition at line 129 of file conv2d.h.

129 { return (index == 0 && !useInputImage) || index == 1; }

◆ acceptsStorageOutput()

bool Beatmup::NNets::Conv2D::acceptsStorageOutput ( int index = 0 ) const

inline virtual

Returns true if the operation can take a Storage::View at a specific output.

Neural network operations may accept different kinds of data containers on inputs and outputs, namely Storage::View, GL::Vector and textures. This function is used to check whether a given operation accepts a storage view on output.

Parameters

[in] index The output index. Expected to fall in the valid range, i.e. from zero to getOutputCount() - 1 inclusive.

Reimplemented from Beatmup::NNets::AbstractOperation.

Definition at line 130 of file conv2d.h.

130 { return index == 0; }

◆ acceptsTextureInput()

bool Beatmup::NNets::Conv2D::acceptsTextureInput ( int index = 0 ) const

inline virtual

Returns true if the operation can take a GL::TextureHandler at a specific input.

Neural network operations may accept different kinds of data containers on inputs and outputs, namely Storage::View, GL::Vector and textures. This function is used to check whether a given operation accepts a texture on input.

Parameters

[in] index The input index. Expected to fall in the valid range, i.e. from zero to getInputCount() - 1 inclusive.

Reimplemented from Beatmup::NNets::AbstractOperation.

Definition at line 131 of file conv2d.h.

131 { return index == 0 && useInputImage; }

◆ getOutputSize()

Size Conv2D::getOutputSize ( int outputIndex = 0 ) const

virtual

Returns full size of a specific operation output.

Reimplemented from Beatmup::NNets::AbstractOperation.

Definition at line 397 of file conv2d.cpp.

 Size Conv2D::getOutputSize(int outputIndex) const {
     if (outputIndex == 0) {
         RuntimeError::check((useInputImage && inputImage) || (!useInputImage && input),
             "Input is not provided to Conv2D operation " + getName());
         const Size inputSize = useInputImage ? Size(inputImage->getWidth(), inputImage->getHeight(), 3) : input.getSize();
         const Size result = inputSize.transform(
             kernelSize,
             Size(stride, stride, 0),
             padding,
             numOutputChannels
         );
         RuntimeError::check(result.volume() > 0, "Invalid (zero or negative) output size got in " + getName());
         return result;
     }
     return Size::EMPTY;
 }
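
The body of Size::transform is not shown on this page. As a hedged illustration only, the sketch below computes one spatial extent of the output using the common convention where SAME padding gives ceil(input / stride) and VALID padding gives ceil((input - kernel + 1) / stride). Whether Beatmup follows exactly this convention is an assumption, and the names Padding, ceilDiv and outputExtent are hypothetical.

```cpp
#include <stdexcept>

// Mirrors the two padding modes named in the Conv2D constructor.
enum class Padding { VALID, SAME };

// Integer division rounded up, intended for a > 0 and b > 0.
inline int ceilDiv(int a, int b) {
    return (a + b - 1) / b;
}

// Assumed convention, not confirmed by the source:
//   SAME:  ceil(input / stride)
//   VALID: ceil((input - kernel + 1) / stride)
inline int outputExtent(int inputExtent, int kernelExtent, int stride, Padding padding) {
    const int result = (padding == Padding::SAME)
        ? ceilDiv(inputExtent, stride)
        : ceilDiv(inputExtent - kernelExtent + 1, stride);
    // Plays the role of the "zero or negative output size" check in the listing.
    if (result <= 0)
        throw std::invalid_argument("Invalid (zero or negative) output size");
    return result;
}
```

Under this convention a 224-pixel-wide input with a 3-pixel kernel and stride 2 yields 112 columns with SAME padding and 111 with VALID padding. The output depth is not derived this way: the listing passes numOutputChannels to the transform explicitly.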

◆ getOutput()

Storage::View Beatmup::NNets::Conv2D::getOutput ( int index = 0 )

inline virtual

Returns a storage view bound to a specific operation output.

If no view is bound, returns empty view.

Parameters

[in] index The output index. Expected to fall in the valid range, i.e. from zero to getOutputCount() - 1 inclusive.

Reimplemented from Beatmup::NNets::AbstractOperation.

Definition at line 135 of file conv2d.h.

135 { return output; }

◆ setInput() [1/2]

void Conv2D::setInput ( Storage::View && storage, int inputIndex = 0 )

virtual

Reimplemented from Beatmup::NNets::AbstractOperation.

Definition at line 483 of file conv2d.cpp.

 void Conv2D::setInput(Storage::View&& view, int inputIndex) {
     OutOfRange::check(inputIndex, 0, 1, "Input index out of range: %d");
     RuntimeError::check(view.getStorage().getPadding() >= getInputPadding(inputIndex), "The storage has insufficient padding");
     if (inputIndex == 0) {
         if (view) {
             RuntimeError::check(!useInputImage, "An image is expected on input, but a tensor is passed");
             RuntimeError::check(view.getDepth() == kernelSize.getDepth() * numGroups, "Tensor depth does not match kernel depth");
             // create group views
             groupViews.clear();
             if (!isDepthwise)
                 for (int groupIdx = 0; groupIdx < numGroups; ++groupIdx) {
                     const int firstInputChannel = groupIdx * kernelSize.getDepth();
                     const int lastInputChannel  = firstInputChannel + (isDepthwise ? 4 : kernelSize.getDepth());
                     groupViews.emplace_back(std::move(view), firstInputChannel, lastInputChannel - firstInputChannel);
                 }
         }
         this->input = std::move(view);
         this->inputImage = nullptr;
     }
     else {
         if (view) {
             RuntimeError::check(!useInputImage, "Cannot use the residual input when an image is used as the main input");
             RuntimeError::check(view.getDepth() == numOutputChannels, "Residual input tensor depth does not match output depth");
         }
         this->residualInput = std::move(view);
     }
 }
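
The group-view loop in the listing above splits the input tensor into numGroups consecutive slices along the depth axis. Since the loop only runs when isDepthwise is false, every slice spans exactly kernelSize.getDepth() channels. The following standalone sketch computes the same channel ranges without any Beatmup types; the function name groupChannelRanges is hypothetical.

```cpp
#include <utility>
#include <vector>

// Returns one (firstChannel, channelCount) pair per convolution group.
// kernelDepth stands in for kernelSize.getDepth(), i.e. the number of
// input channels seen by a single group.
inline std::vector<std::pair<int, int>> groupChannelRanges(int kernelDepth, int numGroups) {
    std::vector<std::pair<int, int>> ranges;
    ranges.reserve(numGroups);
    for (int groupIdx = 0; groupIdx < numGroups; ++groupIdx) {
        const int firstInputChannel = groupIdx * kernelDepth;
        ranges.emplace_back(firstInputChannel, kernelDepth);
    }
    return ranges;
}
```

For example, a 16-channel input split into 2 groups (kernel depth 8) gives the ranges (0, 8) and (8, 8), consistent with the depth check in the listing that requires the tensor depth to equal kernelSize.getDepth() * numGroups.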

◆ setInput() [2/2]

void Conv2D::setInput ( GL::TextureHandler & image, int inputIndex = 0 )

virtual

Reimplemented from Beatmup::NNets::AbstractOperation.

Definition at line 518 of file conv2d.cpp.

 void Conv2D::setInput(GL::TextureHandler& image, int inputIndex) {
     if (inputIndex == 0) {
         RuntimeError::check(useInputImage, "Cannot use image as Conv2D input");
         this->inputImage = &image;
     }
     else
         AbstractOperation::setInput(image, inputIndex);
 }

◆ setOutput()

void Conv2D::setOutput ( Storage::View && storage, int outputIndex = 0 )

virtual

Reimplemented from Beatmup::NNets::AbstractOperation.

Definition at line 512 of file conv2d.cpp.

 void Conv2D::setOutput(Storage::View&& storage, int outputIndex) {
     OutOfRange::check(outputIndex, 0, 0, "Output index out of range: %d");
     this->output = std::move(storage);
 }

◆ serialize()

std::map< std::string, std::string > Conv2D::serialize ( ) const

virtual

Returns a serialized representation of the operation.

Implements Beatmup::NNets::AbstractOperation.

Definition at line 415 of file conv2d.cpp.

 std::map<std::string, std::string> Conv2D::serialize() const {
     return {
         { "_name",              getName() },
         { "_type",              "conv2d" },
         { "kernel_size",        std::to_string(kernelSize[0]) },
         { "input_channels",     std::to_string(kernelSize.getDepth() * numGroups) },
         { "output_channels",    std::to_string(numOutputChannels) },
         { "stride",             std::to_string(stride) },
         { "padding",            std::to_string(padding) },
         { "use_bias",           useBias ? "true" : "false" },
         { "groups",             std::to_string(numGroups) },
         { "activation",         std::to_string(activationFunc) }
     };
 }

◆ disconnect()

void Conv2D::disconnect ( )

virtual

Assigns empty inputs and outputs.

Implements Beatmup::NNets::AbstractOperation.

Definition at line 474 of file conv2d.cpp.

 void Conv2D::disconnect() {
     inputImage = nullptr;
     input = Storage::View();
     residualInput = Storage::View();
     output = Storage::View();
     groupViews.clear();
 }

◆ setResidualInput()

void Beatmup::NNets::Conv2D::setResidualInput ( Storage::View && storage )

inline

Connects a tensor to a residual input.

This input is optional. The tensor is added to the convolution result before the non-linear activation is applied. Its size must match the output size.

Parameters

[in] storage A storage view containing the residual input tensor.

Definition at line 151 of file conv2d.h.

151 { setInput(std::move(storage), 1); }
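
To make the ordering concrete, here is a minimal scalar sketch of "add the residual, then activate". ReLU is used purely as an illustrative activation, and the function names are hypothetical; the actual computation happens per texel in GLSL and is not shown on this page.

```cpp
#include <algorithm>

// Illustrative activation only; the operation supports other activation functions.
inline float relu(float x) {
    return std::max(x, 0.0f);
}

// The residual value is added to the convolution result first,
// and the activation is applied to the sum.
inline float outputWithResidual(float convolutionResult, float residual) {
    return relu(convolutionResult + residual);
}
```

The order matters: for a convolution result of -1 and a residual of 0.5 this gives relu(-0.5) = 0, whereas activating first and adding afterwards would give 0.5.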

◆ countMultiplyAdds()

unsigned long Conv2D::countMultiplyAdds ( ) const

virtual

Counts (approximate) number of multiply-adds used by this operation.

A single multiply-add is one multiplication and one addition.

Reimplemented from Beatmup::NNets::AbstractOperation.

Definition at line 528 of file conv2d.cpp.

 unsigned long Conv2D::countMultiplyAdds() const {
     return getOutputSize(0).volume() * kernelSize.volume();
 }
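
A worked example of this estimate, written as a standalone sketch with a hypothetical function name. Note that kernelSize.getDepth() is the per-group input depth: serialize() reports input_channels as kernelSize.getDepth() * numGroups.

```cpp
// Output volume times kernel volume, as in the listing above.
// kernelDepth is the number of input channels seen by one group.
inline unsigned long estimateMultiplyAdds(
    unsigned long outWidth, unsigned long outHeight, unsigned long outChannels,
    unsigned long kernelWidth, unsigned long kernelHeight, unsigned long kernelDepth)
{
    const unsigned long outputVolume = outWidth * outHeight * outChannels;
    const unsigned long kernelVolume = kernelWidth * kernelHeight * kernelDepth;
    return outputVolume * kernelVolume;
}
```

For a 3x3 convolution over a 3-channel image producing a 112x112x32 output, the estimate is 401408 * 27 = 10838016 multiply-adds.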

◆ countTexelFetches()

unsigned long Conv2D::countTexelFetches ( ) const

virtual

Counts (approximate) number of texel fetches.

Reimplemented from Beatmup::NNets::AbstractOperation.

Definition at line 533 of file conv2d.cpp.

 unsigned long Conv2D::countTexelFetches() const {
     unsigned long count = getOutputSize(0).volume() / 4 * kernelSize.volume() / (useInputImage ? 3 : 4);
     if (residualInput)
         count += getOutputSize(0).volume() / 4;
     return count;
 }
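
The same estimate can be reproduced outside the class. The sketch below keeps the integer-division order of the listing; the function name and parameters are hypothetical stand-ins for the class members.

```cpp
// Mirrors the listing above: one fetch covers 4 output channels, and one input
// fetch covers 3 channels for an image input or 4 channels for a tensor input.
inline unsigned long estimateTexelFetches(
    unsigned long outputVolume, unsigned long kernelVolume,
    bool useInputImage, bool hasResidualInput)
{
    // Evaluated left to right with integer division, as in the original expression.
    unsigned long count = outputVolume / 4 * kernelVolume / (useInputImage ? 3 : 4);
    if (hasResidualInput)
        count += outputVolume / 4;  // one extra fetch per output texel
    return count;
}
```

For a 56x56x32 output (volume 100352) and a 3x3x16 kernel (volume 144) on a tensor input, this gives 25088 * 144 / 4 = 903168 fetches, or 928256 when a residual input is connected.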

◆ initDeserializer()

static bool Beatmup::NNets::Conv2D::initDeserializer ( )

static

Sets up deserialization of the operation.

Member Data Documentation

◆ kernelSize

const Size Beatmup::NNets::Conv2D::kernelSize

private

Definition at line 66 of file conv2d.h.

◆ numOutputChannels

const int Beatmup::NNets::Conv2D::numOutputChannels

private

number of output feature maps

Definition at line 67 of file conv2d.h.

◆ numGroups

const int Beatmup::NNets::Conv2D::numGroups

private

number of convolution groups

Definition at line 68 of file conv2d.h.

◆ stride

const int Beatmup::NNets::Conv2D::stride

private

Definition at line 69 of file conv2d.h.

◆ padding

const Size::Padding Beatmup::NNets::Conv2D::padding

private

Definition at line 70 of file conv2d.h.

◆ useInputImage

const bool Beatmup::NNets::Conv2D::useInputImage

private

if true, input is the texture handler, not the view

Definition at line 71 of file conv2d.h.

◆ isDepthwise

const bool Beatmup::NNets::Conv2D::isDepthwise

private

if true, the convolution is depthwise, otherwise regular

Definition at line 72 of file conv2d.h.

◆ useBias

const bool Beatmup::NNets::Conv2D::useBias

private

if true, the bias addition is enabled

Definition at line 73 of file conv2d.h.

◆ ready

bool Beatmup::NNets::Conv2D::ready

private

Definition at line 74 of file conv2d.h.

◆ input

Storage::View Beatmup::NNets::Conv2D::input

private

Definition at line 76 of file conv2d.h.

◆ output

Storage::View Beatmup::NNets::Conv2D::output

private

Definition at line 76 of file conv2d.h.

◆ residualInput

Storage::View Beatmup::NNets::Conv2D::residualInput

private

optional tensor to be added to the output before activation

Definition at line 77 of file conv2d.h.

◆ inputImage

GL::TextureHandler* Beatmup::NNets::Conv2D::inputImage

private

input texture handler to be used instead of the input view

Definition at line 78 of file conv2d.h.

◆ programs

std::vector<GL::RenderingProgram*> Beatmup::NNets::Conv2D::programs

private

pointers to GLSL programs, one per quad of output channels

Definition at line 79 of file conv2d.h.

◆ coeffs

std::vector<std::array<float, 4> > Beatmup::NNets::Conv2D::coeffs

private

model data to pass to uniform variables, if used

Definition at line 80 of file conv2d.h.

◆ execOrder

std::vector<int> Beatmup::NNets::Conv2D::execOrder

private

execution order of GLSL programs

Definition at line 81 of file conv2d.h.

◆ groupViews

std::vector<Storage::View> Beatmup::NNets::Conv2D::groupViews

private

views per convolution group

Definition at line 82 of file conv2d.h.

◆ FILTERS_CHUNK_SUFFIX

const char * Conv2D::FILTERS_CHUNK_SUFFIX = "/w"

static

suffix added to the op name to get the filters chunk id in the model data

Definition at line 97 of file conv2d.h.

◆ BIAS_CHUNK_SUFFIX

const char * Conv2D::BIAS_CHUNK_SUFFIX = "/b"

static

suffix added to the op name to get the bias chunk id in the model data

Definition at line 98 of file conv2d.h.

The documentation for this class was generated from the following files:

core/nnets/conv2d.h
core/nnets/conv2d.cpp
