
Playing and recording video is one of the most important features of an Android device. Video has become one of the main ways of spreading information, and it is often preferred over text for presenting educational content.

Android offers various APIs that enable video playback and processing. Even though the ffmpeg library seems to be a popular choice in video apps, using only the native Android API can offer some additional benefits. The codecs on devices can be optimized and hardware accelerated, and you should also be able to avoid possible patent issues related to some codecs. Additionally, the GPL/LGPL license might not suit your particular project.

In one of my free-time projects I'm controlling my DSLR Camera from an Android app via USB OTG cable and I use the app to schedule shots automatically throughout the day. The shots can then be used to create a time-lapse. Normally I would use desktop software to create the time-lapse, but I thought that this could be an interesting thing to do directly on an Android device.

So this post will be about creating a video from a series of images.

I'll only focus on the video part here. Adding audio to the video will be part of my next post.

You can find the sample project related to this post on GitHub.

High-Level Overview

Of course, creating a time-lapse from a series of images can be accomplished in various ways on Android. I decided to go with the MediaCodec API and to use OpenGL ES2 together with a Surface to provide MediaCodec with data in a format it can process. I hope the reasons for this decision and its advantages will become clear once I show more details in the following sections.

First I'll configure MediaCodec to ensure it can encode into the right output format.

Then I'll use EGL to prepare a surface and context for rendering with OpenGL ES2.

Once I have the encoder and OpenGL set up, I can start to encode the input images. Android's API already offers many functions for decoding and processing bitmaps which I can use to pass the input images into OpenGL as a texture.

With OpenGL you then have the additional benefit that you can perform various transformations and effects efficiently, but I won't be using much of it in this example.

Rendering the image with OpenGL into a Surface will make the input data available to MediaCodec's encoder.

After encoding I can then pass the encoded frames into MediaMuxer. MediaMuxer will allow me to properly embed the encoded frames inside of a container format (e.g. MP4 or WebM) and possibly mix in some audio or other video tracks.

The official docs usually only contained smaller snippets which didn't show all the tasks that I needed to accomplish. Therefore I had to look all around the internet for more information. One good source that helped me to understand how this is working can be found here.

Configuring MediaCodec Video Encoder

MediaCodec API contains classes and methods responsible for decoding and encoding of video and audio. It operates on output and input buffers. For a video encoder, the input buffer will contain the raw video frames that you provide (in this case via a Surface). After processing, the encoder's output buffer should contain your compressed output data.

MediaCodec API provides factory methods that allow you to create an instance of MediaCodec. You can create an encoder with the following code

const val MIME = "video/avc"
val encoder = MediaCodec.createEncoderByType(MIME)

Once you have an instance of MediaCodec encoder, it still needs to be configured.

MediaCodec supports only specific encoding resolutions per codec. You can try to find something similar to the resolution and aspect ratio of your input images first or do some scaling and cropping later when rendering with OpenGL.

Android provides the MediaCodecInfo.CodecCapabilities class that allows you to query information about the supported properties per codec. You can specifically use VideoCapabilities to get some information about the video codecs. For example, to find the smallest and largest supported width/height, you can do the following

const val MIME = "video/avc"
val encoder = MediaCodec.createEncoderByType(MIME)
val heightsRange = encoder.codecInfo.getCapabilitiesForType(MIME).videoCapabilities.supportedHeights
val widthsRange = encoder.codecInfo.getCapabilitiesForType(MIME).videoCapabilities.supportedWidths

Then you can also check if the combination of width and height is actually supported

if (encoder.codecInfo.getCapabilitiesForType(MIME).videoCapabilities.isSizeSupported(w, h))
    // Great, this resolution is supported ..

The official docs also mention some recommended resolutions for the H.264 and VP8 codecs. These resolutions are also listed in the compatibility documents as a "MUST" requirement, so you have some guarantee that at least these resolutions will be supported by all devices.
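To pick a target size close to the aspect ratio of the input images, something like the following sketch could be used. It is not part of the sample project. FrameSize and fitToEncoderLimits() are hypothetical names, and the default alignment of 16 is only an assumption (the real values can be read from VideoCapabilities.widthAlignment and heightAlignment). The result should still be verified with isSizeSupported().

```kotlin
/** Width/height pair; a plain data class so the sketch has no Android dependencies. */
data class FrameSize(val width: Int, val height: Int)

/**
 * Scales (srcWidth x srcHeight) down so it fits inside (maxWidth x maxHeight),
 * roughly keeps the aspect ratio and rounds both sides down to a multiple of `alignment`.
 * The alignment default of 16 is an assumption, not a value guaranteed by any device.
 */
fun fitToEncoderLimits(
    srcWidth: Int,
    srcHeight: Int,
    maxWidth: Int,
    maxHeight: Int,
    alignment: Int = 16
): FrameSize {
    require(srcWidth > 0 && srcHeight > 0) { "source size must be positive" }
    require(alignment > 0 && maxWidth >= alignment && maxHeight >= alignment) {
        "limits must be at least one alignment unit"
    }

    // Never upscale: the scale factor is capped at 1.0
    val scale = minOf(1.0, maxWidth.toDouble() / srcWidth, maxHeight.toDouble() / srcHeight)

    // Round down to the nearest multiple of `alignment`, but never below one unit
    fun align(value: Double): Int = maxOf(alignment, (value.toInt() / alignment) * alignment)

    return FrameSize(align(srcWidth * scale), align(srcHeight * scale))
}
```

For a 4608x3072 photo and limits of 1920x1080 this yields 1616x1072. The limits themselves could come from the upper bounds of supportedWidths and supportedHeights queried above.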

Besides picking a supported resolution, there are some other required parameters for configuring the media codec

val format = MediaFormat.createVideoFormat(MIME, width, height)
format.setInteger(MediaFormat.KEY_COLOR_FORMAT, MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface)
format.setInteger(MediaFormat.KEY_BIT_RATE, 2000000)
format.setInteger(MediaFormat.KEY_FRAME_RATE, 30)
format.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 15)

encoder.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)

In the code above I used the createVideoFormat() helper method to create a minimal format with some prefilled values. I also set some other parameters, which should be mostly self-explanatory. Note that KEY_I_FRAME_INTERVAL is expressed in seconds, not in frames.

MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface is necessary in my case because I would like to provide input via a Surface.

Also note that I specified KEY_FRAME_RATE because this value is required when configuring encoders. You would probably think that the final video file will contain a video track encoded with the frame rate you specified here. But this is not the case. This value is actually ignored by MediaMuxer which assembles the final video file. There is of course another way to control the final frame rate which I'll show in later sections.

Calling configure() with the format and flag MediaCodec.CONFIGURE_FLAG_ENCODE will bring the encoder into the configured state. If you didn't get an exception after this, you probably specified the format correctly.

Initializing Input Surface for MediaCodec With EGL

I decided to use a Surface for providing input to the encoder. Using a Surface instead of a ByteBuffer for video data is recommended by the official docs because it should improve performance.

Additionally, when using a ByteBuffer, the input has to be first converted to a supported color format that is device-specific. From my understanding, this conversion should be performed automatically when using a Surface.

And one additional benefit of using a Surface instead of ByteBuffer is that you can do the rendering with OpenGL. This allows you to add effects and perform various transformations efficiently on GPU.

MediaCodec provides a method to create a Surface for you - createInputSurface().

The Surface you get needs some additional configuration before you can use it with OpenGL ES2. This configuration is done with EGL

val surface = encoder.createInputSurface()

val eglDisplay = EGL14.eglGetDisplay(EGL14.EGL_DEFAULT_DISPLAY)
if (eglDisplay == EGL14.EGL_NO_DISPLAY)
    throw RuntimeException("eglDisplay == EGL14.EGL_NO_DISPLAY: "
            + GLUtils.getEGLErrorString(EGL14.eglGetError()))

val version = IntArray(2)
if (!EGL14.eglInitialize(eglDisplay, version, 0, version, 1))
    throw RuntimeException("eglInitialize(): " + GLUtils.getEGLErrorString(EGL14.eglGetError()))

val attribList = intArrayOf(
    EGL14.EGL_RED_SIZE, 8,
    EGL14.EGL_GREEN_SIZE, 8,
    EGL14.EGL_BLUE_SIZE, 8,
    EGL14.EGL_ALPHA_SIZE, 8,
    EGL14.EGL_RENDERABLE_TYPE, EGL14.EGL_OPENGL_ES2_BIT,
    EGLExt.EGL_RECORDABLE_ANDROID, 1,
    EGL14.EGL_NONE
)
val configs = arrayOfNulls<EGLConfig>(1)
val nConfigs = IntArray(1)
EGL14.eglChooseConfig(eglDisplay, attribList, 0, configs, 0, configs.size, nConfigs, 0)

var err = EGL14.eglGetError()
if (err != EGL14.EGL_SUCCESS)
    throw RuntimeException(GLUtils.getEGLErrorString(err))

val ctxAttribs = intArrayOf(
    EGL14.EGL_CONTEXT_CLIENT_VERSION, 2,
    EGL14.EGL_NONE
)
val eglContext = EGL14.eglCreateContext(eglDisplay, configs[0], EGL14.EGL_NO_CONTEXT, ctxAttribs, 0)

err = EGL14.eglGetError()
if (err != EGL14.EGL_SUCCESS)
    throw RuntimeException(GLUtils.getEGLErrorString(err))

val surfaceAttribs = intArrayOf(
    EGL14.EGL_NONE
)
val eglSurface = EGL14.eglCreateWindowSurface(eglDisplay, configs[0], surface, surfaceAttribs, 0)
err = EGL14.eglGetError()
if (err != EGL14.EGL_SUCCESS)
    throw RuntimeException(GLUtils.getEGLErrorString(err))

if (!EGL14.eglMakeCurrent(eglDisplay, eglSurface, eglSurface, eglContext))
    throw RuntimeException("eglMakeCurrent(): " + GLUtils.getEGLErrorString(EGL14.eglGetError()))

The previous code is a bit longer, but I'll try to break it up and explain what it does in relation to OpenGL and MediaCodec.

Android uses a BufferQueue to pass graphical data around. Some parts of the system operate as a consumer (SurfaceFlinger) and some as producer (e.g. OpenGL) of these buffers of graphical data.

Normally you don't access the graphics buffer directly from your app. You can use a Surface for this. Surface is a handle created by consumer of buffers which the producer of the buffers can use to push graphic data to consumer.

In our case the Surface seems to be created by the mediaserver process (consumer) and used by OpenGL ES2 (producer) to feed it with rendered frames.

OpenGL doesn't use the Surface directly, but through an EGLSurface.

In the code above, before I created an EGLSurface, I first initialized a context that is necessary for working with OpenGL.

Most of the stuff is similar to how you would initialize OpenGL rendering on a regular screen. However, there's one additional attribute used - EGL_RECORDABLE_ANDROID. I didn't find much information about it in docs, but it seems that it's necessary for creating a surface with underlying buffer format that MediaCodec can understand.

I created an EGLSurface by calling eglCreateWindowSurface() and giving it our Surface as one of the attributes. This connected the Surface to the producer side of buffer queue - OpenGL.

OpenGL always operates on a context that is made "current" and the context holds data in thread-local storage. This means it's thread-specific. Therefore you always need to be aware from which thread you're calling the OpenGL functions.

I recommend to check out the explanation in official docs to understand better how this works.
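One way to make sure that all EGL and OpenGL calls happen on the same thread is to route them through a single-threaded executor. The following is only a sketch of that idea. The GlThread class and its method names are hypothetical and not part of the sample project.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

/** Runs every submitted block on one dedicated thread, e.g. the thread that owns the EGL context. */
class GlThread(name: String = "gl-encoder") {
    private val executor = Executors.newSingleThreadExecutor { r -> Thread(r, name) }

    /**
     * Blocks the caller until `block` has finished on the dedicated thread and returns its result.
     * Exceptions thrown by `block` are rethrown wrapped in an ExecutionException.
     */
    fun <T> call(block: () -> T): T = executor.submit(Callable { block() }).get()

    /** Stops accepting new work and waits briefly for queued work to finish. */
    fun shutdown() {
        executor.shutdown()
        executor.awaitTermination(5, TimeUnit.SECONDS)
    }
}
```

With such a wrapper, the EGL setup, the drawing code and the EGL cleanup would each be passed to call(), so that eglMakeCurrent() and all later GLES20 calls see the same thread.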

The EGLSurface and the context should be cleaned up once they are no longer in use

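if (eglDisplay != EGL14.EGL_NO_DISPLAY) {
    // Unbind the context from this thread before destroying it
    EGL14.eglMakeCurrent(eglDisplay, EGL14.EGL_NO_SURFACE, EGL14.EGL_NO_SURFACE, EGL14.EGL_NO_CONTEXT)
    EGL14.eglDestroySurface(eglDisplay, eglSurface)
    EGL14.eglDestroyContext(eglDisplay, eglContext)
    EGL14.eglReleaseThread()
    EGL14.eglTerminate(eglDisplay)
}
surface?.release()
// eglDisplay, eglContext and eglSurface are assumed to be declared as var properties here
eglDisplay = EGL14.EGL_NO_DISPLAY
eglContext = EGL14.EGL_NO_CONTEXT
eglSurface = EGL14.EGL_NO_SURFACE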

Rendering Bitmap Into Texture With OpenGL

I already showed how to use OpenGL for rendering into a texture in my previous posts, so you can find additional information there. In this section I'll just quickly summarize the main steps.

Remember that you should only call the following OpenGL methods after you have initialized the EGLContext and made it current, and only from that same thread

private val vertexShaderCode =
    "precision highp float;\n" +
    "attribute vec3 vertexPosition;\n" +
    "attribute vec2 uvs;\n" +
    "varying vec2 varUvs;\n" +
    "uniform mat4 mvp;\n" +
    "\n" +
    "void main()\n" +
    "{\n" +
    "\tvarUvs = uvs;\n" +
    "\tgl_Position = mvp * vec4(vertexPosition, 1.0);\n" +
    "}"

private val fragmentShaderCode =
    "precision mediump float;\n" +
    "\n" +
    "varying vec2 varUvs;\n" +
    "uniform sampler2D texSampler;\n" +
    "\n" +
    "void main()\n" +
    "{\t\n" +
    "\tgl_FragColor = texture2D(texSampler, varUvs);\n" +
    "}"

private var vertices = floatArrayOf(
    // x, y, z, u, v
    -1.0f, -1.0f, 0.0f, 0f, 0f,
    -1.0f, 1.0f, 0.0f, 0f, 1f,
    1.0f, 1.0f, 0.0f, 1f, 1f,
    1.0f, -1.0f, 0.0f, 1f, 0f
)

private var indices = intArrayOf(
    2, 1, 0, 0, 3, 2
)

private var program: Int = 0
private var vertexHandle: Int = 0
private var bufferHandles = IntArray(2)
private var uvsHandle: Int = 0
private var mvpHandle: Int = 0
private var samplerHandle: Int = 0
private val textureHandle = IntArray(1)

var vertexBuffer: FloatBuffer = ByteBuffer.allocateDirect(vertices.size * 4).run {
    order(ByteOrder.nativeOrder())
    asFloatBuffer().apply {
        put(vertices)
        position(0)
    }
}

var indexBuffer: IntBuffer = ByteBuffer.allocateDirect(indices.size * 4).run {
    order(ByteOrder.nativeOrder())
    asIntBuffer().apply {
        put(indices)
        position(0)
    }
}

...

fun initGl() {
    val vertexShader = GLES20.glCreateShader(GLES20.GL_VERTEX_SHADER).also { shader ->
        GLES20.glShaderSource(shader, vertexShaderCode)
        GLES20.glCompileShader(shader)
    }

    val fragmentShader = GLES20.glCreateShader(GLES20.GL_FRAGMENT_SHADER).also { shader ->
        GLES20.glShaderSource(shader, fragmentShaderCode)
        GLES20.glCompileShader(shader)
    }

    program = GLES20.glCreateProgram().also {
        GLES20.glAttachShader(it, vertexShader)
        GLES20.glAttachShader(it, fragmentShader)
        GLES20.glLinkProgram(it)

        vertexHandle = GLES20.glGetAttribLocation(it, "vertexPosition")
        uvsHandle = GLES20.glGetAttribLocation(it, "uvs")
        mvpHandle = GLES20.glGetUniformLocation(it, "mvp")
        samplerHandle = GLES20.glGetUniformLocation(it, "texSampler")
    }

    // Initialize buffers
    GLES20.glGenBuffers(2, bufferHandles, 0)

    GLES20.glBindBuffer(GLES20.GL_ARRAY_BUFFER, bufferHandles[0])
    GLES20.glBufferData(GLES20.GL_ARRAY_BUFFER, vertices.size * 4, vertexBuffer, GLES20.GL_DYNAMIC_DRAW)

    GLES20.glBindBuffer(GLES20.GL_ELEMENT_ARRAY_BUFFER, bufferHandles[1])
    GLES20.glBufferData(GLES20.GL_ELEMENT_ARRAY_BUFFER, indices.size * 4, indexBuffer, GLES20.GL_DYNAMIC_DRAW)

    // Init texture handle
    GLES20.glGenTextures(1, textureHandle, 0)

    // Ensure I can draw transparent stuff that overlaps properly
    GLES20.glEnable(GLES20.GL_BLEND)
    GLES20.glBlendFunc(GLES20.GL_SRC_ALPHA, GLES20.GL_ONE_MINUS_SRC_ALPHA)
}

In the code above, I initialize all the OpenGL-related stuff inside of the initGl() method.

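The vertex and fragment shaders are very simple. Their main purpose is to draw a texture onto a quad. For that I'm using vertex and index buffers. Note that the index buffer holds 32-bit indices (GL_UNSIGNED_INT), which OpenGL ES 2.0 only accepts when the OES_element_index_uint extension is available. 16-bit indices (GL_UNSIGNED_SHORT) are the more portable choice.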
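The initGl() method does not check whether the shaders actually compiled. A small helper along the following lines could report compilation errors early. This is only a sketch, and compileShader() is a hypothetical helper name that is not part of the sample project.

```kotlin
import android.opengl.GLES20

/**
 * Compiles a shader of the given type and throws if the compilation fails.
 * Must be called on the thread where the EGL context is current.
 */
fun compileShader(type: Int, source: String): Int {
    val shader = GLES20.glCreateShader(type)
    if (shader == 0)
        throw RuntimeException("glCreateShader() failed for type $type")

    GLES20.glShaderSource(shader, source)
    GLES20.glCompileShader(shader)

    // GL_COMPILE_STATUS is 0 (GL_FALSE) when the compilation failed
    val status = IntArray(1)
    GLES20.glGetShaderiv(shader, GLES20.GL_COMPILE_STATUS, status, 0)
    if (status[0] == 0) {
        val log = GLES20.glGetShaderInfoLog(shader)
        GLES20.glDeleteShader(shader)
        throw RuntimeException("Shader compilation failed: $log")
    }
    return shader
}
```

It could then replace the two glCreateShader() blocks, e.g. val vertexShader = compileShader(GLES20.GL_VERTEX_SHADER, vertexShaderCode).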

The actual drawing code is the following

// Load bitmap from file
val bitmap = BitmapFactory.decodeFile(imagePath)

// Prepare some transformations
val mvp = FloatArray(16)
Matrix.setIdentityM(mvp, 0)
Matrix.scaleM(mvp, 0, 1f, -1f, 1f)

// Set the clear color first, so that glClear() uses it
GLES20.glClearColor(0f, 0f, 0f, 0f)
GLES20.glClear(GLES20.GL_COLOR_BUFFER_BIT or GLES20.GL_DEPTH_BUFFER_BIT)

GLES20.glViewport(0, 0, viewportWidth, viewportHeight)

GLES20.glUseProgram(program)

// Pass transformations to shader
GLES20.glUniformMatrix4fv(mvpHandle, 1, false, mvp, 0)

// Prepare texture for drawing
GLES20.glActiveTexture(GLES20.GL_TEXTURE0)
GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, textureHandle[0])
GLES20.glPixelStorei(GLES20.GL_UNPACK_ALIGNMENT, 1)

// Pass the Bitmap to OpenGL here
GLUtils.texImage2D(GLES20.GL_TEXTURE_2D, 0, bitmap, 0)

GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D, GLES20.GL_TEXTURE_MIN_FILTER, GLES20.GL_NEAREST)
GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D, GLES20.GL_TEXTURE_MAG_FILTER, GLES20.GL_NEAREST)

// Prepare buffers with vertices and indices & draw
GLES20.glBindBuffer(GLES20.GL_ARRAY_BUFFER, bufferHandles[0])
GLES20.glBindBuffer(GLES20.GL_ELEMENT_ARRAY_BUFFER, bufferHandles[1])

GLES20.glEnableVertexAttribArray(vertexHandle)
GLES20.glVertexAttribPointer(vertexHandle, 3, GLES20.GL_FLOAT, false, 4 * 5, 0)

GLES20.glEnableVertexAttribArray(uvsHandle)
GLES20.glVertexAttribPointer(uvsHandle, 2, GLES20.GL_FLOAT, false, 4 * 5, 3 * 4)

GLES20.glDrawElements(GLES20.GL_TRIANGLES, 6, GLES20.GL_UNSIGNED_INT, 0)

The texture that we draw will be loaded from our images. I only used BitmapFactory.decodeFile() without specifying additional BitmapFactory.Options. You might want to tweak this depending on your input image files.
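Photos from a DSLR are usually much larger than the video resolution, so one possible tweak is to let BitmapFactory subsample the image while decoding. The helper below is a sketch of the usual power-of-two inSampleSize calculation. The function name and parameters are my own choice for illustration and are not taken from the sample project.

```kotlin
/**
 * Returns the largest power-of-two sample size that keeps the decoded image
 * at least as large as the requested size in both dimensions.
 */
fun calculateInSampleSize(srcWidth: Int, srcHeight: Int, reqWidth: Int, reqHeight: Int): Int {
    require(srcWidth > 0 && srcHeight > 0 && reqWidth > 0 && reqHeight > 0) {
        "all dimensions must be positive"
    }

    var inSampleSize = 1
    if (srcHeight > reqHeight || srcWidth > reqWidth) {
        val halfHeight = srcHeight / 2
        val halfWidth = srcWidth / 2
        // Keep doubling while the next step still covers the requested size
        while (halfHeight / inSampleSize >= reqHeight && halfWidth / inSampleSize >= reqWidth) {
            inSampleSize *= 2
        }
    }
    return inSampleSize
}
```

The source dimensions can be read by decoding once with BitmapFactory.Options.inJustDecodeBounds set to true. The returned value then goes into BitmapFactory.Options.inSampleSize for the real decodeFile() call.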

I used GLUtils.texImage2D() to pass the Bitmap to OpenGL. This method determines the texture format automatically. You also might want to check this post which explains some possible issues that might arise when you use a Bitmap as texture in OpenGL on Android.

The basic vertex shader code applies some basic transformations to the vertices of the quad on which I display the texture. I only used a scale matrix for flipping the texture upside-down (maybe I should flip the UVs instead). If you want to see how to apply some other transformations, like setting camera positions or applying translations, check out the sample from my other blogpost.
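Flipping the UVs instead of scaling could be sketched as follows. The helper works on the interleaved x, y, z, u, v layout used above. flipV() is a hypothetical name, and I have not tested this variant in the sample project.

```kotlin
/**
 * Returns a copy of an interleaved vertex array with the v texture coordinate
 * mirrored (v becomes 1 - v). `stride` is the number of floats per vertex and
 * `vOffset` is the position of v inside one vertex.
 */
fun flipV(vertices: FloatArray, stride: Int = 5, vOffset: Int = 4): FloatArray {
    require(stride > 0 && vOffset in 0 until stride) { "vOffset must lie inside one vertex" }
    require(vertices.size % stride == 0) { "array length must be a multiple of stride" }

    val flipped = vertices.copyOf()
    for (i in vOffset until flipped.size step stride) {
        flipped[i] = 1f - flipped[i]
    }
    return flipped
}
```

With the flipped array uploaded into the vertex buffer, the mvp matrix could stay an identity matrix and the Matrix.scaleM() call would no longer be needed.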

The drawing code can be called after you pulled encoded frames from encoder and before you call EGL14.eglSwapBuffers() (which should feed encoder with new input buffer). I'll show exactly where I used it in the next section.

Encoding & Muxing Frames

MediaCodec supports synchronous and asynchronous processing modes. In this section I'll show how to use the synchronous mode.

As mentioned in previous section, after calling configure(), MediaCodec transitions into the configured state.

To start the encoding, we need to transition into the executing state. We do this by calling the MediaCodec.start() method.

Once the codec transitions to executing state, it will start to move between 3 sub-states

  1. Flushed
  2. Running
  3. End-of-stream

The running state is basically where most of the encoding happens. The flushed state usually only happens at the beginning or when you manually flushed the buffers (by calling flush()). When you reach your last input image, you transition to end-of-stream state by calling MediaCodec.signalEndOfInputStream().
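The state transitions described above can be written down as a small table. This is only a simplified model for illustration, with hypothetical enum and function names. It is not how MediaCodec is implemented, and with Surface input the first frame submitted via eglSwapBuffers() plays the role of FIRST_INPUT.

```kotlin
enum class CodecState { UNINITIALIZED, CONFIGURED, FLUSHED, RUNNING, END_OF_STREAM }

enum class CodecEvent { CONFIGURE, START, FIRST_INPUT, END_OF_INPUT, FLUSH, STOP }

/** Returns the state that follows `state` after `event`, or throws for transitions the model does not allow. */
fun transition(state: CodecState, event: CodecEvent): CodecState {
    // Flushed, Running and End-of-stream are the three sub-states of "executing"
    val executing = state == CodecState.FLUSHED ||
            state == CodecState.RUNNING ||
            state == CodecState.END_OF_STREAM

    return when {
        state == CodecState.UNINITIALIZED && event == CodecEvent.CONFIGURE -> CodecState.CONFIGURED
        state == CodecState.CONFIGURED && event == CodecEvent.START -> CodecState.FLUSHED
        state == CodecState.FLUSHED && event == CodecEvent.FIRST_INPUT -> CodecState.RUNNING
        state == CodecState.RUNNING && event == CodecEvent.END_OF_INPUT -> CodecState.END_OF_STREAM
        executing && event == CodecEvent.FLUSH -> CodecState.FLUSHED
        executing && event == CodecEvent.STOP -> CodecState.UNINITIALIZED
        else -> throw IllegalStateException("$event is not allowed in state $state")
    }
}
```

In this post the path is CONFIGURE, START, FIRST_INPUT and finally END_OF_INPUT, which corresponds to calling signalEndOfInputStream().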

When you do the encoding and decoding, you usually don't initialize or configure the input and output buffers directly. You ask media codec to give you an input buffer. And you also ask it to give you the output buffer after processing.

One additional part that is important here is MediaMuxer. MediaMuxer allows us to create the properly formatted final video from encoded buffers we get from the MediaCodec encoder.

You can initialize MediaMuxer with the path to the output video file and give it your chosen output format (e.g. OutputFormat.MUXER_OUTPUT_MPEG_4). For each track you want to write into the final video file, you need to call addTrack() before starting the muxer (in my case I only have one video track without any audio, but you can add audio later). Once the muxer has been started with start(), you can feed it with encoded samples by calling writeSampleData().

So a simplified algorithm for encoding my images would look something like this

For each image that you want to encode
 - Render as texture with OpenGL
 - Give it to MediaCodec encoder
 - Get encoded data from encoder and give it to MediaMuxer

You give input from OpenGL to the encoder by calling eglSwapBuffers() after rendering. You then get an output buffer from MediaCodec by calling dequeueOutputBuffer() followed by getOutputBuffer(), and you release it again by calling releaseOutputBuffer().

However, there seems to be a small complication when executing the simplified algorithm above. No matter what processing mode you choose (synchronous or asynchronous), MediaCodec processes data internally asynchronously and feeding it with input data also doesn't mean that you get encoded output right away on the other end.

This might cause a problem when you try to provide more input to the encoder, but you have run out of empty buffers because the previous input hasn't been processed yet. The call to eglSwapBuffers() might block, and you never get to the point where you use and release an output buffer to make room for new input.

You can solve this issue by first completely draining the encoder, before calling eglSwapBuffers(). Here's a Kotlin version of the drainEncoder() method from EncodeAndMuxTest

fun drainEncoder(encoder: MediaCodec, muxer: MediaMuxer, endOfStream: Boolean) {
    if (endOfStream)
        encoder.signalEndOfInputStream()

    while (true) {
        val outBufferId = encoder.dequeueOutputBuffer(bufferInfo, timeoutUs)

        if (outBufferId >= 0) {
            val encodedBuffer = encoder.getOutputBuffer(outBufferId)
                ?: throw RuntimeException("Output buffer $outBufferId was null")

            // The codec config data already reached the muxer through addTrack(),
            // so it is not written as a sample (same as in EncodeAndMuxTest)
            if ((bufferInfo.flags and MediaCodec.BUFFER_FLAG_CODEC_CONFIG) != 0)
                bufferInfo.size = 0

            if (bufferInfo.size != 0) {
                // MediaMuxer is ignoring KEY_FRAME_RATE, so I set the timestamp manually here
                // to achieve the desired frame rate
                bufferInfo.presentationTimeUs = presentationTimeUs
                muxer.writeSampleData(trackIndex, encodedBuffer, bufferInfo)

                presentationTimeUs += 1000000 / frameRate
            }

            encoder.releaseOutputBuffer(outBufferId, false)

            // Are we finished here?
            if ((bufferInfo.flags and MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0)
                break
        } else if (outBufferId == MediaCodec.INFO_TRY_AGAIN_LATER) {
            if (!endOfStream)
                break

            // End of stream, but still no output available. Try again.
        } else if (outBufferId == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED) {
            // Happens once, before the first encoded frame is returned
            trackIndex = muxer.addTrack(encoder.outputFormat)
            muxer.start()
        }
    }
}

The method should ensure that the encoder is always drained before you supply new input through OpenGL.

Note how I set the presentationTimeUs manually in MediaCodec.BufferInfo. Remember that MediaFormat.KEY_FRAME_RATE from the format that you used to configure the encoder is ignored by the MediaMuxer. Setting it here will allow you to control the framerate of the output video file.
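Adding 1000000/frameRate for every frame uses integer division, so a small rounding error accumulates for frame rates that don't divide one million evenly. A possible alternative, sketched here with a hypothetical helper name, is to derive each timestamp from the frame index.

```kotlin
/**
 * Presentation time in microseconds of the frame with the given 0-based index,
 * assuming a constant frame rate. Multiplying before dividing avoids the
 * rounding error that builds up when a truncated per-frame step is summed.
 */
fun presentationTimeUs(frameIndex: Long, frameRate: Int): Long {
    require(frameIndex >= 0) { "frameIndex must not be negative" }
    require(frameRate > 0) { "frameRate must be positive" }
    return frameIndex * 1_000_000L / frameRate
}
```

At 30 frames per second, summing the truncated step of 33333 µs thirty times gives 999990 µs, while presentationTimeUs(30, 30) returns exactly 1000000 µs. For a short time-lapse the difference is negligible, so this is more of a tidiness improvement than a fix.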

Once you have a loop which allows you to get all output and drain the encoder, you can plug in all the rest that's necessary for creating the final video file

...
// Prepare muxer
muxer = MediaMuxer(outVideoFilePath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4)

for (filePath in imageFiles) {

    // Get encoded data and feed it to muxer
    drainEncoder(encoder, muxer, false)

    // Render the bitmap/texture with OpenGL here
    render(filePath)

    // Feed encoder with next frame produced by OpenGL
    EGL14.eglSwapBuffers(eglDisplay, eglSurface)
}

// Drain last encoded data and finalize the video file
drainEncoder(encoder, muxer, true)

// Cleanup
// ...

After initializing the MediaMuxer, I loop over every image and render it with OpenGL. Before submitting new input to the encoder (eglSwapBuffers) I first make sure the encoder is drained.

Once all images were fed to MediaCodec, I drain the encoder once again and finalize the video file.

The example above showed the synchronous mode of processing with MediaCodec. If I find some more time, I'll try to add an example of asynchronous processing, which might be more efficient.

To take care of the cleanup afterwards, you can do the following

encoder.stop()
encoder.release()

muxer.stop()
muxer.release()

// Release the surface and cleanup EGL stuff as shown in previous section
// ...

Encoding in Background Service

Encoding your video might take a long time. You could set up a background thread from your Activity and perform the encoding there.

However, even a simple rotation of the device would trigger the Activity lifecycle events, and you would need to do additional work to avoid possible memory leaks.

I think encoding in a background Service makes things a bit easier. In my sample app on GitHub I'm doing the encoding in an IntentService

class EncodingService: IntentService("EncodingServiceName") {

    override fun onHandleIntent(intent: Intent?) {
        when (intent?.action) {
            ACTION_ENCODE_IMAGES -> {
                // Encode images here
            }
        }
    }

    companion object {
        const val ACTION_ENCODE_IMAGES = "eu.sisik.vidproc.action.ENCODE_IMAGES"
    }
}

The service can then be easily started from your Activity or Fragment, once you've got some input images from the user

...
val intent = Intent(this, EncodingService::class.java)
    .putExtra(EncodingService.KEY_OUT_PATH, getOutputPath())
    .putExtra(EncodingService.KEY_IMAGES, selectedImgUris)

intent.action = EncodingService.ACTION_ENCODE_IMAGES

startService(intent)

Changes in Android Q

Android Q made some improvements to MediaCodecInfo. The most interesting one for me was the possibility to check whether a codec is actually hardware accelerated with isHardwareAccelerated(). However, there's a note in the docs that this attribute is provided by the device manufacturer and that it cannot be tested for correctness. Additionally, there's also isSoftwareOnly(), which to me sounds a bit like it's doing the same thing as the first method. I will have to check this out more thoroughly...
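To see what both methods report on a particular device, the available encoders could be listed with something like the following sketch. describeEncoders() is a hypothetical helper that is not part of the sample project, and I'm only assuming here that comparing the two flags side by side is a useful way to explore the difference.

```kotlin
import android.media.MediaCodecList
import android.os.Build

/**
 * Lists all regular encoders that support the given MIME type together with
 * the acceleration flags that are available from Android Q (API 29) onwards.
 */
fun describeEncoders(mime: String): List<String> =
    MediaCodecList(MediaCodecList.REGULAR_CODECS).codecInfos
        .filter { info ->
            info.isEncoder && info.supportedTypes.any { it.equals(mime, ignoreCase = true) }
        }
        .map { info ->
            if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.Q)
                "${info.name}: hardwareAccelerated=${info.isHardwareAccelerated}, " +
                        "softwareOnly=${info.isSoftwareOnly}"
            else
                "${info.name}: acceleration flags require API 29 or newer"
        }
```

Calling describeEncoders("video/avc") and logging the result should show whether any encoder reports false for both flags, which would mean that the two methods are not simple opposites.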

Final Result

I took a couple of pictures from my balcony. As mentioned at the beginning, I used my old Nikon D3100 DSLR to make the photos. The DSLR was controlled from an Android phone via USB OTG and PTP protocol. I uploaded the photos to my Android phone and generated a video with the procedure I described in this post. The video contains no audio. I'll show how to add audio in my next post.

This is what the result looks like
