Android's MediaCodec API offers functionality that allows you to decode and encode video and audio efficiently, possibly also with hardware acceleration.

Used together with OpenGL ES, it gives you access to powerful APIs for manipulating and editing video files.

In this post I would like to show how to convert an existing video file into a grayscale version using only the standard Android APIs without external libraries.

Even though the goal of the post is very specific, you can use the information for a multitude of video editing features. So if you're building some kind of Android video editing app, this tutorial might be something for you.

In my previous posts I already showed how to create a video from multiple input images and how to add audio to an existing video file. This tutorial expands on that information and solves another part of the video processing puzzle. If something here isn't completely clear to you, I recommend checking out those previous posts.

You can find the sample code related to this post on GitHub. The code was tested on Android 6 Marshmallow (API level 23) and Android 9 Pie (API level 28), but it should also work on Lollipop (API level 22).

If you're an Android enthusiast who likes to learn more about Android internals, I highly recommend checking out my Bugjaeger app. It allows you to connect 2 Android devices through USB OTG and perform many of the tasks that are normally only accessible from a developer machine via ADB, directly from your Android phone/tablet.

Editing Video Using Android's Standard APIs

To edit a video file and convert it to grayscale, you first need to extract the encoded frames. For this you can use the MediaExtractor class. MediaExtractor will extract the necessary information and encoded chunks from a video file, in the right format and size, so that they can be fed directly into a MediaCodec decoder.

The MediaCodec decoder will make the decoded frames available to OpenGL ES as a texture. You can achieve this with the help of a SurfaceTexture, which is then used to create an output Surface for the decoder.

Once you have the decoded frames available as OpenGL ES textures, you can perform various efficient transformations and effects in OpenGL shaders. In this case I just convert the RGBA pixels into a grayscale value using a standard formula.

Next you need to feed the processed frames into a MediaCodec encoder. The encoder allows you to create an input Surface which can receive the frames rendered by OpenGL. This Surface requires some additional configuration before it can be used. You'll need to use EGL to manually set up the thread-specific OpenGL context (as opposed to what you might be used to when rendering with OpenGL ES through Android's helper classes like GLSurfaceView). Even though I listed the encoding part after decoding, I'll configure the encoder's input Surface before the decoder's output Surface. This is because I want to make sure that the SurfaceTexture is created on the same thread that already has an OpenGL context set up.

The processed and encoded frames can then be put into the final output file. Android offers the MediaMuxer class for this. It takes care of creating the container format, embedding all necessary meta information into it, and handling the chunks output by the encoder. In my example I chose MPEG-4 with H.264 encoding. You might choose a different format, but you need to make sure that it's supported.

Extracting Encoded Frames With MediaExtractor

MediaExtractor allows you to get the encoded frames from your input video files. You'll get the frames in chunks that can be directly fed to the decoder.

I showed how to use the MediaExtractor in my previous post. Here I'll just quickly summarize the steps.

Your video file will likely contain multiple tracks (at least one video and one audio track). Here I'm only interested in the video track. To select the video track I can do the following

val extractor = MediaExtractor()
extractor.setDataSource(inFilePath)

for (i in 0 until extractor.trackCount) {
    val format = extractor.getTrackFormat(i)
    val mime = format.getString(MediaFormat.KEY_MIME)

    if (mime.startsWith("video/")) {

        extractor.selectTrack(i)
        // Read the frames from this track here
        // ...
    }
}

The format variable will also contain other important parameters of your input video, like resolution, frame rate, or bitrate. You can use these values to configure the output format for the encoder. However, note that you might first need to check whether the same parameters are supported for encoding (e.g. the resolution), or you'll have to stick with parameters that are guaranteed to be supported according to the compatibility documents. You can look at my previous post where I programmatically check if a resolution is supported.
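As a rough sketch (this helper is my own illustration, not part of the sample code), such a check could query the available encoders like this

// Sketch: returns true if at least one encoder for the given MIME type
// reports the requested size as supported
fun isEncoderSizeSupported(mime: String, width: Int, height: Int): Boolean {
    val codecList = MediaCodecList(MediaCodecList.REGULAR_CODECS)
    for (info in codecList.codecInfos) {
        if (!info.isEncoder || !info.supportedTypes.any { it.equals(mime, true) })
            continue
        val videoCaps = info.getCapabilitiesForType(mime).videoCapabilities ?: continue
        if (videoCaps.isSizeSupported(width, height))
            return true
    }
    return false
}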

Once you've selected the right track, you can extract the video chunks using the following code

val maxChunkSize = 1024 * 1024
val buffer = ByteBuffer.allocate(maxChunkSize)
val bufferInfo = MediaCodec.BufferInfo()

// Extract all frames from the selected track
while (true) {
    val chunkSize = extractor.readSampleData(buffer, 0)

    if (chunkSize > 0) {
        // Process extracted frame here
        // ...

        extractor.advance()

    } else {
        // All frames extracted - we're done
        break
    }
}

The extracted data is later passed to the decoder. However, you don't allocate that buffer yourself. Instead you ask the decoder for one of the input buffers it currently has available. The decoding loop further below shows this in context.
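In isolation, the hand-off looks roughly like this minimal sketch (end-of-stream handling left out)

val inBufferId = decoder.dequeueInputBuffer(timeoutUs)
if (inBufferId >= 0) {
    // The decoder owns this buffer - we only fill it with one encoded sample
    val inBuffer = decoder.getInputBuffer(inBufferId)!!
    val sampleSize = extractor.readSampleData(inBuffer, 0)
    if (sampleSize >= 0) {
        decoder.queueInputBuffer(inBufferId, 0, sampleSize,
            extractor.sampleTime, extractor.sampleFlags)
        extractor.advance()
    }
}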

Configuring Encoder & Input Surface

MediaCodec can use a Surface to operate on video data. You can create a Surface that can be used by a MediaCodec encoder by calling createInputSurface().

I will use this Surface together with OpenGL ES 2. This requires some additional setup with EGL, which I'll use to set up an OpenGL context for the current thread. Once the context is set up, I can use OpenGL calls to create a valid texture handle. This handle is later used to create a SurfaceTexture, which in turn is used to create the output Surface for the decoder.

val mime = "video/avc"
val width = 320; val height = 180
val outFormat = MediaFormat.createVideoFormat(mime, width, height)
outFormat.setInteger(MediaFormat.KEY_COLOR_FORMAT, MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface)
outFormat.setInteger(MediaFormat.KEY_BIT_RATE, 2000000)
outFormat.setInteger(MediaFormat.KEY_FRAME_RATE, 30)
outFormat.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 15)
outFormat.setString(MediaFormat.KEY_MIME, mime)

// Init encoder
encoder = MediaCodec.createEncoderByType(outFormat.getString(MediaFormat.KEY_MIME))
encoder.configure(outFormat, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)
inputSurface = encoder.createInputSurface()

In the code above I'm first configuring the output format that I want to receive from the encoder. You'll probably want to set some of these values (e.g. the resolution) according to the input format you got from MediaExtractor. I tried to show the fields that are required to make this work. As already mentioned, you should first query the codec capabilities and make sure that your requested format is supported (I show how to check if a resolution is supported in my previous post).

Note that I pass the CONFIGURE_FLAG_ENCODE flag and a null value for the Surface to the configure() method. After this the encoder will be able to give you the input Surface.

Next you need to use this input Surface to set up an OpenGL context with the EGL API

eglDisplay = EGL14.eglGetDisplay(EGL14.EGL_DEFAULT_DISPLAY)
if (eglDisplay == EGL14.EGL_NO_DISPLAY)
    throw RuntimeException("eglDisplay == EGL14.EGL_NO_DISPLAY: "
            + GLUtils.getEGLErrorString(EGL14.eglGetError()))

val version = IntArray(2)
if (!EGL14.eglInitialize(eglDisplay, version, 0, version, 1))
    throw RuntimeException("eglInitialize(): " + GLUtils.getEGLErrorString(EGL14.eglGetError()))

val attribList = intArrayOf(
    EGL14.EGL_RED_SIZE, 8,
    EGL14.EGL_GREEN_SIZE, 8,
    EGL14.EGL_BLUE_SIZE, 8,
    EGL14.EGL_ALPHA_SIZE, 8,
    EGL14.EGL_RENDERABLE_TYPE, EGL14.EGL_OPENGL_ES2_BIT,
    EGLExt.EGL_RECORDABLE_ANDROID, 1,
    EGL14.EGL_NONE
)
val configs = arrayOfNulls<EGLConfig>(1)
val nConfigs = IntArray(1)
if (!EGL14.eglChooseConfig(eglDisplay, attribList, 0, configs, 0, configs.size, nConfigs, 0))
    throw RuntimeException(GLUtils.getEGLErrorString(EGL14.eglGetError()))

var err = EGL14.eglGetError()
if (err != EGL14.EGL_SUCCESS)
    throw RuntimeException(GLUtils.getEGLErrorString(err))

val ctxAttribs = intArrayOf(
    EGL14.EGL_CONTEXT_CLIENT_VERSION, 2,
    EGL14.EGL_NONE
)
val eglContext = EGL14.eglCreateContext(eglDisplay, configs[0], EGL14.EGL_NO_CONTEXT, ctxAttribs, 0)

err = EGL14.eglGetError()
if (err != EGL14.EGL_SUCCESS)
    throw RuntimeException(GLUtils.getEGLErrorString(err))

val surfaceAttribs = intArrayOf(
    EGL14.EGL_NONE
)

eglSurface = EGL14.eglCreateWindowSurface(eglDisplay, configs[0], inputSurface, surfaceAttribs, 0)

err = EGL14.eglGetError()
if (err != EGL14.EGL_SUCCESS)
    throw RuntimeException(GLUtils.getEGLErrorString(err))

if (!EGL14.eglMakeCurrent(eglDisplay, eglSurface, eglSurface, eglContext))
    throw RuntimeException("eglMakeCurrent(): " + GLUtils.getEGLErrorString(EGL14.eglGetError()))

Note that I'm using EGL_RECORDABLE_ANDROID for eglChooseConfig(). I didn't find much documentation about this flag, but it seems to be necessary for configuring a recordable surface.

The input Surface given by the encoder is then used directly in eglCreateWindowSurface(). At the end I make the context current, which allows me to call OpenGL functions. But I need to make sure all calls are made from the same thread from which I called eglMakeCurrent().

Once you have a valid OpenGL context, you can use OpenGL together with a SurfaceTexture to create the output Surface for the MediaCodec decoder. I'll show how to do this in the next section.

Configuring Decoder & Output Surface

The MediaCodec decoder also needs an output Surface. Surface has a constructor that takes a SurfaceTexture as an argument. So to create the output Surface, you first create a SurfaceTexture and pass it to Surface's constructor

// Prepare a texture handle for SurfaceTexture
val textureHandles = IntArray(1)
GLES20.glGenTextures(1, textureHandles, 0)
GLES20.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, textureHandles[0])

surfaceTexture = SurfaceTexture(textureHandles[0])

// The onFrameAvailable() callback will be called from our HandlerThread
val thread = HandlerThread("FrameHandlerThread")
thread.start()

surfaceTexture.setOnFrameAvailableListener({
    synchronized(lock) {

        // New frame available before the last frame was processed...we dropped some frames
        if (frameAvailable)
            Log.d(TAG, "Frame available before the last frame was processed...we dropped some frames")

        frameAvailable = true
        lock.notifyAll()
    }
}, Handler(thread.looper))

outputSurface = Surface(surfaceTexture)

I created a texture name using a standard OpenGL function, which I then used to create a SurfaceTexture. Binding the texture id to the GL_TEXTURE_EXTERNAL_OES target allows us to use the frames from the decoder as a texture inside the fragment shader. You need to make sure that you call these OpenGL functions from the same thread to which you bound the context (after calling eglMakeCurrent()).

I set the OnFrameAvailableListener callback so that I get notified when a new frame has been decoded. The callback is called from a different thread than the one for which I've configured my OpenGL context, so I must not touch the texture with the decoded frame or call any OpenGL functions directly from this callback. Therefore I synchronize between the threads and communicate that a new frame is available via a boolean variable (frameAvailable).

I can control the thread from which the callback will be called by creating my own HandlerThread.
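The waiting side of this handshake appears later inside the decoding loop. Factored out, it could look like the following helper (just a sketch, assuming lock is a plain java.lang.Object and frameAvailable is a Boolean field of the surrounding class)

private fun waitTillFrameAvailable() {
    synchronized(lock) {
        while (!frameAvailable) {
            // Don't block forever - log if the decoder seems to have stalled
            lock.wait(500)
            if (!frameAvailable)
                Log.e(TAG, "Surface frame wait timed out")
        }
        frameAvailable = false
    }
}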

Finally, I create the output Surface for my MediaCodec decoder. It's important to keep a reference to the SurfaceTexture for the whole time you're using the output Surface, because passing it to Surface's constructor isn't enough to keep it from being garbage collected.

Now I can create and configure my MediaCodec decoder

decoder = MediaCodec.createDecoderByType(inputFormat.getString(MediaFormat.KEY_MIME))
decoder.configure(inputFormat, outputSurface, null, 0)

I got the inputFormat directly from MediaExtractor, as shown in an earlier section.

Decoding, Encoding, Muxing

The simplified algorithm that will perform the conversion to grayscale looks something like this

  1. Extract encoded frame from file (MediaExtractor)
  2. Decode frame (MediaCodec)
  3. Edit frame (OpenGL)
  4. Encode frame (MediaCodec)
  5. Save frame to final output video file (MediaMuxer)

I decided to basically perform all these steps on one thread (similar to the DecodeEditEncode CTS example). This makes everything much easier. Note, however, that encoding and decoding are still done under the hood in a separate process, and "editing" with OpenGL ES is GPU-accelerated.

You could probably split some of the steps into separate threads and maybe try to use MediaCodec in async mode, but this would require additional synchronization and possibly also sharing/switching of OpenGL contexts. Reading and writing persistent storage with MediaExtractor and MediaMuxer could also cause latencies if the buffering is insufficient (there seemed to be issues with MediaMuxer buffering, and slow writes to the SD card could cause MediaCodec to run out of buffers and stall). In that case you might also need to implement additional thread-safe buffering.
The approach I chose worked as expected when I tested it on my devices.

The first thing that I do here is to initialize the MediaMuxer and start the MediaCodec encoder and decoder

encoder.start()
decoder.start()
muxer = MediaMuxer("/path/to/out.mp4", MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4)

Now I can perform the extracting, decoding, editing, encoding, and muxing

var allInputExtracted = false
var allInputDecoded = false
var allOutputEncoded = false

val timeoutUs = 10000L
val bufferInfo = MediaCodec.BufferInfo()
var trackIndex = -1

while (!allOutputEncoded) {
    // Feed input to decoder
    if (!allInputExtracted) {
        val inBufferId = decoder.dequeueInputBuffer(timeoutUs)
        if (inBufferId >= 0) {
            val buffer = decoder.getInputBuffer(inBufferId)
            val sampleSize = extractor.readSampleData(buffer, 0)

            if (sampleSize >= 0) {
                decoder.queueInputBuffer(
                    inBufferId, 0, sampleSize,
                    extractor.sampleTime, extractor.sampleFlags
                )

                extractor.advance()
            } else {
                decoder.queueInputBuffer(
                    inBufferId, 0, 0,
                    0, MediaCodec.BUFFER_FLAG_END_OF_STREAM
                )
                allInputExtracted = true
            }
        }
    }

    var encoderOutputAvailable = true
    var decoderOutputAvailable = !allInputDecoded

    while (encoderOutputAvailable || decoderOutputAvailable) {
        // Drain Encoder & mux to output file first
        val outBufferId = encoder.dequeueOutputBuffer(bufferInfo, timeoutUs)

        if (outBufferId >= 0) {
            val encodedBuffer = encoder.getOutputBuffer(outBufferId)

            muxer.writeSampleData(trackIndex, encodedBuffer, bufferInfo)

            encoder.releaseOutputBuffer(outBufferId, false)

            // Are we finished here?
            if ((bufferInfo.flags and MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) {
                allOutputEncoded = true
                break
            }
        } else if (outBufferId == MediaCodec.INFO_TRY_AGAIN_LATER) {
            encoderOutputAvailable = false
        } else if (outBufferId == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED) {
            trackIndex = muxer.addTrack(encoder.outputFormat)
            muxer.start()
        }

        if (outBufferId != MediaCodec.INFO_TRY_AGAIN_LATER)
            continue

        // Get output from decoder and feed it to encoder
        if (!allInputDecoded) {
            val outBufferId = decoder.dequeueOutputBuffer(bufferInfo, timeoutUs)
            if (outBufferId >= 0) {
                val render = bufferInfo.size > 0
                // Give the decoded frame to SurfaceTexture (onFrameAvailable() callback should
                // be called soon after this)
                decoder.releaseOutputBuffer(outBufferId, render)
                if (render) {
                    // Wait till new frame available after onFrameAvailable has been called
                    synchronized(lock) {
                        while (!frameAvailable) {
                            lock.wait(500)
                            if (!frameAvailable)
                                Log.e(TAG,"Surface frame wait timed out")
                        }
                        frameAvailable = false
                    }

                    surfaceTexture.updateTexImage()
                    surfaceTexture.getTransformMatrix(texMatrix)

                    // Render texture with OpenGL ES
                    // ...

                    EGLExt.eglPresentationTimeANDROID(eglDisplay, eglSurface, 
                        bufferInfo.presentationTimeUs * 1000)

                    EGL14.eglSwapBuffers(eglDisplay, eglSurface)
                }

                // Did we get all output from decoder?
                if ((bufferInfo.flags and MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) {
                    allInputDecoded = true
                    encoder.signalEndOfInputStream()
                }

            } else if (outBufferId == MediaCodec.INFO_TRY_AGAIN_LATER) {
                decoderOutputAvailable = false
            }
        }
    }
}

The code above is a bit long; I kept it in one piece so that it's easier to copy/paste. I'll give some additional details to make it more understandable.

At the beginning of the main while loop, I first try to feed the frames extracted by MediaExtractor to the MediaCodec decoder.

Then, inside the inner nested while loop, I first drain the encoder output and then the decoder output. I keep draining both in a loop until they have no more output. Only then do I feed the decoder again with new input from the extractor.

You can see that I'm checking for INFO_OUTPUT_FORMAT_CHANGED and then start the MediaMuxer and add a track with the corresponding format.

When all the data has been extracted (MediaExtractor.readSampleData() returns -1), I queue an empty buffer to the decoder with the BUFFER_FLAG_END_OF_STREAM flag to signal that we are done. This comes out at the other end of the decoder, where I again signal the end of stream to the encoder by calling MediaCodec.signalEndOfInputStream(). signalEndOfInputStream() can only be called on an encoder that is configured with an input Surface.

Note that I always first wait until a new frame is available, i.e. until onFrameAvailable has been called from the other thread. Only then do I call SurfaceTexture.updateTexImage(), which binds the frame texture to the GL_TEXTURE_EXTERNAL_OES target and makes it available inside the shader. I also get a transformation matrix from the SurfaceTexture, which should be used to transform the UVs of the texture inside the shader (I'll show the usage in the next section). You need to make sure that updateTexImage() is called from the same thread that you bound your OpenGL context to.

I skipped the code that does the rendering of the texture with OpenGL. I'll show how to do the rendering in the next section. You can see the placeholder comment that shows the place where you can plug in the OpenGL calls.

After rendering I call eglPresentationTimeANDROID(). On some devices this might be necessary to propagate the timestamps and get proper video timing/length. The units are nanoseconds, as opposed to the microseconds used by MediaCodec, hence the multiplication by 1000.

Calling EGL14.eglSwapBuffers() sends the buffer to the encoder. This call can block when there's no input, so you should only call it once you've actually got a frame from the decoder.

Processing With OpenGL ES2

I created a separate section for the OpenGL stuff because it contains quite a lot of steps, and I was hoping to make it more readable this way.

You can put the OpenGL initialization code right after the point where you've initialized the context with EGL. The rendering code should be called inside the loop from the previous section, where I left a placeholder comment for it.

This is the section that you might want to customize for your own requirements. In this post I'm only converting the frames to grayscale. I thought this is simple enough for a tutorial, but still a realistic task that an app might want to perform. You can adjust it to perform whatever kind of effect you require.

I used the following code to initialize the OpenGL stuff

private val vertexShaderCode =
    """
    precision highp float;
    attribute vec3 vertexPosition;
    attribute vec2 uvs;
    varying vec2 varUvs;
    uniform mat4 texMatrix;
    uniform mat4 mvp;

    void main()
    {
        varUvs = (texMatrix * vec4(uvs.x, uvs.y, 0, 1.0)).xy;
        gl_Position = mvp * vec4(vertexPosition, 1.0);
    }
    """

private val fragmentShaderCode =
    """
    #extension GL_OES_EGL_image_external : require
    precision mediump float;

    varying vec2 varUvs;
    uniform samplerExternalOES texSampler;

    void main()
    {
        // Convert to grayscale here
        vec4 c = texture2D(texSampler, varUvs);
        float gs = 0.299*c.r + 0.587*c.g + 0.114*c.b;
        gl_FragColor = vec4(gs, gs, gs, c.a);
    }
    """

private var vertices = floatArrayOf(
    // x, y, z, u, v
    -1.0f, -1.0f, 0.0f, 0f, 0f,
    -1.0f, 1.0f, 0.0f, 0f, 1f,
    1.0f, 1.0f, 0.0f, 1f, 1f,
    1.0f, -1.0f, 0.0f, 1f, 0f
)

private var indices = intArrayOf(
    2, 1, 0, 0, 3, 2
)

private var program: Int
private var vertexHandle: Int = 0
private var bufferHandles = IntArray(2)
private var uvsHandle: Int = 0
private var texMatrixHandle: Int = 0
private var mvpHandle: Int = 0
private var samplerHandle: Int = 0
private val textureHandles = IntArray(1)

private var vertexBuffer: FloatBuffer = ByteBuffer.allocateDirect(vertices.size * 4).run {
    order(ByteOrder.nativeOrder())
    asFloatBuffer().apply {
        put(vertices)
        position(0)
    }
}

private var indexBuffer: IntBuffer = ByteBuffer.allocateDirect(indices.size * 4).run {
    order(ByteOrder.nativeOrder())
    asIntBuffer().apply {
        put(indices)
        position(0)
    }
}

...

init {
    // Create program
    val vertexShader: Int = loadShader(GLES20.GL_VERTEX_SHADER, vertexShaderCode)
    val fragmentShader: Int = loadShader(GLES20.GL_FRAGMENT_SHADER, fragmentShaderCode)

    program = GLES20.glCreateProgram().also {
        GLES20.glAttachShader(it, vertexShader)
        GLES20.glAttachShader(it, fragmentShader)
        GLES20.glLinkProgram(it)

        vertexHandle = GLES20.glGetAttribLocation(it, "vertexPosition")
        uvsHandle = GLES20.glGetAttribLocation(it, "uvs")
        texMatrixHandle = GLES20.glGetUniformLocation(it, "texMatrix")
        mvpHandle = GLES20.glGetUniformLocation(it, "mvp")
        samplerHandle = GLES20.glGetUniformLocation(it, "texSampler")
    }

    // Initialize buffers
    GLES20.glGenBuffers(2, bufferHandles, 0)

    GLES20.glBindBuffer(GLES20.GL_ARRAY_BUFFER, bufferHandles[0])
    GLES20.glBufferData(GLES20.GL_ARRAY_BUFFER, vertices.size * 4, vertexBuffer, GLES20.GL_DYNAMIC_DRAW)

    GLES20.glBindBuffer(GLES20.GL_ELEMENT_ARRAY_BUFFER, bufferHandles[1])
    GLES20.glBufferData(GLES20.GL_ELEMENT_ARRAY_BUFFER, indices.size * 4, indexBuffer, GLES20.GL_DYNAMIC_DRAW)

    // Init texture that will receive decoded frames
    GLES20.glGenTextures(1, textureHandles, 0)
    GLES20.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, textureHandles[0])
    GLES20.glTexParameteri(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, GLES20.GL_TEXTURE_MIN_FILTER,
        GLES20.GL_NEAREST)
    GLES20.glTexParameteri(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, GLES20.GL_TEXTURE_MAG_FILTER,
        GLES20.GL_LINEAR)
    GLES20.glTexParameteri(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, GLES20.GL_TEXTURE_WRAP_S,
        GLES20.GL_CLAMP_TO_EDGE)
    GLES20.glTexParameteri(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, GLES20.GL_TEXTURE_WRAP_T,
        GLES20.GL_CLAMP_TO_EDGE)

    // Ensure I can draw transparent stuff that overlaps properly
    GLES20.glEnable(GLES20.GL_BLEND)
    GLES20.glBlendFunc(GLES20.GL_SRC_ALPHA, GLES20.GL_ONE_MINUS_SRC_ALPHA)
}

private fun loadShader(type: Int, shaderCode: String): Int {
    return GLES20.glCreateShader(type).also { shader ->
        GLES20.glShaderSource(shader, shaderCode)
        GLES20.glCompileShader(shader)
    }
}
...

The code is similar to the regular OpenGL initialization code from my other posts.

Note that I updated my fragment shader. It now uses the GL_OES_EGL_image_external extension with a new sampler type, samplerExternalOES. Through this sampler type I can access the decoded frame from the MediaCodec decoder as a texture.

Inside the init block I again create the texture handle that I use to create the SurfaceTexture from the previous sections of this post. This texture is bound to the GL_TEXTURE_EXTERNAL_OES target.

For the conversion to grayscale I used the same formula as in my previous post, where I wrote about generating a histogram with RenderScript.
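If you want a different effect, the fragment shader is essentially the only thing that has to change. For example, a sepia variant of the same shader (my own sketch, not part of the sample code) could look like this

private val sepiaFragmentShaderCode =
    """
    #extension GL_OES_EGL_image_external : require
    precision mediump float;

    varying vec2 varUvs;
    uniform samplerExternalOES texSampler;

    void main()
    {
        vec4 c = texture2D(texSampler, varUvs);
        // Commonly used sepia weights applied to the decoded frame
        float r = dot(c.rgb, vec3(0.393, 0.769, 0.189));
        float g = dot(c.rgb, vec3(0.349, 0.686, 0.168));
        float b = dot(c.rgb, vec3(0.272, 0.534, 0.131));
        gl_FragColor = vec4(min(r, 1.0), min(g, 1.0), min(b, 1.0), c.a);
    }
    """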

Another thing to note is texMatrixHandle, through which I pass the transformation matrix from SurfaceTexture.getTransformMatrix() (see the previous section) to the vertex shader. The matrix is used to transform the UVs of the quad that holds the texture with our decoded frame.

The OpenGL drawing code, called for every new frame received from the decoder, is the following

GLES20.glClearColor(0f, 0f, 0f, 0f)
GLES20.glClear(GLES20.GL_COLOR_BUFFER_BIT or GLES20.GL_DEPTH_BUFFER_BIT)

GLES20.glViewport(0, 0, viewportWidth, viewportHeight)

GLES20.glUseProgram(program)

// Pass transformations to shader
GLES20.glUniformMatrix4fv(texMatrixHandle, 1, false, texMatrix, 0)
GLES20.glUniformMatrix4fv(mvpHandle, 1, false, mvpMatrix, 0)

// Prepare buffers with vertices and indices & draw
GLES20.glBindBuffer(GLES20.GL_ARRAY_BUFFER, bufferHandles[0])
GLES20.glBindBuffer(GLES20.GL_ELEMENT_ARRAY_BUFFER, bufferHandles[1])

GLES20.glEnableVertexAttribArray(vertexHandle)
GLES20.glVertexAttribPointer(vertexHandle, 3, GLES20.GL_FLOAT, false, 4 * 5, 0)

GLES20.glEnableVertexAttribArray(uvsHandle)
GLES20.glVertexAttribPointer(uvsHandle, 2, GLES20.GL_FLOAT, false, 4 * 5, 3 * 4)

GLES20.glDrawElements(GLES20.GL_TRIANGLES, 6, GLES20.GL_UNSIGNED_INT, 0)

Cleanup

Decoding, editing, and encoding a video file can take up quite a lot of resources. It might be a good idea not to wait for the GC and to free the resources manually

thread.quitSafely()

extractor.release()

decoder?.stop()
decoder?.release()
decoder = null

encoder?.stop()
encoder?.release()
encoder = null

muxer?.stop()
muxer?.release()
muxer = null

// Cleanup EGL stuff
if (eglDisplay != EGL14.EGL_NO_DISPLAY) {
    EGL14.eglDestroySurface(eglDisplay, eglSurface)
    EGL14.eglDestroyContext(eglDisplay, eglContext)
    EGL14.eglReleaseThread()
    EGL14.eglTerminate(eglDisplay)
}

surface?.release()
surface = null

eglDisplay = EGL14.EGL_NO_DISPLAY
eglContext = EGL14.EGL_NO_CONTEXT
eglSurface = EGL14.EGL_NO_SURFACE

Adding Audio Track

If the input video file that you want to convert to grayscale also contains an audio track, you might want to embed this track into the final grayscale video as well.

How you do this depends on the input and output video formats. If your input video already contains an audio format that can be muxed directly into the final output file, you don't need to use MediaCodec at all and can feed the data from MediaExtractor directly to MediaMuxer.
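A rough sketch of such a passthrough could look like this (in practice you'd probably use a second MediaExtractor instance for the audio track; buffer size and variable names are just for illustration)

// Select the audio track and register it with the muxer (before muxer.start())
var audioTrack = -1
for (i in 0 until audioExtractor.trackCount) {
    val format = audioExtractor.getTrackFormat(i)
    if (format.getString(MediaFormat.KEY_MIME)?.startsWith("audio/") == true) {
        audioExtractor.selectTrack(i)
        audioTrack = muxer.addTrack(format)
        break
    }
}

// After muxer.start(): copy the encoded audio samples over unchanged
val audioBuffer = ByteBuffer.allocate(1024 * 1024)
val audioBufferInfo = MediaCodec.BufferInfo()
while (true) {
    val size = audioExtractor.readSampleData(audioBuffer, 0)
    if (size < 0) break
    audioBufferInfo.set(0, size, audioExtractor.sampleTime, audioExtractor.sampleFlags)
    muxer.writeSampleData(audioTrack, audioBuffer, audioBufferInfo)
    audioExtractor.advance()
}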

MediaMuxer can be picky about the formats it allows in the output file. If your audio format cannot be added directly, you'll need to decode it and re-encode it using MediaCodec.

I wrote a separate blog post about how to add audio to a video file, so I recommend checking it out for more details.

Running in Background Service

Depending on the size and resolution of the video file, processing can take quite a lot of time. Running the code in a separate thread or async task spawned from your Activity is possible, but there can be complications related to managing the Activity lifecycle (e.g. during screen rotation) while avoiding memory leaks.

However, processing the video the way I described in this post should not require access to any UI elements. Therefore it does not have to be run from an Activity or Fragment. In my case I decided to execute the whole code inside of an IntentService.

I won't go through the service code in detail in this post; you can find the full implementation in the sample code on GitHub.
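Just to give an idea of the overall shape, a bare-bones skeleton could look like this (my own sketch; the extras and convertToGrayscale() are hypothetical placeholders for the pipeline from this post)

class GrayscaleConversionService : IntentService("GrayscaleConversionService") {

    override fun onHandleIntent(intent: Intent?) {
        // IntentService already calls this on a background worker thread,
        // so the whole extract/decode/edit/encode/mux loop can run here
        val inPath = intent?.getStringExtra("inPath") ?: return
        val outPath = intent?.getStringExtra("outPath") ?: return

        convertToGrayscale(inPath, outPath)
    }
}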

Conclusion

Android offers standard APIs that allow you to efficiently decode, edit, and encode various video file formats. In this post I showed a quick example on how to use these APIs to convert an existing video file to grayscale.

The result looks something like this (I slightly changed the shader to convert only the right half of the frame to grayscale)
