Android natively offers APIs that enable video processing and various video effects. I've already used some of these APIs in my previous projects. MediaCodec and OpenGL ES were the main components that enabled video processing.

I'm planning to use MediaCodec and OpenGL ES in my future video editing projects as well, and one of the features I thought would be cool to implement is reversing a video so that it plays backwards. I know it's possible to implement this feature with ffmpeg or some other external library, but I was curious how to do it using only the standard Android APIs.

In this post I describe my experience implementing this feature. I created a small test project that reverses a video in a background service. You can find the sample project on GitHub.

If you're interested in video processing using only Android's standard APIs (especially MediaCodec), I also recommend checking out my other posts:

Adding text or other textures to existing video

Creating video from images

Adding audio to video

Converting video to greyscale

If you're an Android enthusiast who likes to learn more about Android internals, I highly recommend checking out my Bugjaeger app. It lets you connect two Android devices through USB OTG and perform many of the tasks that normally require ADB on a developer machine, directly from your Android phone or tablet.

High-Level Overview

I used an approach similar to my previous posts. Again, I used MediaExtractor to get the encoded frames from a video file, and MediaCodec to decode the frames and re-encode them in reverse order.

This time I didn't implement any special effects, so I didn't need OpenGL. This also made Surface initialization easier: I didn't need EGL and could simply use the input Surface provided by the MediaCodec encoder.

MediaExtractor offers the seekTo() method, which was the key to implementing the algorithm for reversing a video. The simplified steps to reverse a video are the following:

1.) Seek to the end of the video file (MediaExtractor)
2.) Seek backwards frame by frame (MediaExtractor) and feed the frames to the decoder (MediaCodec)
3.) Re-encode the frames, which now arrive in backwards order (MediaCodec)
4.) Mux the frames into the final output video file (MediaMuxer)

As you'll see later, there are a couple of problems with this approach. In the next sections I'll show how I actually implemented this feature successfully.

If you see any additional issues with my approach, or something that could be implemented more efficiently, feel free to comment or send pull requests directly to my GitHub project.

Backwards Seeking With MediaExtractor

MediaExtractor lets you seek to a specific sample at a given presentation timestamp. The seekTo() method accepts a timestamp in microseconds and a flag that specifies the seek mode.

My first problem was that seeking didn't work reliably if I didn't know the exact timestamp of the sample upfront.

I expected that if I used the SEEK_TO_PREVIOUS_SYNC flag and decremented the current timestamp slightly, MediaExtractor would automatically find the previous sync sample. It's possible that I misread the documentation or made some other mistake, but that approach didn't work, and the extractor behaved as if I had used SEEK_TO_CLOSEST_SYNC.
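To make that expectation concrete, here is a small plain-Kotlin model of how the three sync seek modes are documented to behave: land on a sync sample at or before, at or after, or closest to the requested time. It's only a sketch of the documented contract, not MediaExtractor's implementation. The names SeekMode and modelSeek are hypothetical, and the fallback for times outside the range of sync samples is my own assumption.

```kotlin
// Hypothetical model of the documented sync seek modes (not the real extractor code)
enum class SeekMode { PREVIOUS_SYNC, NEXT_SYNC, CLOSEST_SYNC }

/**
 * Returns the sync-sample time that [mode] should land on for [timeUs],
 * given the sorted sync-sample times [syncTimesUs].
 */
fun modelSeek(syncTimesUs: List<Long>, timeUs: Long, mode: SeekMode): Long {
    require(syncTimesUs.isNotEmpty()) { "need at least one sync sample" }
    // Latest sync sample at or before timeUs (assumed fallback: the first one)
    val previous = syncTimesUs.lastOrNull { it <= timeUs } ?: syncTimesUs.first()
    // Earliest sync sample at or after timeUs (assumed fallback: the last one)
    val next = syncTimesUs.firstOrNull { it >= timeUs } ?: syncTimesUs.last()
    return when (mode) {
        SeekMode.PREVIOUS_SYNC -> previous
        SeekMode.NEXT_SYNC -> next
        SeekMode.CLOSEST_SYNC -> if (timeUs - previous <= next - timeUs) previous else next
    }
}
```

With key frames at 0 s, 1 s and 2 s, asking for 1 µs before 2 s lands on 1 s with PREVIOUS_SYNC, while CLOSEST_SYNC stays on the 2 s key frame. That's why an extractor that behaves like CLOSEST_SYNC never moves backwards when you only decrement the timestamp slightly.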

So I decided to first go through the file frame by frame using MediaExtractor.advance(), check whether the current frame is a key frame, and store the exact presentation times of all key frames in a stack. This allowed me to seek backwards by popping the exact seek times from this pre-loaded stack.
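The bookkeeping itself doesn't depend on Android, so it can be sketched with plain data. In this sketch SampleInfo is a hypothetical stand-in for the sampleTime/sampleFlags pair that MediaExtractor reports, and the sync flag value is copied from MediaExtractor.SAMPLE_FLAG_SYNC (1) so the code runs off-device. Note that the flags are tested as a bit mask, because a sample can carry more than one flag.

```kotlin
import java.util.Stack

// Same value as MediaExtractor.SAMPLE_FLAG_SYNC, redefined so this runs on a plain JVM
val SAMPLE_FLAG_SYNC = 1

// Hypothetical stand-in for what MediaExtractor reports for each sample
data class SampleInfo(val timeUs: Long, val flags: Int)

/** Pushes the time of every sync sample, in file order, onto a stack. */
fun collectSyncTimes(samples: List<SampleInfo>): Stack<Long> {
    val stack = Stack<Long>()
    for (s in samples) {
        // Test the SYNC bit; comparing the whole flags value with == would miss combined flags
        if ((s.flags and SAMPLE_FLAG_SYNC) != 0) stack.push(s.timeUs)
    }
    return stack
}

/** Pops the stack until it's empty: the order in which the reversal seeks. */
fun backwardsSeekOrder(syncTimes: Stack<Long>): List<Long> {
    val order = mutableListOf<Long>()
    while (syncTimes.isNotEmpty()) order += syncTimes.pop()
    return order
}
```

For samples at 0, 33 333, ..., 166 666 µs where only the first, the fourth and the last are sync samples, the seek order comes out as 166 666, 100 000, 0.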

The second problem is related to how video encoding and seeking work. MediaExtractor doesn't seek to arbitrary samples. It seeks only to sync samples (key frames in MediaCodec terminology; a sync frame seems to be a special key frame that also contains video configuration change parameters). I'll show how I tried to solve this issue in a later section.

In this post I'll skip some of the initialization code for MediaExtractor and MediaCodec because I already showed it in my previous posts. Keep in mind that before using MediaExtractor, you should first find and select the video track inside your input video file. I used the following code to walk through the file to the end and collect the timestamps of all key frames:

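import android.media.MediaExtractor
import java.util.Stack

val syncSampleTimes = Stack<Long>()

// Walk through the whole file and remember the time of every sync sample (key frame)
while (true) {
    // sampleFlags is a bit mask, so check the SYNC bit instead of using ==
    if ((extractor!!.sampleFlags and MediaExtractor.SAMPLE_FLAG_SYNC) != 0)
        syncSampleTimes.push(extractor!!.sampleTime)

    if (!extractor!!.advance())
        break
}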

Once I know the presentation times of my key frames, I can seek backwards by popping the collected times from the top of the stack:

val next = syncSampleTimes.pop()

extractor!!.seekTo(next, MediaExtractor.SEEK_TO_CLOSEST_SYNC)

Encoding Video Backwards

Again, I decided to use a Surface with MediaCodec for decoding and encoding. This time I'm not modifying the content of the frames, so I don't need OpenGL and EGL, and I can use a single Surface for both the decoder and the encoder.

If you'd like to add some additional effects, check out my previous post.

It's also possible to use MediaCodec with ByteBuffers, but the documentation recommends using a Surface for better performance. However, in this case ByteBuffers would allow us to access the frames and reorder them manually in code, which would let us skip the step that I'll show in the next section. I didn't test a ByteBuffer version because I assumed that a Surface would still be more efficient.

To initialize a MediaCodec decoder and encoder with a single Surface, call createInputSurface() on the encoder (after configure() and before start()). This gives you a Surface that you can pass to the decoder's configure() method.

You can use the following code to initialize the MediaCodec decoder and encoder, and all the related classes:

import android.media.MediaCodec
import android.media.MediaCodecInfo
import android.media.MediaExtractor
import android.media.MediaFormat
import android.media.MediaMuxer
import java.util.Stack

// Init extractor (inputVidFd is the FileDescriptor of the input video)
val extractor = MediaExtractor()
extractor.setDataSource(inputVidFd)

var inFormat: MediaFormat? = null
for (i in 0 until extractor.trackCount) {
    val format = extractor.getTrackFormat(i)
    if (format.getString(MediaFormat.KEY_MIME)?.startsWith("video/") == true) {
        extractor.selectTrack(i)
        inFormat = format
        break
    }
}

// Create H.264 encoder
val mime = "video/avc"
val encoder = MediaCodec.createEncoderByType(mime)

// Prepare output format for encoder
val width = inFormat!!.getInteger(MediaFormat.KEY_WIDTH)
val height = inFormat.getInteger(MediaFormat.KEY_HEIGHT)

// KEY_FRAME_RATE isn't guaranteed to be in the track format; 30 is an assumed fallback
val frameRate = if (inFormat.containsKey(MediaFormat.KEY_FRAME_RATE))
    inFormat.getInteger(MediaFormat.KEY_FRAME_RATE) else 30

val outFormat = MediaFormat.createVideoFormat(mime, width, height).apply {
    setInteger(MediaFormat.KEY_COLOR_FORMAT, MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface)
    setInteger(MediaFormat.KEY_BIT_RATE, 20000000)
    setInteger(MediaFormat.KEY_FRAME_RATE, frameRate)
    setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 15)
    setString(MediaFormat.KEY_MIME, mime)
}

// Configure encoder
encoder.configure(outFormat, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)
// Must be called after configure() and before start()
val surface = encoder.createInputSurface()

// Init decoder so that it renders into the encoder's input Surface
val decoder = MediaCodec.createDecoderByType(inFormat.getString(MediaFormat.KEY_MIME)!!)
decoder.configure(inFormat, surface, null, 0)

// Init muxer (outPath is the output file path)
val muxer = MediaMuxer(outPath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4)

In the code above I've initialized the extractor, decoder, encoder, and muxer. I used the Surface provided by encoder to configure the decoder.

I used the same width and height as the input video to configure the output format. Here you should check whether this width and height are actually supported by the encoder; I showed how to do this in the GitHub sample. You should at least be able to use the resolutions listed in the Android Compatibility Definition Document (CDD).
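On the device, the check could query the encoder's capabilities, for example through MediaCodecList(MediaCodecList.REGULAR_CODECS).codecInfos and getCapabilitiesForType(mime).videoCapabilities.isSizeSupported(width, height). The fallback logic around that call can be sketched in plain Kotlin, with the check passed in as a lambda. The function name and the fallback list are hypothetical assumptions, not values taken from the CDD or from my GitHub sample.

```kotlin
// Assumed fallback sizes (landscape), largest first; not taken from the CDD
val FALLBACK_SIZES = listOf(1920 to 1080, 1280 to 720, 720 to 480, 640 to 480)

/**
 * Hypothetical helper: returns the input size if [isSupported] accepts it, otherwise
 * the largest fallback size that has no more pixels than the input and is accepted.
 * Fallbacks are transposed for portrait input. Returns null if nothing is accepted.
 */
fun chooseEncoderSize(width: Int, height: Int, isSupported: (Int, Int) -> Boolean): Pair<Int, Int>? {
    if (isSupported(width, height)) return width to height
    val portrait = height > width
    return FALLBACK_SIZES
        .map { (w, h) -> if (portrait) h to w else w to h }
        .filter { (w, h) -> w * h <= width * height }
        .firstOrNull { (w, h) -> isSupported(w, h) }
}
```

On Android you'd pass something like { w, h -> caps.isSizeSupported(w, h) } as the lambda. A real implementation might also want to keep the input's exact aspect ratio instead of jumping to a fixed list.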

Before reversing the video, I first walk through the file to the end and collect the key frame timestamps, as described in the previous section:

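val syncSampleTimes = Stack<Long>()

while (true) {
    // sampleFlags is a bit mask, so check the SYNC bit instead of using ==
    if ((extractor.sampleFlags and MediaExtractor.SAMPLE_FLAG_SYNC) != 0)
        syncSampleTimes.push(extractor.sampleTime)

    if (!extractor.advance())
        break
}

// Timestamp of the last key frame (top of the stack), used to mirror the timestamps
val endPresentationTimeUs = syncSampleTimes.lastElement()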

In the following code I do the actual processing of the video. The chunk is a bit large because I wanted to put everything in one place to make it easier to copy and paste. Feel free to break it apart:

encoder.start()
decoder.start()

var allInputExtracted = false
var allInputDecoded = false
var allOutputEncoded = false

var trackIndex = -1
val mediaCodecTimeoutUs = 10000L
val bufferInfo = MediaCodec.BufferInfo()

// Extract, decode, encode, and mux
while (!allOutputEncoded) {
    // Feed input to decoder
    if (!allInputExtracted) {
        val inBufferId = decoder.dequeueInputBuffer(mediaCodecTimeoutUs)
        if (inBufferId >= 0) {
            if (syncSampleTimes.isNotEmpty()) {
                // Seek backwards to the next key frame (popped from the top of the stack)
                extractor.seekTo(syncSampleTimes.pop(), MediaExtractor.SEEK_TO_CLOSEST_SYNC)

                val buffer = decoder.getInputBuffer(inBufferId)!!
                val sampleSize = extractor.readSampleData(buffer, 0)
                if (sampleSize >= 0) {
                    // Mirror the timestamp so that the last input frame starts at 0.
                    // Flags are 0: MediaExtractor sample flags aren't MediaCodec buffer flags
                    // (e.g. SAMPLE_FLAG_PARTIAL_FRAME has the value of BUFFER_FLAG_END_OF_STREAM)
                    decoder.queueInputBuffer(
                        inBufferId, 0, sampleSize,
                        endPresentationTimeUs - extractor.sampleTime, 0
                    )
                } else {
                    // Nothing could be read; end the stream instead of holding on to the buffer
                    decoder.queueInputBuffer(inBufferId, 0, 0,
                        0, MediaCodec.BUFFER_FLAG_END_OF_STREAM)
                    allInputExtracted = true
                }
            } else {
                // Every key frame, including the one at the very beginning, has been queued
                decoder.queueInputBuffer(inBufferId, 0, 0,
                    0, MediaCodec.BUFFER_FLAG_END_OF_STREAM)
                allInputExtracted = true
            }
        }
    }

    var encoderOutputAvailable = true
    var decoderOutputAvailable = !allInputDecoded

    while (encoderOutputAvailable || decoderOutputAvailable) {
        // Drain encoder & mux to output file first
        val outBufferId = encoder.dequeueOutputBuffer(bufferInfo, mediaCodecTimeoutUs)
        if (outBufferId >= 0) {
            val encodedBuffer = encoder.getOutputBuffer(outBufferId)!!

            // Codec config data is already in encoder.outputFormat (passed to addTrack()),
            // and empty buffers (e.g. a bare EOS marker) carry no sample
            val isConfig = (bufferInfo.flags and MediaCodec.BUFFER_FLAG_CODEC_CONFIG) != 0
            if (!isConfig && bufferInfo.size > 0)
                muxer.writeSampleData(trackIndex, encodedBuffer, bufferInfo)

            encoder.releaseOutputBuffer(outBufferId, false)

            // Are we finished here?
            if ((bufferInfo.flags and MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) {
                allOutputEncoded = true
                break
            }
        } else if (outBufferId == MediaCodec.INFO_TRY_AGAIN_LATER) {
            encoderOutputAvailable = false
        } else if (outBufferId == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED) {
            trackIndex = muxer.addTrack(encoder.outputFormat)
            muxer.start()
        }

        if (outBufferId != MediaCodec.INFO_TRY_AGAIN_LATER)
            continue

        // Get output from decoder and feed it to encoder
        if (!allInputDecoded) {
            val decOutBufferId = decoder.dequeueOutputBuffer(bufferInfo, mediaCodecTimeoutUs)
            if (decOutBufferId >= 0) {
                val render = bufferInfo.size > 0

                // Render the decoded frame into the encoder's input Surface
                decoder.releaseOutputBuffer(decOutBufferId, render)

                // Did we get all output from decoder?
                if ((bufferInfo.flags and MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) {
                    allInputDecoded = true
                    encoder.signalEndOfInputStream()
                }
            } else if (decOutBufferId == MediaCodec.INFO_TRY_AGAIN_LATER) {
                decoderOutputAvailable = false
            }
        }
    }
}

decoder.stop()
encoder.stop()
muxer.stop()

// Free the native resources held by the codecs, muxer, Surface, and extractor
decoder.release()
encoder.release()
muxer.release()
surface.release()
extractor.release()

The code above is similar to the code I've used in my previous posts.

The main difference is that now I'm using MediaExtractor.seekTo() method to seek backwards to the key frame timestamps I've collected previously (instead of the MediaExtractor.advance() method).

Additionally, this time I'm using only one Surface and I'm not using EGL to swap the buffers.

Note that I modified the timestamp passed to the decoder when queueing new input buffers: I subtract each sample's timestamp from the last key frame's timestamp (endPresentationTimeUs), so the last frame of the input becomes the first frame of the output, at time 0.
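The timestamp arithmetic can be checked in isolation. This sketch (the function name is hypothetical) takes the sample times in the order they're read, back to front, and mirrors them around endPresentationTimeUs, which should produce increasing timestamps starting at 0.

```kotlin
/**
 * Mirrors timestamps around [endPresentationTimeUs], like
 * endPresentationTimeUs - extractor.sampleTime for samples read back to front.
 */
fun mirroredTimestamps(timesInReadOrderUs: List<Long>, endPresentationTimeUs: Long): List<Long> =
    timesInReadOrderUs.map { endPresentationTimeUs - it }
```

Reading samples at 100 000, 66 666, 33 333 and 0 µs with an end time of 100 000 µs yields 0, 33 334, 66 667 and 100 000 µs. The spacing between frames is preserved, just mirrored.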

The video-processing loop produces the result I'm showing below. You can see that there are some issues with the video. I'll show how to fix them in the next section.

Issues With Sync Samples

Because MediaExtractor seeks only to sync samples, the technique above skips all frames between two key frames, so the final video isn't smooth.

The main problem is related to inter-frame video compression. Some types of frames require other frames (reference frames) for decoding. That often means that you need previous frames to decode later frames, which is a problem if you try to decode the frames in reverse order.

The frames that can be decoded without requiring other frames are called I-frames. In the official docs, sync samples (MediaExtractor), key frames (MediaCodec), and I-frames seem to mean, more or less, the same thing.

One solution would be to decode a chunk of frames (starting from the end) in normal order and then reverse the order of the frames within the decoded chunk. I would need to hold at least a couple of decoded frames in memory, and the memory requirements grow very quickly: at 1080p with 4 bytes per pixel, that's 1920 * 1080 * 4 bytes ≈ 8.3 MB per frame, or roughly 249 MB for each second of 30 FPS video. I didn't test this solution, but it looks like it would require using ByteBuffers instead of a Surface. At least I didn't see an efficient way to access the frames passed through a Surface from Kotlin code.
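The untested chunked approach can be modelled on frame indices alone: split the stream into GOPs (a key frame plus the frames up to the next key frame), decode each GOP in normal order, and emit it back to front, starting with the last GOP. The sketch below is hypothetical and assumes sorted key frame indices starting at 0. bufferBytes repeats the memory estimate above; the peak buffer would roughly be the longest GOP times the size of one decoded frame. Decoders commonly output YUV 4:2:0 (1.5 bytes per pixel) rather than 4-byte RGBA, which lowers the estimate.

```kotlin
/**
 * Hypothetical model of chunked reversal on frame indices.
 * Assumes [keyFrameIndices] is sorted and starts at 0.
 */
fun gopReversedOrder(keyFrameIndices: List<Int>, frameCount: Int): List<Int> {
    val output = mutableListOf<Int>()
    for (g in keyFrameIndices.indices.reversed()) {
        val start = keyFrameIndices[g]
        val end = if (g + 1 < keyFrameIndices.size) keyFrameIndices[g + 1] else frameCount
        val decoded = (start until end).toList() // forward decode of one GOP
        output += decoded.asReversed()           // emit it back to front
    }
    return output
}

/** Bytes needed to buffer [frames] decoded frames of the given size. */
fun bufferBytes(width: Int, height: Int, bytesPerPixel: Double, frames: Int): Long =
    (width.toLong() * height * bytesPerPixel * frames).toLong()
```

With key frames at indices 0, 3 and 6 in an 8-frame stream, the output order is 7, 6, 5, 4, 3, 2, 1, 0, and every frame is decoded exactly once.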

I decided to solve this problem by first converting all frames to key frames, and then using the method described in the previous section to reverse the video.

This means that an additional conversion pass is necessary to create an intermediate video file consisting only of key frames, which slows down the whole process. This intermediate video file is of course also larger than the original. But at least I can still use a Surface for more efficient handling of video buffers.

To encode all frames as key frames, I had to modify the output format passed to the encoder and set KEY_I_FRAME_INTERVAL to 0:

...
val outFormat = MediaFormat.createVideoFormat(mime, width, height).apply {
    setInteger(MediaFormat.KEY_COLOR_FORMAT, MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface)
    setInteger(MediaFormat.KEY_BIT_RATE, 20000000)
    setInteger(MediaFormat.KEY_FRAME_RATE, inFormat.getInteger(MediaFormat.KEY_FRAME_RATE))

    // 0 requests a stream in which every frame is a key frame
    setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 0)
    setString(MediaFormat.KEY_MIME, mime)
}
...

All steps for this intermediate pass stayed basically the same, except I extracted the frames with MediaExtractor in regular order first (using MediaExtractor.advance()).
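Since the documentation describes KEY_I_FRAME_INTERVAL = 0 as a request for a stream of key frames only, it may be worth verifying the intermediate file before reversing it. If you collect the sample flags with the same advance() loop shown earlier, a hypothetical helper like this one reports every sample that isn't a sync sample (the flag value is copied from MediaExtractor.SAMPLE_FLAG_SYNC so the sketch runs off-device).

```kotlin
// Same value as MediaExtractor.SAMPLE_FLAG_SYNC, redefined so this runs on a plain JVM
val SYNC_FLAG_BIT = 1

/**
 * Returns the indices of samples that are NOT sync samples. An empty result means
 * every sample in the intermediate file is a key frame.
 */
fun nonKeyFrameIndices(sampleFlags: List<Int>): List<Int> =
    sampleFlags.indices.filter { (sampleFlags[it] and SYNC_FLAG_BIT) == 0 }
```

If the result isn't empty, reversing that file with the seek-based method would again skip frames.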

Once you have an intermediate video file consisting only of key frames, you can use the code in the previous section to make it play backwards.

Comments

It would be nice to see the "One solution for this would be to decode a bunch of frames..." part. What I currently do is to seek back to the previous sync frame and then seek forward to the desired frame. You can imagine that this is not performing that great, especially when the video has very few sync frames.
Written on Fri, 25 Sep 2020 07:20:57 by Hagen Brooks