Where is the Fox?

The Fox

With Father’s Day upon us, I thought I’d share a story about a running joke that our family kept going for the better part of our daughter’s childhood.

One value I’ve tried to instill in my daughter over the years is to never underestimate the importance of silliness in everyday life, nor to overlook it in one’s daily interactions with friends, family, and co-workers. You never know what rituals might be born at any given time.

And similar to a new product feature, remember to document, document, document — photos, videos, diaries, blog posts, whatever suits your style. Your older self will thank you for reminding them of all the good stuff that happened along the way.

Happy Father’s Day!

Auth, Identity, and Single Sign-On in Practice

Providing auth in your platform can get really expensive really quickly, since third-party vendor costs grow along with your user base. I had a chance to write about a great way to implement authentication, authorization, and SSO in a scalable and practical manner. I hope you like it.

https://betterprogramming.pub/low-cost-auth-identity-and-single-sign-on-f8adad300025

GraphQL Federation, Domain Autonomy, and You

Federated GraphQL, as imagined by Midjourney

Hello, hello. It’s been a while but I’m finally getting a chance to document some of my favorite engineering design patterns and tools adopted over the years, and how to leverage them in your own systems.

First up is a great lever to pull when addressing issues of scalability and domain ownership in your growing platform: GraphQL Federation. I hope you like it.

GraphQL Federation for Cross-domain Bliss

Reverse Video in iOS

When I started writing the video processing engine for what would become the BeatCam app, one of the aspects we wanted to control was the direction in which the video traveled, e.g. forward or reverse. Manipulating media in iOS involves Apple’s AVFoundation framework and its related classes, which let developers read, write, and process video and audio data for a variety of use cases, so that became the obvious place to start.

The premise was to be able to script these behaviors dynamically and make them available for users to apply to their videos to create experiences like this:

For our needs, it wasn’t going to be possible to extract video frames in real time in the order needed to convey reverse movement. The only workable solution involved creating a companion video file containing the frames of the original in reverse order. Also, since BeatCam processes input audio using a completely different pipeline from the video, we could conveniently ignore any audio in the original and simply output a silent, reversed version of it.

A nice solution to this problem is outlined by Andy Hin, in which an AVAssetReader is used to read the input frames and their corresponding timestamps. You keep an array of the frame buffers, and when it’s time to write them to file, you walk the original timestamps in ascending order but pull the frame data from the end of the array, so the output plays the frames in reverse.
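Conceptually, that approach boils down to something like the sketch below. This isn’t Andy’s actual code — readerOutput and writerAdaptor are placeholder names for an already-configured AVAssetReaderTrackOutput and AVAssetWriterInputPixelBufferAdaptor:

// naive version: read every frame into memory first (this is the part that doesn't scale)
NSMutableArray *allSamples = [NSMutableArray array];
CMSampleBufferRef buf;
while ((buf = [readerOutput copyNextSampleBuffer])) {
  [allSamples addObject:(__bridge id)buf];
  CFRelease(buf);
}

// write back out: timestamps in original (ascending) order, frame data pulled from the tail
for (NSInteger i = 0; i < allSamples.count; i++) {
  CMSampleBufferRef timeSample = (__bridge CMSampleBufferRef)allSamples[i];
  CMSampleBufferRef frameSample = (__bridge CMSampleBufferRef)allSamples[allSamples.count - 1 - i];
  CMTime pts = CMSampleBufferGetPresentationTimeStamp(timeSample);
  CVPixelBufferRef pixels = CMSampleBufferGetImageBuffer(frameSample);
  [writerAdaptor appendPixelBuffer:pixels withPresentationTime:pts];
}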

However, Andy’s actual implementation is geared more toward reversing short videos of 30s or less. Longer videos will cause memory usage to blow up, since you can’t keep that many frames in memory before starting to write them to file. And in BeatCam, any video in your library or shot with the camera is fair game, so any implementation would need to work regardless of the input video’s duration.

In addition, this processing within BeatCam needed to occur behind the scenes, since the user could be occupied with other parts of the UI, so it was imperative that it run on its own thread. And since the call would now be asynchronous, some kind of completion block or delegate callback would need to be implemented.
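In code terms, the public entry point ends up looking roughly like this — a sketch rather than the shipped API, with illustrative method and parameter names:

- (void)reverseVideoAtURL:(NSURL *)inputURL
                outputURL:(NSURL *)outputURL
               completion:(void (^)(NSURL *reversedURL, NSError *error))completion {
  // do all of the reading/writing off the main thread
  dispatch_async(dispatch_get_global_queue(QOS_CLASS_UTILITY, 0), ^{
    NSError *error = nil;
    // ... recon pass, pass-array creation, and write loop (shown below) go here ...
    // report back on the main thread when the output file is ready
    dispatch_async(dispatch_get_main_queue(), ^{
      if (completion) completion(outputURL, error);
    });
  });
}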

One possible solution to the memory issue is to perform an initial reconnaissance of the entire file with respect to timestamps and frames, to organize this data into an array of passes where each pass contains a maximum number of frames (in this case, 100), and to then process each pass by extracting just the frames in the given pass and writing to output.
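The snippets that follow assume the asset reader has already been set up along these lines — a minimal sketch, with illustrative variable names and output settings:

NSError *error = nil;
AVURLAsset *asset = [AVURLAsset URLAssetWithURL:inputURL options:nil];
AVAssetTrack *videoTrack = [[asset tracksWithMediaType:AVMediaTypeVideo] firstObject];

AVAssetReader *assetReader = [[AVAssetReader alloc] initWithAsset:asset error:&error];
NSDictionary *readerSettings = @{(id)kCVPixelBufferPixelFormatTypeKey: @(kCVPixelFormatType_32BGRA)};
AVAssetReaderTrackOutput *assetReaderOutput = [[AVAssetReaderTrackOutput alloc] initWithTrack:videoTrack
                                                                                outputSettings:readerSettings];

// needed so we can call -resetForReadingTimeRanges: once per pass later on
assetReaderOutput.supportsRandomAccess = YES;

[assetReader addOutput:assetReaderOutput];
[assetReader startReading];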

The Recon Stage

// main array to hold presentation times
NSMutableArray *revSampleTimes = [[NSMutableArray alloc] init];

// now go through the reader output to get some recon on frame presentation times
CMSampleBufferRef sample;
int localCount = 0;
while((sample = [assetReaderOutput copyNextSampleBuffer])) {
  CMTime presentationTime = CMSampleBufferGetPresentationTimeStamp(sample);
  NSValue *presentationValue = [NSValue valueWithBytes:&presentationTime objCType:@encode(CMTime)];
  [revSampleTimes addObject:presentationValue];
  CFRelease(sample);
  
  localCount++;
}

This method works only because AVAssetReader can read from the input file for a given time range with little cost in the initial seek to the beginning of that range. Thus we can use the array of input frame timestamps to create the main array of dictionaries that will define the passes.
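The pass-building code below also assumes the per-pass frame cap has been defined somewhere; in this case:

// maximum number of frames held in memory at any one time (one pass)
int numSamplesInPass = 100;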

Creating the Array of Pass Info

// each pass is defined by a time range which we can specify each time we re-init the asset reader
    
// array that holds the pass info
NSMutableArray *passDicts = [[NSMutableArray alloc] init];

NSValue *initEventValue = [revSampleTimes objectAtIndex:0];
CMTime initEventTime = [initEventValue CMTimeValue];

CMTime passStartTime = [initEventValue CMTimeValue];
CMTime passEndTime = [initEventValue CMTimeValue];

int timeStartIndex = -1;
int timeEndIndex = -1;
int frameStartIndex = -1;
int frameEndIndex = -1;

NSValue *timeEventValue, *frameEventValue;
NSValue *passStartValue, *passEndValue;
CMTime timeEventTime, frameEventTime;

int totalPasses = (int)ceil((float)revSampleTimes.count / (float)numSamplesInPass);

BOOL initNewPass = NO;
for (NSInteger i=0; i<revSampleTimes.count; i++) {
  // presentation time of this frame, in original (ascending) order
  timeEventValue = [revSampleTimes objectAtIndex:i];
  timeEventTime = [timeEventValue CMTimeValue];
  
  // time of the frame that will occupy this slot, taken from the tail end
  frameEventValue = [revSampleTimes objectAtIndex:revSampleTimes.count - 1 - i];
  frameEventTime = [frameEventValue CMTimeValue];
  
  // keep extending the current pass with every sample we see
  passEndTime = timeEventTime;
  timeEndIndex = (int)i;
  frameEndIndex = (int)(revSampleTimes.count - 1 - i);
  
  // on a pass boundary, save the current pass info and start a new one
  if (i % numSamplesInPass == 0) {
    // skip the save on the very first sample
    if (i > 0) {
      passStartValue = [NSValue valueWithBytes:&passStartTime objCType:@encode(CMTime)];
      passEndValue = [NSValue valueWithBytes:&passEndTime objCType:@encode(CMTime)];
      NSDictionary *dict = @{
        @"passStartTime": passStartValue,
        @"passEndTime": passEndValue,
        @"timeStartIndex" : [NSNumber numberWithLong:timeStartIndex],
        @"timeEndIndex": [NSNumber numberWithLong:timeEndIndex],
        @"frameStartIndex" : [NSNumber numberWithLong:frameStartIndex],
        @"frameEndIndex": [NSNumber numberWithLong:frameEndIndex]
      };
      [passDicts addObject:dict];
    }
    initNewPass = YES;
  }
  
  // if new pass then init the main vars
  if (initNewPass) {
    passStartTime = timeEventTime;
    timeStartIndex = (int)i;
    frameStartIndex = (int)(revSampleTimes.count - 1 - i);
    initNewPass = NO;
  }
}

// handle last pass
if ((passDicts.count < totalPasses) || revSampleTimes.count%numSamplesInPass != 0) {
  passStartValue = [NSValue valueWithBytes:&passStartTime objCType:@encode(CMTime)];
  passEndValue = [NSValue valueWithBytes:&passEndTime objCType:@encode(CMTime)];
  NSDictionary *dict = @{
    @"passStartTime": passStartValue,
    @"passEndTime": passEndValue,
    @"timeStartIndex" : [NSNumber numberWithLong:timeStartIndex],
    @"timeEndIndex": [NSNumber numberWithLong:timeEndIndex],
    @"frameStartIndex" : [NSNumber numberWithLong:frameStartIndex],
    @"frameEndIndex": [NSNumber numberWithLong:frameEndIndex]
  };
  [passDicts addObject:dict];
}

For each pass, the asset reader output is reset so that the time range to be read covers just the subset of frames belonging to the current pass.

Passes are read in reverse order, i.e. last pass to first, and within each pass the frames are written from the end of the pass to the start. This effectively writes every frame of the file in reverse order with capped memory usage (the number of frames in a pass), regardless of the length of the input video. The last pass will contain the total number of frames modulo the number of frames per pass. For example, if the input video has 456 total frames and we process 100 frames per pass, we get 5 passes in total, with the last pass containing 56 frames. That last pass is the first one written to output, followed by 4 more passes of 100 frames apiece.
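The write loop below assumes an AVAssetWriter, its video input, and a pixel buffer adaptor have already been created, and that fps holds the source frame rate — again a sketch with illustrative names and settings:

NSError *error = nil;
AVAssetWriter *assetWriter = [[AVAssetWriter alloc] initWithURL:outputURL
                                                       fileType:AVFileTypeQuickTimeMovie
                                                          error:&error];

NSDictionary *writerSettings = @{
  AVVideoCodecKey: AVVideoCodecH264,
  AVVideoWidthKey: @(videoTrack.naturalSize.width),
  AVVideoHeightKey: @(videoTrack.naturalSize.height)
};
AVAssetWriterInput *assetWriterInput = [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeVideo
                                                                          outputSettings:writerSettings];
AVAssetWriterInputPixelBufferAdaptor *adaptor =
  [AVAssetWriterInputPixelBufferAdaptor assetWriterInputPixelBufferAdaptorWithAssetWriterInput:assetWriterInput
                                                                    sourcePixelBufferAttributes:nil];
[assetWriter addInput:assetWriterInput];
[assetWriter startWriting];

// the output session starts at the first recorded presentation time
[assetWriter startSessionAtSourceTime:[[revSampleTimes objectAtIndex:0] CMTimeValue]];

// frame rate taken from the source track; used to bound the append retries below
float fps = videoTrack.nominalFrameRate;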

Using the Pass Array to Write the Output

int frameCount = 0; // master frame counter
int fpsInt = (int)(fps + 0.5);

// now go through the read passes and write to output
for (NSInteger z=passDicts.count-1; z>=0; z--) {
  NSDictionary *dict = [passDicts objectAtIndex:z];
  
  passStartValue = dict[@"passStartTime"];
  passStartTime = [passStartValue CMTimeValue];
  
  passEndValue = dict[@"passEndTime"];
  passEndTime = [passEndValue CMTimeValue];
  
  CMTime passDuration = CMTimeSubtract(passEndTime, passStartTime);
  
  int timeStartIx = (int)[dict[@"timeStartIndex"] longValue];
  int timeEndIx = (int)[dict[@"timeEndIndex"] longValue];
  
  int frameStartIx = (int)[dict[@"frameStartIndex"] longValue];
  int frameEndIx = (int)[dict[@"frameEndIndex"] longValue];
  
  CMTimeRange localRange = CMTimeRangeMake(passStartTime,passDuration);
  NSValue *localRangeValue = [NSValue valueWithBytes:&localRange objCType:@encode(CMTimeRange)];
  NSMutableArray *localRanges = [[NSMutableArray alloc] init];
  [localRanges addObject:localRangeValue];
  
  // reset the reader to the range of the pass
  [assetReaderOutput resetForReadingTimeRanges:localRanges];
  
  // read in the samples of the pass
  NSMutableArray *samples = [[NSMutableArray alloc] init];
  while((sample = [assetReaderOutput copyNextSampleBuffer])) {
    [samples addObject:(__bridge id)sample];
    CFRelease(sample);
  }
  
  // append samples to output using the recorded frame times
  for (NSInteger i=0; i<samples.count; i++) {
    // safety check: don't write more frames than we recorded times for
    if (frameCount >= revSampleTimes.count) {
      NSLog(@"%s pass %ld: more samples than recorded frames! %d >= %lu ", __FUNCTION__, (long)z, frameCount, (unsigned long)revSampleTimes.count);
      break;
    }
    
    // get the orig presentation time (from start to end)
    NSValue *eventValue = [revSampleTimes objectAtIndex:frameCount];
    CMTime eventTime = [eventValue CMTimeValue];
    
    // take the image/pixel buffer from tail end of the array
    CVPixelBufferRef imageBufferRef = CMSampleBufferGetImageBuffer((__bridge CMSampleBufferRef)samples[samples.count - i - 1]);
    
    // append frames to output
    BOOL append_ok = NO;
    int j = 0;
    while (!append_ok && j < fpsInt) {
      
      if (adaptor.assetWriterInput.readyForMoreMediaData) {
        append_ok = [adaptor appendPixelBuffer:imageBufferRef withPresentationTime:eventTime];
        if (!append_ok)
          NSLog(@"%s Problem appending frame at time: %lld", __FUNCTION__, eventTime.value);
      }
      else {
        // adaptor not ready
        [NSThread sleepForTimeInterval:0.05];
      }
      
      j++;
    }
    
    if (!append_ok)
      NSLog(@"%s error appending frame %d; times %d", __FUNCTION__, frameCount, j);
    
    frameCount++;
  }
  
  // release the samples array for this pass
  samples = nil;
}
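Once the last pass has been appended, the writer still needs to be closed out, at which point the completion callback mentioned earlier can fire:

// finish the output file, then notify the caller (completion block or delegate)
[assetWriterInput markAsFinished];
[assetWriter finishWritingWithCompletionHandler:^{
  // e.g. hand the output URL back via the completion block
}];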

You can see this in action in the app’s video filters, as exemplified by some of the posts on BeatCam’s Instagram page.

The class used for creating reverse video, along with a sample Xcode project, is available on GitHub. I hope you find it useful.

One aside: if you do find that you need to include a reversed version of the audio, you will need to extract the audio from the video, reverse the order of the audio samples (making sure to account for mono vs. stereo), and then export a new video using the newly created reversed video and audio tracks.
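Reversing the samples themselves is simple once the audio has been decoded to interleaved linear PCM. Here’s a sketch of the in-place reversal, assuming 16-bit interleaved samples and a known channel count:

// reverse interleaved 16-bit PCM in place, swapping whole frames (all channels) at a time
static void reversePCMFrames(int16_t *samples, size_t frameCount, size_t channelCount) {
  if (frameCount < 2) return;
  for (size_t i = 0, j = frameCount - 1; i < j; i++, j--) {
    for (size_t c = 0; c < channelCount; c++) {
      int16_t tmp = samples[i * channelCount + c];
      samples[i * channelCount + c] = samples[j * channelCount + c];
      samples[j * channelCount + c] = tmp;
    }
  }
}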

The export, in this case, is achieved by creating a new AVMutableComposition, defining the tracks, length, and other details, and then exporting to file using AVAssetExportSession. Consult Apple’s documentation for a full explanation of how this is achieved.
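A minimal sketch of that final step, assuming reversedVideoURL and reversedAudioURL point at the two intermediate files and finalOutputURL is the destination:

AVURLAsset *videoAsset = [AVURLAsset URLAssetWithURL:reversedVideoURL options:nil];
AVURLAsset *audioAsset = [AVURLAsset URLAssetWithURL:reversedAudioURL options:nil];

AVMutableComposition *composition = [AVMutableComposition composition];
AVMutableCompositionTrack *compVideoTrack = [composition addMutableTrackWithMediaType:AVMediaTypeVideo
                                                                      preferredTrackID:kCMPersistentTrackID_Invalid];
AVMutableCompositionTrack *compAudioTrack = [composition addMutableTrackWithMediaType:AVMediaTypeAudio
                                                                      preferredTrackID:kCMPersistentTrackID_Invalid];

NSError *error = nil;
[compVideoTrack insertTimeRange:CMTimeRangeMake(kCMTimeZero, videoAsset.duration)
                        ofTrack:[[videoAsset tracksWithMediaType:AVMediaTypeVideo] firstObject]
                         atTime:kCMTimeZero error:&error];
[compAudioTrack insertTimeRange:CMTimeRangeMake(kCMTimeZero, audioAsset.duration)
                        ofTrack:[[audioAsset tracksWithMediaType:AVMediaTypeAudio] firstObject]
                         atTime:kCMTimeZero error:&error];

AVAssetExportSession *exporter = [[AVAssetExportSession alloc] initWithAsset:composition
                                                                   presetName:AVAssetExportPresetHighestQuality];
exporter.outputURL = finalOutputURL;
exporter.outputFileType = AVFileTypeQuickTimeMovie;
[exporter exportAsynchronouslyWithCompletionHandler:^{
  if (exporter.status != AVAssetExportSessionStatusCompleted) {
    NSLog(@"export failed: %@", exporter.error);
  }
}];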