Last spring I was teaching Introduction to Computer Vision at Drexel when COVID-19 hit and, well, everyone had to adjust quickly. While my course always had an online section, I decided to make all of my lectures asynchronous to provide maximum flexibility to students (and myself).

After some prep, I booted up QuickTime on my Mac and screen-recorded my first lecture at 1080p. Even though QuickTime encoded using H.264, the file size was massive: nearly 5 GB for a single one-hour lecture.

No problem, I figured, probably just a high bitrate and a lack of B-frames. I re-encoded it using two-pass x264 but found the result was still hundreds of megabytes.

You might be wondering why this is a problem. Most videos are hundreds of megabytes, right? Well, I had two issues here:

  • I needed to ensure all of my students could watch the videos, and I knew many of them might be in locations with poor internet access.
  • Each lecture was only my slides. I average about one slide per minute, so an hour-long lecture surely needed only about 60 distinct frames of video, which should be very, very small.
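
A quick back-of-the-envelope check makes the mismatch concrete. The sketch below uses the roughly 5 GB, one-hour figures from above; the 30 fps capture rate is an assumption for illustration, not something I measured.

```shell
# Back-of-the-envelope numbers for the raw screen recording.
size_bytes=$((5 * 1000 * 1000 * 1000))  # roughly 5 GB
duration_s=$((60 * 60))                 # one hour
capture_fps=30                          # ASSUMED capture frame rate

# Average bitrate in kbit/s: bytes -> bits, divided by seconds.
bitrate_kbps=$((size_bytes * 8 / duration_s / 1000))

# Frames stored versus frames that actually differ (about one slide per minute).
frames_recorded=$((duration_s * capture_fps))
slides=$((duration_s / 60))

echo "average bitrate: ${bitrate_kbps} kbit/s"
echo "frames recorded: ${frames_recorded}, distinct slides: ${slides}"
```

Under those assumptions the recording averages about 11 Mbit/s and stores over a hundred thousand frames to show about 60 different pictures.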

After some googling around, I found that x264 has a feature for exactly such videos: Constant Rate Factor (CRF) encoding. With CRF, you instruct the encoder to target a given quality level throughout the encode, and it spends only as many bits as each part of the video needs to hit that level. This is great for low-motion videos like lecture slides: since there is very little movement, the encoder can keep the bitrate very low, which results in an extremely small file. I have the occasional video or animation in my slides, and CRF simply raises the bitrate at those times.
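
If you want to see how the CRF value trades quality for size on your own slides, something like the following sketch may help. For 8-bit x264 the CRF scale runs from 0 to 51, the default is 23, and lower values mean higher quality and larger files. The function name crf_commands, the INPUT.mov placeholder, and the three values tried are my own choices for illustration. It only prints the commands, so nothing is encoded until you run them.

```shell
# Print (do not run) one video-only encode command per CRF value,
# so the resulting file sizes can be compared side by side.
# INPUT.mov is a placeholder name; -an drops the audio track.
crf_commands() {
    for crf in 18 23 28; do
        echo "ffmpeg -i INPUT.mov -c:v libx264 -crf $crf -an crf_$crf.mp4"
    done
}

crf_commands
```

Piping the output to a shell (crf_commands | sh) runs the three encodes, after which ls -l crf_*.mp4 shows how the size changes with the CRF value.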

On top of that, I implemented a few more tricks such as:

  • Telling x264 to optimize for still images
  • Reducing the frame rate to 10 fps
  • Adding fast start flags to the output file to make streaming easier and ensure compatibility with different devices.

My final ffmpeg command is:

ffmpeg -i INPUT -pix_fmt yuv420p -c:v libx264 -crf 18 -preset veryslow -tune stillimage -r 10 -acodec aac -b:v 48k -f mp4 -movflags +faststart OUTPUT.mp4

The resulting files average less than 1 MB per minute, most of which is audio, and the video has hardly any visible artifacts.
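
To check the size per minute of your own encodes, a small helper along these lines should do. The function name kb_per_minute and the 40 MB, 50-minute figures are hypothetical, chosen only to show the arithmetic. The last part measures a real file and is skipped unless ffprobe and OUTPUT.mp4 are both present.

```shell
# Size per minute of an encoded file, in kilobytes.
# Arguments: file size in bytes, duration in whole seconds.
kb_per_minute() {
    echo $(( $1 * 60 / $2 / 1000 ))
}

# Hypothetical example: a 50-minute lecture that came out at 40 MB.
kb_per_minute 40000000 3000

# Measure a real file if ffprobe and the file are available.
if command -v ffprobe >/dev/null 2>&1 && [ -f OUTPUT.mp4 ]; then
    size=$(wc -c < OUTPUT.mp4)
    dur=$(ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 OUTPUT.mp4)
    # ffprobe reports fractional seconds; drop the fraction for shell arithmetic.
    kb_per_minute "$size" "${dur%.*}"
fi
```

The hypothetical example works out to 800 KB per minute, which would be in line with the "less than 1 MB per minute" figure.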