Using ffmpeg and avconv [Jan. 14, 2013, 5:25 a.m.]

A hands-on look at command-line encoding. This chapter was always going to be challenging to write, but we think the material here gives a good base for exploring the subject.

ffmpeg and avconv (the equivalent tool from Libav, a fork of the FFmpeg project) are the command-line applications that work in the back end of many desktop encoding applications, including ffmpegX, Handbrake and SUPER encoder. They are also essential to many desktop video editing applications, including Kdenlive.

These tools can also be very handy for working with video on Internet servers. As you will see later in the chapter, you can do more than just transcode video from one file type to another.

Installing ffmpeg / avconv

To start playing around with FFmpeg you will need to install it. It is best suited to a Linux-based server or desktop, but it can also be installed on Windows and OS X. A search engine can help you with the specifics: search for "install ffmpeg" plus the name of your operating system.
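
Once the install is done, it can be worth checking which binary actually ended up on your PATH, since a system may provide ffmpeg, avconv, or both. The following is a minimal sketch using only the Python standard library; the helper name find_encoder is our own and is not part of either tool.

```python
import shutil
import subprocess

def find_encoder(candidates=('ffmpeg', 'avconv')):
    """Return the name of the first candidate binary found on PATH, or None."""
    for name in candidates:
        if shutil.which(name) is not None:
            return name
    return None

if __name__ == '__main__':
    encoder = find_encoder()
    if encoder is None:
        print('Neither ffmpeg nor avconv was found on PATH')
    else:
        # Both tools print their version and build options with -version
        subprocess.call([encoder, '-version'])
```

If this prints a version banner, the later examples in this chapter should be able to find the binary too.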

WebM settings

For an overview of useful ffmpeg/avconv options related to encoding WebM files, see http://wiki.webmproject.org/ffmpeg and http://ffmpeg.org/ffmpeg.html#libvpx

http://rodrigopolo.com/ffmpeg/cheats.html
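
As a starting point, a basic WebM encode might be assembled as below. This is a sketch rather than a recommended recipe: the options used (-c:v libvpx, -b:v, -c:a libvorbis, -b:a) are standard ffmpeg options, but the bitrate values are placeholder assumptions, webm_command is a hypothetical helper, and your build needs libvpx and libvorbis support.

```python
def webm_command(src, dst, video_bitrate='1M', audio_bitrate='128k'):
    """Build an ffmpeg argument list for a VP8/Vorbis WebM encode."""
    return ['ffmpeg', '-i', src,
            '-c:v', 'libvpx', '-b:v', video_bitrate,     # VP8 video
            '-c:a', 'libvorbis', '-b:a', audio_bitrate,  # Vorbis audio
            dst]

cmd = webm_command('input.avi', 'output.webm')
print(' '.join(cmd))
# To actually run the encode, pass the list to subprocess.check_call(cmd)
```

Building the command as a list, rather than one long string, avoids shell quoting problems when file names contain spaces.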

H264 settings

For a guide to creating H.264 files that work on many mobile devices, see http://h264.code-shop.com/trac/wiki/Encoding. For a general x264 encoding guide, see http://ffmpeg.org/trac/ffmpeg/wiki/x264EncodingGuide

ffmpeg and numpy

This guide, written by RMO, is a good introduction to taking your work with ffmpeg to another level on the server.

In the past year and a half, daf and I have undertaken a series of media experiments using Python's excellent numpy library. The outcome of these trials is largely encapsulated in our numm project, which is available in the Debian and Ubuntu repositories as python-numm.

Numm uses gstreamer for a/v decoding and encoding and provides a minimalist livecoding API, but in the interests of simplicity and portability, I've been reimplementing some of the core functionality as a wrapper around the ffmpeg binary.

loading a video as numpy arrays

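import numpy as np
import subprocess

def video_frames(path, width=320, height=240, fps=30):
    # Ask ffmpeg to decode, scale and write raw rgb24 frames to stdout
    cmd = ['ffmpeg', '-i', path,
           '-vf', 'scale=%d:%d'%(width,height),
           '-r', str(fps),
           '-an',
           '-c:v', 'rawvideo', '-f', 'rawvideo',
           '-pix_fmt', 'rgb24',
           '-']
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    frame_size = width*height*3  # bytes in one rgb24 frame
    while True:
        data = p.stdout.read(frame_size)
        if len(data) < frame_size:
            # End of stream (or a truncated final frame): clean up and stop
            p.stdout.close()
            p.wait()
            return

        # frombuffer returns a read-only view; copy it if you need to modify a frame
        yield np.frombuffer(data, dtype=np.uint8).reshape((height, width, 3))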
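
To see what the byte arithmetic is doing without needing ffmpeg or a video file, the same slicing can be tried on a hand-made buffer. This is only an illustration: split_raw_frames is a hypothetical helper, and the 2x2 frames are synthetic.

```python
import numpy as np

def split_raw_frames(raw, width, height):
    """Yield (height, width, 3) uint8 arrays from packed rgb24 bytes."""
    frame_size = width * height * 3   # 3 bytes per pixel: R, G, B
    # Step through the buffer one whole frame at a time, dropping any remainder
    for start in range(0, len(raw) - frame_size + 1, frame_size):
        arr = np.frombuffer(raw[start:start + frame_size], dtype=np.uint8)
        yield arr.reshape((height, width, 3))

# Two synthetic 2x2 frames: one all black, one all white
raw = bytes([0] * 12) + bytes([255] * 12)
frames = list(split_raw_frames(raw, width=2, height=2))
print(len(frames), frames[0].shape)   # 2 (2, 2, 3)
```

The reshape order matters: rgb24 data arrives row by row, so height comes first, then width, then the three colour channels.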

saving frames as images

Our video_frames function gives us numpy buffers from a video file. Each buffer is a 3-d array -- height, width, color -- with 8-bit intensity values. To save numpy buffers as images, we use the Python Imaging Library to define a np2image function (from image.py):

from PIL import Image

def np2image(arr, path):
    # PIL wants (width, height); a numpy frame's shape is (height, width, 3)
    im = Image.frombytes(
        'RGB', (arr.shape[1], arr.shape[0]), arr.tobytes())
    im.save(path)

For example, to save a video as a directory full of still images:

for idx, fr in enumerate(video_frames(path)):
    np2image(fr, '%06d.jpg' % (idx))

encoding numpy arrays to video

Alternatively, we can write a series of frames back to disk as a video:

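def frames_to_video(generator, path, fps=30, ffopts=None):
    ffopts = list(ffopts or [])  # avoid a shared mutable default argument
    p = None
    for fr in generator:
        if p is None:
            # Start ffmpeg lazily: the frame size is only known from the first frame
            cmd =['ffmpeg', '-y', '-s', '%dx%d' % (fr.shape[1], fr.shape[0]),
                  '-r', str(fps),
                  '-an',
                  '-c:v', 'rawvideo', '-f', 'rawvideo',
                  '-pix_fmt', 'rgb24',
                  '-i', '-'] + ffopts + [path]
            p = subprocess.Popen(cmd, stdin=subprocess.PIPE)
        p.stdin.write(fr.tobytes())
    if p is not None:  # the generator may have been empty
        p.stdin.close()
        p.wait()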

a simple test

Assuming you've saved these functions to a file and imported them, we can re-encode a video:

frames_to_video(
    video_frames('/path/to/input/video.avi'),
    '/path/to/output/video.webm')

ffmpeg encoding parameters can be added as needed, though this is not an efficient or in any way recommended method of transcoding videos:

frames_to_video(
    video_frames('/path/to/input/video.avi'),
    '/path/to/output/video.webm', ffopts=['-vb', '500K']) # &c.
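
Because frames_to_video accepts any generator of (height, width, 3) uint8 arrays, the input does not have to come from a file at all. As a rough sketch, here is a generator of synthetic frames that fade from black to white; gradient_frames is a hypothetical name used only for illustration.

```python
import numpy as np

def gradient_frames(nframes=30, width=320, height=240):
    """Yield uniform grey frames that fade from black to white."""
    for i in range(nframes):
        # Scale the frame index to the 0..255 intensity range
        level = int(255 * i / max(nframes - 1, 1))
        yield np.full((height, width, 3), level, dtype=np.uint8)

frames = list(gradient_frames(nframes=3, width=4, height=2))
print(frames[0].shape, [int(f[0, 0, 0]) for f in frames])
# With the functions above imported, a one-second fade could be written with:
# frames_to_video(gradient_frames(), '/path/to/output/fade.webm')
```

Synthetic input like this is a handy way to check that the encoding side works before involving a real source video.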

video synopses

Here are a few quick examples of the processing you can do by thinking about video as a series of arrays.

composite image

The average frame in a video:

INPUT_VIDEO = '/path/to/video'
W, H = (320, 240)

comp = np.zeros((H, W, 3), dtype=int)
nframes = 0
for fr in video_frames(INPUT_VIDEO, width=W, height=H):
    comp += fr
    nframes += 1
comp = (comp / nframes).astype(np.uint8)
np2image(comp, INPUT_VIDEO + '-comp.png')
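
Note that the accumulator above is an int array rather than uint8. The sketch below, run on two synthetic frames, shows why: summing 8-bit frames directly wraps around at 256. The helper name average_frame is ours, and the integer division is one possible rounding choice, not the only one.

```python
import numpy as np

def average_frame(frames):
    """Average an iterable of uint8 frames using a wide accumulator."""
    total = None
    count = 0
    for fr in frames:
        if total is None:
            # int64 has plenty of headroom for summing many 8-bit frames
            total = np.zeros(fr.shape, dtype=np.int64)
        total += fr
        count += 1
    if count == 0:
        raise ValueError('no frames to average')
    return (total // count).astype(np.uint8)

dark = np.full((2, 2, 3), 100, dtype=np.uint8)
bright = np.full((2, 2, 3), 200, dtype=np.uint8)
print(average_frame([dark, bright])[0, 0].tolist())  # [150, 150, 150]
print((dark + bright)[0, 0].tolist())                # [44, 44, 44]: 300 wraps past 255
```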

scans

Slitscans and 0xScans are pixel-wide sweeps through a video:

slits = []
oxscan = []
for fr in video_frames(INPUT_VIDEO, width=W, height=H):
    slits.append(fr[:,W//2])
    oxscan.append(fr.mean(axis=1).astype(np.uint8))
slits = np.array(slits).transpose(1,0,2)
oxscan = np.array(oxscan).transpose(1,0,2)
np2image(slits, INPUT_VIDEO + '-slitscan.png')
np2image(oxscan, INPUT_VIDEO + '-oxscan.png')
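
The transpose is what turns time into the horizontal axis of the output image. A small sketch on synthetic frames makes the shapes easier to follow; slitscan here is a hypothetical stand-alone helper, not part of numm.

```python
import numpy as np

def slitscan(frames, column):
    """Stack one pixel column per frame; the result is (height, nframes, 3)."""
    slits = [fr[:, column] for fr in frames]   # each slit has shape (height, 3)
    # Stacked slits are (nframes, height, 3); swap the first two axes
    return np.array(slits).transpose(1, 0, 2)

# Five synthetic frames (height 4, width 6), each a uniform grey level
frames = [np.full((4, 6, 3), 50 * i, dtype=np.uint8) for i in range(5)]
scan = slitscan(frames, column=3)
print(scan.shape)               # (4, 5, 3)
print(scan[0, :, 0].tolist())   # [0, 50, 100, 150, 200]
```

Each column of the result comes from one frame, so reading the image left to right is reading the video from start to finish.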