Collecting and Moving Video Metadata
Some theory and practice on moving video metadata around.
Video metadata can move in mysterious ways. Here are some of the ways it can be moved and collected:
- as a part of the video files
- as external files
- as part of a repository system
Metadata in the video files: As we saw in one of the introductory sections, there are a number of standard metadata fields that can be included in most video containers. Matroska files, for example, can also carry .srt files as subtitle streams.
Metadata in external files: Subtitles are often shipped as .srt files alongside the video. This is common with video containers that don't support subtitle streams, or as additional supporting information about the video files; it is perhaps most commonly seen with .avi files distributed via torrents.
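External subtitle files are simple enough to work with directly. As a minimal sketch (the cue layout follows the common SubRip convention, which is a de facto format rather than a formal standard):

```python
import re

def parse_srt(text):
    """Parse SubRip (.srt) text into a list of (start, end, caption) tuples."""
    cues = []
    # Each cue is an index line, a timing line, then one or more caption
    # lines, separated from the next cue by a blank line.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue
        start, _, end = lines[1].partition(" --> ")
        cues.append((start.strip(), end.strip(), "\n".join(lines[2:])))
    return cues

# A made-up two-cue example file.
sample = """1
00:00:01,000 --> 00:00:04,000
Hello, world.

2
00:00:05,000 --> 00:00:07,500
Metadata in an external file."""

for start, end, caption in parse_srt(sample):
    print(start, end, repr(caption))
```

A real subtitle library would also handle formatting tags and malformed cues; this only shows how little structure the format has.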
Metadata as part of a repository system: Rather than storing the metadata in a file, it can live in the database of a content management system.
Video sharing sites
Archiving sites built on Pandora, such as 0xdb and Pad.ma, collect metadata about videos and make it available to query by video identifier. They aim for a more complete and flexible approach to how text and video interact. You can find out more by watching the Pad.ma screencast.
Moving Media Metadata
There are a great number of ways of moving metadata from one repository to another. Sometimes it involves using a site-specific API; sometimes the data is made available in an easily machine-readable format. The most common machine formats are MRSS and schema.org microdata. The microdata VideoObject type is partly read by Google.
Case Study - Drupal + Feeds for MRSS
As part of the Open Video Forum, there was a presentation of the aims of the transmission.cc website. Part of the aim of that website was to act as an aggregator and searchable archive of video metadata coming in from various RSS and Media RSS (MRSS) feeds. MRSS adds additional useful information to RSS feeds about video and audio files, including file size, bitrate, width, height, etc.
There is more information on the Media RSS specification here - http://www.rssboard.org/media-rss
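To give a concrete idea of what those MRSS additions look like, here is a small Python sketch that reads the media:content attributes out of a single feed item. The item itself is a made-up example, not taken from transmission.cc:

```python
import xml.etree.ElementTree as ET

# The Media RSS namespace as defined by the specification.
MEDIA_NS = "http://search.yahoo.com/mrss/"

# A fabricated MRSS item showing the extra attributes the spec defines.
item_xml = """<item xmlns:media="http://search.yahoo.com/mrss/">
  <title>Example video</title>
  <media:content url="http://example.org/video.ogv"
                 type="video/ogg"
                 fileSize="12345678"
                 bitrate="800"
                 width="640"
                 height="360"/>
</item>"""

item = ET.fromstring(item_xml)
# ElementTree addresses namespaced tags as {namespace}tag.
content = item.find(f"{{{MEDIA_NS}}}content")
print(content.get("url"), content.get("fileSize"),
      content.get("width"), content.get("height"))
```

A feed parser such as SimplePie does the equivalent of this across every item in a feed; the point is that the file size, bitrate and dimensions arrive as plain attributes ready to map into a repository.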
Developer/installer overview for transmission aggregator
Transmission used the Drupal content management system as a base to build on and extended the functionality already present in existing Drupal modules. All the code is standard Drupal core and contrib with a couple of important additions.
The feeds are pulled with the Feeds module (http://drupal.org/project/feeds). Feeds uses the SimplePie parser to interpret the incoming feeds. SimplePie parses MRSS out of the box, but Feeds doesn't know what to do with this data.
To solve this you can use an extension module that reads the MRSS output coming from the SimplePie parser. This module is checked out as a git submodule, but is also now in a sandbox on drupal.org: https://drupal.org/sandbox/ekes/1867408
In the transmission.cc site there are two content types for feeds: 'MRSS feed' and 'Video'. Posting a new 'MRSS feed' with the URL of the RSS feed adds that to the list of feeds that are pulled. Items in the feed are created as 'Video' nodes.
The node types, the Feeds settings that use them, and the mapping of feed-item elements to node fields can all be set up in the Drupal interface of the Feeds module.
This screenshot shows a 'mapping' of a feed to the content pieces of your Drupal website.
To make the site itself output MRSS in its RSS feeds there is another module, again included in the repository but now also in a sandbox on drupal.org: https://drupal.org/node/1867416
The complete transmission.cc code can be found at:
The repository uses git submodules to pull external code.
If you're working on the command line, fetch the rest of the code by typing (inside the git repository you just cloned):
$ git submodule init
$ git submodule update
If you're using your favourite GUI, there should be a way of fetching all the submodules.
Replicating the transmission.cc configuration
If you want to replicate the settings for mapping the MRSS feeds, they are stored in a 'feature' (a way of storing settings in Drupal; the main module is http://drupal.org/project/features).
The relevant feature for this is the one found in 'tx/modules/features/mrss_feeds'
Linking to files instead of pulling them
Transmission.cc pulls the actual files rather than linking to them. If you want to link to them or embed them externally instead, you can change the field mappings (see the image above) to use Emfield. Then you need not worry about the cron and transcoding sections of this tutorial.
Making thumbnails of remote files, ones you have not pulled, is much more involved than for local files, so you may be left without an image where no thumbnail was specified in the MRSS feed. Emfield will also not embed all formats or sources of video properly, so whether this works will depend on your source video.
Downloading files as part of a cron job
As the transmission.cc configuration maps the files to a filefield, they get downloaded. This is done as part of the queue cron job, which is created by the Job Scheduler module (https://drupal.org/project/job_scheduler), a requirement of the Feeds module.
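The download step itself is conceptually simple. As a hedged sketch of what the cron job does (the real implementation is Drupal's filefield and job queue, not this code; the enclosure URL is invented):

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve  # the stdlib call a worker could use

def local_path(url, directory="files"):
    """Map an enclosure URL to a local file path, as a filefield would."""
    name = os.path.basename(urlparse(url).path) or "download"
    return os.path.join(directory, name)

# A made-up queue of enclosure URLs taken from feed items.
pending = ["http://example.org/videos/clip.ogv"]

for url in pending:
    dest = local_path(url)
    print(url, "->", dest)
    # A real cron worker would now fetch the file, e.g.:
    # os.makedirs("files", exist_ok=True)
    # urlretrieve(url, dest)
```

The important property is that each cron run only works through the queue of pending items, so a failed download can simply be retried on the next run.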
Important note: if you run cron the standard way, by visiting http://site/cron.php, you create huge Apache PHP processes. To avoid this you should at least use Drush to run cron.
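One common way to do that is a crontab entry that calls Drush directly, keeping the work out of Apache. The paths below are illustrative assumptions, not the real transmission.cc install:

```shell
# Run Drupal cron every 15 minutes via Drush instead of cron.php.
# /var/www/transmission is an assumed docroot; adjust to your install.
*/15 * * * * /usr/bin/drush --root=/var/www/transmission cron
```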
Transcoding with media_mover
Transmission.cc transcodes the downloaded video with FFmpeg into files that can be easily embedded. The settings for this should be in the install file in the git repository; however, the version of media_mover used did not export into Features.
The configuration of media_mover in transmission.cc is, however, not the best method. Pulling big files and transcoding them inside Drupal, even using Drush, is not an optimal solution; currently the transmission.cc site struggles and occasionally gets stuck on big files.
The alternative is to use a queue scheduler outside Drupal (examples include RabbitMQ, Beanstalkd, and Redis with extra scripts).
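The idea behind all of these is the same: Drupal only enqueues a small job description, and a separate worker process does the heavy lifting. Here is a minimal sketch using Python's standard-library queue as a stand-in for an external broker such as RabbitMQ, Beanstalkd or Redis; the job fields are assumptions for illustration, not the transmission.cc schema:

```python
import json
import queue

# Stand-in for an external broker (RabbitMQ, Beanstalkd, Redis, ...).
broker = queue.Queue()

def enqueue_transcode(node_id, source):
    """What the CMS side would do on cron: push a tiny job description."""
    broker.put(json.dumps({"node": node_id, "source": source, "target": "ogv"}))

def worker():
    """What a separate worker process would do: drain jobs one by one."""
    done = []
    while not broker.empty():
        job = json.loads(broker.get())
        # A real worker would shell out to FFmpeg with job["source"] here.
        done.append(job["node"])
    return done

enqueue_transcode(42, "http://example.org/big-file.avi")
enqueue_transcode(43, "http://example.org/other-file.avi")
result = worker()
print(result)
```

Because the broker lives outside the web stack, a stuck transcode only blocks one worker process, not the whole site.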
Task - Show your understanding
- Write a comment outlining a possible use for moving video metadata (and files) between different repositories.
- For extra points write a short outline of the technologies you would use to do this.