Thursday, July 24, 2014

Forking Around

For a long while now, I've been thinking about how to write a perl script to manage transcoding raw movie files to something I can stream. The idea is to dump the raw files into a folder and have the script analyze each file and transcode it to a specific format based on resolution. I use ffmpeg for transcoding and because of the limitations in ffmpeg regarding multi-threading I usually run six transcodes at once to make use of most of the cores on the system. So, to maximise the use of system resources for transcoding tasks, and finish all the tasks as quickly as possible, my perl script will need to keep track of how many ffmpeg instances I'm running. I thought the best way to do that was to use fork.

Fork allows the perl script to create a child process, basically a duplicate of the original script. When we tell the script to fork, the fork process sets a variable, in the child process that variable is set to “0”, in the parent process the variable is set to the systems process ID for the child process. Using that information we can use an “if” statement to have the child do one thing, and the parent do a different thing. The basic form of the if statement is as follows:

$forkPID = fork;  #remember fork sets the variable to "0" in the child 
if ($forkPID == 0) { 
 # this is the child so it's going to do some child stuff
 do childStuff;
} else {
 # this is the parent so it's going to do some parent stuff
 do parentStuff;
}

Once the script is forked, there are a couple of options, the child process is going to do it's thing, and the parent process can either go on it's merry way until it completes, or it can wait for the child process to finish. Since I want to run six transcodes at a time, I need to generate six child processes, and then wait for any one of them to finish before starting a new transcode. To do that, I need to iterate through all the transcode items, and for each of the transcode items run a test to see if there are few enough already going that I can start a new one. The test is simply while keeping track of the number of child processes I ask if there are fewer than six. If there are fewer than six, I start a new one, if there are six or more I wait for one of them to finish before starting a new one.

Once all the transcodes have started, and all but six have ended, I need to wait for the last six to finish before I can exit out of the parent process.

Below is the test script I built to test forking and limiting to six children. The subroutine I call at the bottom is just a stub that sleeps for a random number of seconds between ten and one-hundred.

#!/usr/bin/perl

use strict;
use warnings;

our $numberOfChildren = 0; # Keeps track of the current number of children.
our @transcodesArray = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17);

# We want to itterate through the array and launch a child process 
# for each element, but we never want more than 6 children at any given time.

foreach my $transcodeItem (@transcodesArray) {
 print "This is transcode item $transcodeItem. \n";
 print "Number of Children is $numberOfChildren. \n";
 if ( $numberOfChildren < 6) {
  my $forkID = fork;
  if ($forkID == 0) {
   sub1($transcodeItem);
            exit 0;
  } else {
   $numberOfChildren += 1;
  }
 } else {
  my $child = waitpid -1, 0;
  my $localtime = localtime;
  print "Parent: Child $transcodeItem with PID $child was reaped - $localtime.\n";
  my $forkID = fork;
  if ($forkID == 0) {
   sub1($transcodeItem);
            exit 0;
  }
 }
}

# Once all the items in the array have spawned a child, we need to wait for
# the remaining children to finish before we exit the program.
while ($numberOfChildren > 0) {
 print "Number of Children is $numberOfChildren. \n";
 my $child = waitpid -1, 0;
 my $localtime = localtime;
 print "Parent: Child $child was reaped - $localtime.\n";
 $numberOfChildren -= 1;
}

# Simple subroutine to make the child process run a random length of time
# before the process ends. 
sub sub1 {
        my $num = shift;
        print "started child process for  $num my pid is $$\n";
  my $rand = 10 * (1 + int(rand(10)));
        sleep $rand;
        print "done with child process for $num\n";
        return $num;
}