Scripting

Good Habits (aka "Scripting is Programming")

Some people make a distinction between real programming and scripting but it's a false dichotomy. If a computer runs it, it's a program and that means it's worth taking seriously:

Learn your tools

This doesn't mean memorizing the function reference but rather knowing the basic syntax, the idioms and how to find information as you need it (help, perldoc, pydoc, etc.). This applies particularly to other people's code: if you're copying an example or taking over someone's program, you'll avoid future grief by taking the time to learn about anything you don't understand.

Python

Perl

Bash

Vim

Trust Nothing

This gets a lot of attention as a security measure but it's even more important for robustness: any time you assume something you are introducing an opportunity for a bug.

Because you're protecting against many different things the answer is to code defensively:
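
As a minimal sketch of what that can look like in Python 3: check a value at the point where it enters the program instead of assuming it is well-formed. The function name parse_max_days and the 1 to 365 bounds are hypothetical, chosen only for illustration.

```python
def parse_max_days(raw, lowest=1, highest=365):
    """Convert untrusted text into a day count, or raise ValueError.

    The name and the default bounds are illustrative assumptions.
    """
    try:
        days = int(str(raw).strip())
    except ValueError:
        raise ValueError("expected a whole number of days, got %r" % (raw,))
    if not lowest <= days <= highest:
        raise ValueError("days must be between %d and %d, got %d"
                         % (lowest, highest, days))
    return days
```

The caller gets either a number it can rely on or one clear error, so the rest of the program never has to wonder whether the value was checked.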

Don't Repeat Yourself

Rule of thumb: any time you're about to copy substantially the same code, it's a sign that you want to move that code into a common location, whether that is a common function, a class or even a separate program. The latter case is surprisingly easy to forget about: Unix rewards creating small, simple tools and it's trivial to chain them together.

A simple example: ~sysadm/bin/inline-resolver was originally developed for some cron emails. Since then I've used it for many different types of log files, reports and a few config files - a handy return on the initial 10 minute investment.

Code is documentation

Most programs will never be worth writing separate documentation for and even for the ones which are, that documentation will tend to drift out of date. A more pragmatic approach is to make the code more explanatory:
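
One way to read that advice, sketched in Python 3 with hypothetical names: give magic numbers a name, pick function names that say what is being decided and put the explanation in a docstring, where help() and pydoc will find it.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # named constant instead of a bare 86400


def is_stale(file_age_seconds, max_age_days):
    """Return True if a file is older than max_age_days.

    The docstring doubles as help text: pydoc and help() display it.
    """
    return file_age_seconds > max_age_days * SECONDS_PER_DAY
```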

Making your programs explorable

It's probably not necessary to write a separate man page but it's pretty easy to make sure your program supports a --help option with some information. The Python optparse module (see below) makes this quite easy; Perl has Pod::Usage which is slightly more work but allows you to generate a full manpage.

Knowing the limits of the Unix shell

There are a number of guides for shell usage (e.g. Bash Pitfalls) and half of their tips are techniques for avoiding problems with embedded spaces, meta-characters, excessive quoting, etc.

Bad:

for foo in `ls -1 *.mp3`; do echo $foo; done

Good:

for foo in *.mp3; do echo "$foo"; done
find . -name '*.mp3' -print0 | xargs -0 rm
TIME_T=`date +%s`

Better:

find . -name '*.mp3' -delete
TIME_T=$(date +%s)

Modern shells support functions and it's always a good idea to move common code into functions, particularly because that makes it easy to add better error-handling without cluttering up the main body of your code:

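essh() {
    # Hop through the gateway; with arguments, run ssh from there to the target
    local cmd="$*"
    if [ -n "$cmd" ]; then
        cmd="ssh $cmd"
    fi

    # $cmd is deliberately unquoted so it splits back into separate words
    ssh -tA 198.202.70.23 $cmd
}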

Once you need to do more than minimal string processing or basic functions, it makes sense to switch to a real programming language and avoid the clutter of shell-isms. If your program isn't trivial, the richer syntax and libraries will quickly repay the time spent switching.
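
For comparison, here is a rough Python 3 sketch of the find-and-delete task from the shell examples. Filenames are ordinary values here, so there is nothing to quote or escape. The helper name find_by_suffix and the scratch directory are assumptions made for the demonstration.

```python
import os
import tempfile
from pathlib import Path


def find_by_suffix(top, suffix=".mp3"):
    """Return every regular file under top whose name ends in suffix."""
    return sorted(p for p in Path(top).rglob("*" + suffix) if p.is_file())


# Demonstration in a scratch directory with a deliberately awkward filename.
with tempfile.TemporaryDirectory() as scratch:
    awkward = Path(scratch) / "01 - it's a song; really.mp3"
    awkward.write_bytes(b"")
    (Path(scratch) / "notes.txt").write_bytes(b"")

    found = [p.name for p in find_by_suffix(scratch)]
    for path in find_by_suffix(scratch):
        os.remove(path)  # same effect as find ... -delete

    remaining = sorted(p.name for p in Path(scratch).iterdir())
```

The spaces, quote and semicolon in the filename need no special handling, which is most of the argument for switching.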

Logging

You're going to want to know what any non-trivial program is doing and it's nice not to have to uncomment a bunch of print statements when that happens. Taking logging seriously from the beginning offers several advantages:

  1. Variable log levels: you frequently need to distinguish between messages which are only useful when debugging the program, messages which indicate progress and other non-critical information and failures or errors which you always want the user to know about.

  2. Separation of data and log messages: that stray print/echo might end up in your data (e.g. myscript > outputfile) or might be discarded entirely if your program is running in an environment where the output isn't recorded. Any real logging system will allow you to easily log to syslog or to files and do things like maintain separate logs, so you could separate errors caused by invalid input from errors caused by bugs, misconfigured systems, etc.

  3. Convenience: filtering, formatting and context. Using a standard logging system means you can get out of the business of having to manually synchronize the formatting in all of your print statements. You also get things such as the exception traceback support which Python's logging module includes for free.
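
The three points above can be sketched with Python 3's standard logging module. The logger name, the format strings and the in-memory buffer (standing in for a log file or syslog handler) are assumptions made for the example.

```python
import io
import logging
import sys

log = logging.getLogger("myscript")  # hypothetical logger name
log.setLevel(logging.DEBUG)
log.propagate = False

# Log records go to stderr, so "myscript > outputfile" captures only data.
console = logging.StreamHandler(sys.stderr)
console.setLevel(logging.WARNING)  # raise or lower to change verbosity
console.setFormatter(logging.Formatter("%(name)s %(levelname)s: %(message)s"))
log.addHandler(console)

# A second handler keeps everything, including debug output; the in-memory
# buffer stands in here for a log file or a syslog handler.
buffer = io.StringIO()
full = logging.StreamHandler(buffer)
full.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
log.addHandler(full)

log.debug("reached this point for user %s", "example")
try:
    int("not a number")
except ValueError:
    log.exception("could not parse input")  # appends the traceback

print("data line")  # program output stays on stdout
```

Each handler has its own level and format, so changing what the user sees doesn't mean touching any of the individual log calls.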

Shell: /usr/bin/logger

logger -t "my-shell-script" -p debug "reached this point in the script"
logger -s -t "my-shell-script" "Copying files…"

(The latter case demonstrates how logger with -s can be a search-and-replace alternative to echo)

Perl: using /usr/bin/logger

open( STDOUT, "|-", "logger -t MyScript" ) or die "Cannot run logger: $!";
open( STDERR, ">&STDOUT" ); # Could also be "logger -t MyScript -p error"
select(STDERR); $| = 1; # Disable buffering on both output streams
select(STDOUT); $| = 1;

Perl: Logger::Syslog module

use Logger::Syslog;
logger_prefix("MyScript");
notice("Something broke!");

Python: logging (standard module)

import logging
logging.debug("reached this point for user %s", username)
…
logging.critical("Couldn't open config file %s - aborting!", config_file)

Python: SimpleSyslog (sysadm module)

from SimpleSyslog import SimpleSyslog
log = SimpleSyslog()
log.warn("foobar")

Unit Tests

Because you're smart you're building functions / classes instead of duplicating code. You've probably run into the case where you want to change something but aren't sure what the implications of that change are. This is where the concept of a Unit Test started: by writing simple tests for each component, you can make your changes with impunity because you can easily run your test suite and discover any mistakes in the new code.

The Test-Driven Development school goes a step further and recommends writing the tests before the code since that gives you a solid reference for how you expect your code to behave and it's not unusual for the exercise to suggest design changes.

A unit testing framework simplifies this process by providing all of the test-related structure and handy primitives to use when writing your tests.

A few guidelines:

Python: unittest (standard module)

import unittest

def my_func():
    …

class TestMyFunction(unittest.TestCase):
    def testResultRange(self):
        i = my_func()
        self.assertTrue(1 <= i <= 5)

if __name__ == '__main__':
    unittest.main()

Perl: Test::Unit (Debian libtest-unit-perl)

use Test::Unit;
…
sub test_foo {
    my $i = my_func();
    assert(($i <= 5 and $i >= 1), "my_func() failed to return a value between 1 and 5!");
}

Common Tasks

Processing command-line arguments

Perl: Getopt::Long

use Getopt::Long;

Getopt::Long::Configure ("bundling"); # Allow single-dash options to be grouped: -vvvv = -v -v -v -v
GetOptions(
    'v|verbose+'                => \$Verbosity,
    'max-days=i'                => \$MaxAgeInDays,
    'test'                      => \$TestMode,
);

Python: optparse

from optparse import OptionParser
parser = OptionParser()
parser.add_option("-v", "--verbose", action="count", dest="verbose", help="provide more detail about what this program is doing", default=0)
parser.add_option("-l", "--log", dest="logfile", metavar="LOGFILE", help="record information in a file instead of displaying it")
(options, args) = parser.parse_args()
if options.verbose > 1:
    print("This will be a chatty program")
if options.logfile:
    print("Storing diagnostic info in %s" % options.logfile)

There's a good intro to the module in Doug Hellmann's Python Module of the Week: optparse

Manipulating the environment

A surprising number of people do not know about the env(1) utility. It's handy in shell scripts because it lets you set or remove environment variables for a single command, including one executed in a different context by something like SSH or sudo.

Here's an example which clones a Unix system disk using rsync over SSH by providing the ssh-agent(1) authentication info for the privileged rsync command to use:

sudo env "SSH_AUTH_SOCK=$SSH_AUTH_SOCK" rsync -a --delete --progress -x --rsync-path="/usr/bin/sudo /usr/bin/rsync" / USERNAME@CLONE_HOST:/PATH_TO_CLONE_DISK/
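
The same idea carries over once a script has moved to Python 3: build the child's environment explicitly and hand it to subprocess. This is only a sketch. The variable names EXAMPLE_SETTING and EXAMPLE_UNWANTED are made up, and the child is a one-line Python program so the example stays self-contained.

```python
import os
import subprocess
import sys

# Copy the current environment, then add and remove variables for the child only.
child_env = dict(os.environ)
child_env["EXAMPLE_SETTING"] = "from-parent"  # hypothetical variable name
child_env.pop("EXAMPLE_UNWANTED", None)       # dropped if it was set

# The child here is a one-line Python program that reports what it received.
result = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ.get('EXAMPLE_SETTING'))"],
    env=child_env,
    stdout=subprocess.PIPE,
    universal_newlines=True,
    check=True,
)
child_saw = result.stdout.strip()
```

The parent's own environment is left untouched, just as it is when you prefix a command with env.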

Reading Structured Data

Yes, you can try to do this using regular expressions.

No, you probably shouldn't.
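
A small Python 3 illustration of why: the record below is made-up sample data containing a quoted comma and an escaped quote, both of which are legal in CSV. Naive splitting gets the field count wrong, while the standard csv module does not.

```python
import csv
import io

# One record whose second field contains a comma and an escaped quote.
line = 'cadams,"Adams, Chris ""CJ""",cadams@example.org\n'

# Splitting on commas cuts the quoted field in two and keeps the quote marks.
naive = line.strip().split(",")

# A real parser understands the quoting rules.
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive), "fields from split,", len(parsed), "fields from csv")
```

A regular expression that handles this case can be written, but then the next edge case (embedded newlines, different delimiters) arrives and you are maintaining a parser.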

Comma-Separated Data

Python: csv (standard module)

import csv
with open('BadHardcodedFilename.csv', newline='') as csv_file:
    for row in csv.reader(csv_file):
        print("Column 2 is %s" % row[1])

Perl: many (CPAN and http://rath.ca/Misc/Perl_CSV/)

XML

Python: ElementTree (standard module)

ElementTree Intro
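
A minimal ElementTree sketch in Python 3, roughly parallel to the Perl example that follows. The XML layout is an assumption, since the real users.xml isn't shown, so treat the element and attribute names as hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up document; the real layout of users.xml is not shown in the text.
SAMPLE = """
<Users>
  <user name="cadams"><email>cadams@example.org</email></user>
  <user name="bsmith"><email>bsmith@example.org</email></user>
</Users>
"""

root = ET.fromstring(SAMPLE)

# Build a name -> email lookup from every <user> element.
emails = {user.get("name"): user.findtext("email") for user in root.iter("user")}
print(emails["cadams"])
```

To read from a file instead, ET.parse('users.xml').getroot() gives you the same kind of root element.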

Perl: XML::Simple (CPAN)

use XML::Simple;
my $file = XMLin('users.xml');
print $file->{'Users'}->{'cadams'}->{'email'};

Searching for files

Perl: File::Find (standard module)

use File::Find;

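foreach (@ARGV) {
        find( { wanted => \&find_files }, -l $_ ? readlink($_) : $_ );
}
sub find_files {
        return unless -f;
        open( my $fh, '<', $_ ) or return;   # skip files we cannot read
        my $first_line = <$fh>;              # the #! line, if there is one
        close($fh);
        print "$_ is a Python script\n"
                if defined $first_line && $first_line =~ m/python/;
}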

Python: os.walk (standard module)

import os

def find_perl_scripts(top):
    # os.walk lists directories separately, so "files" holds only files
    for dirname, subdirs, files in os.walk(top):
        for f in files:
            with open(os.path.join(dirname, f), 'rb') as script:
                if script.read(15) == b'#!/usr/bin/perl':
                    print("%s is a Perl script" % f)

find_perl_scripts(os.path.expanduser('~/bin'))

Dealing with multiple files

Once you outgrow a simple tail -f *, something like this comes in handy if you can't use a tool such as SEC:

Perl: IO::Multiplex (CPAN)

use IO::Multiplex;

$mux = IO::Multiplex->new( );
$mux->add($FH1);
$mux->add($FH2); # ... and so on for all the filehandles to manage
$mux->set_callback_object(__PACKAGE__);  # or an object
$mux->Loop();

sub mux_input {
    my ($package, $mux, $fh, $input) = @_;
    # $input is a reference to the handle's input buffer
    if ($$input =~ m/…/) …
}

(Courtesy of the Perl Cookbook)
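
Python: selectors (standard module)

A rough Python 3 counterpart, offered as a sketch rather than a drop-in replacement for the Perl version. It assumes the sources are pipes or sockets (some selector backends refuse regular files) and uses two in-process pipes as stand-ins. The function name read_all is hypothetical.

```python
import os
import selectors


def read_all(fds):
    """Collect data from several file descriptors as each becomes readable."""
    sel = selectors.DefaultSelector()
    chunks = {fd: b"" for fd in fds}
    for fd in fds:
        sel.register(fd, selectors.EVENT_READ)
    while sel.get_map():                  # loop until every source hits EOF
        for key, _events in sel.select():
            data = os.read(key.fd, 4096)
            if data:
                chunks[key.fd] += data    # a real program would act on it here
            else:
                sel.unregister(key.fd)    # end of file on this descriptor
    sel.close()
    return chunks


# Two pipes stand in for the sources being watched.
r1, w1 = os.pipe()
r2, w2 = os.pipe()
os.write(w1, b"first source\n")
os.write(w2, b"second source\n")
os.close(w1)
os.close(w2)
collected = read_all([r1, r2])
os.close(r1)
os.close(r2)
```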

Restarting a task when a system reboots

cron has a handy @reboot time specifier which you can use to ensure that your code will be restarted when a machine reboots:

@reboot /path/to/my/script

Daemonization

Creating a proper Unix daemon requires a fair number of steps. It may be easier to use a program such as daemontools or daemonize, which allows you to run a program as a daemon without modification.

If you need more control other people have done the hard parts:

Automating other programs

Expect & derivatives

expect(1) takes a script and uses that to run another program. This allows you to script processes which normally require manual intervention and because it supports branching you can handle tasks with conditional steps such as connecting to a remote system using SSH which might require you to enter a password or accept a host-key if you haven't connected before.

expect has inspired libraries for most major languages, and by now I would recommend using only those: the richer language offered by something like Perl is worth it if you need to do anything beyond providing canned responses.

Perl: Expect (CPAN)

use Expect;

my $exp = Expect->spawn("ssh -t $Host $command") or die("Couldn't connect to $Host!");
$exp->expect(
    5,
    [
            '^Are you sure you want to continue connecting \(yes\/no\)\?',
            sub { my $self = shift; $self->send("yes\n"); exp_continue; }
    ],
    [ '^Password:', sub { my $self = shift; $self->send("$Password\n"); exp_continue; } ],
    …
);

Screen

GNU Screen is a tool you need to know. It's handy if you're using an unreliable connection, need to run a terminal application for a long period of time or want to run independent applications on multiple hosts:

screen quickstart on the SNL Info Wiki

Screen is also handy because it's scriptable and provides ways for you to create and manage individual windows within a session:

screen -X screen -t "$Hostname" cmd