Chrooted SFTP on Ubuntu with emailed transfer logs

September 24, 2013

SSH is great, and OpenSSH is really great. No other tool comes close to matching its versatility for all kinds of remote access tasks. It can control other machines, transfer files, display remote GUI apps, and even function as a quick-and-dirty VPN. All this comes with strong encryption and authentication. So when my wife asked me to set up an Internet-facing file server that would let her share documents with her clients securely, it took me about a second to decide that SSH (or, more specifically, SFTP) was perfect for the job.

The OpenSSH Project

Her needs were simple. Each client should have a private directory for sending and receiving files. My wife needs full access to all the directories. She’d also like to be notified about transfers via email. And since the files often contain sensitive financial data, they must be encrypted in transit. OpenSSH is capable of all of this and free clients are available for every OS. I figured I could have a working setup running in a couple of hours.

Ehh, not quite. Like many UNIX-flavored power tools, OpenSSH is extremely flexible. But harnessing that flexibility took more googling and scripting than I imagined. At one point, I even started to think that OpenSSH is less than really great. I considered throwing in the towel and moving to a nicely packaged FTP-over-SSL solution, only to be confronted with an annoying deluge of firewall gotchas and scarce client support for truly secure transfers. So I came back to SFTP determined to make it work and decided to document my setup in case it helps someone else. My starting point was an Ubuntu 12.04 LTS box with the openssh-server package installed and running. I found several resources that helped with various aspects of my configuration and came up with this consolidated recipe.

Set up directory and admin user

The first step is to create the root directory that will hold all of the SFTP users’ private directories. For this, I created a new logical volume and filesystem mounted as /sftp-root. This gives me a convenient unit of storage for backups and lets me limit the total amount of disk space consumed by all users. But if you don’t want the added complexity of a separate filesystem and you trust your users not to fill up your disk, you can just create a directory on the root filesystem like so:

sudo mkdir /sftp-root
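
If you do want a dedicated filesystem like mine, here's a minimal sketch, assuming you already have a volume group named vg0 with free space (adjust names and sizes to taste):

sudo lvcreate -L 20G -n sftp vg0
sudo mkfs.ext4 /dev/vg0/sftp
sudo mkdir /sftp-root
sudo mount /dev/vg0/sftp /sftp-root

Add a matching entry to /etc/fstab so the volume is mounted at boot.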

Next, I create two new user groups:

sudo addgroup sftp
sudo addgroup sftp-only

The sftp group will own all of the files created by the SFTP users and by the administrative user, who has access to all of the files. The sftp-only group identifies users who may access the server only for SFTP within their private directories. They aren’t allowed shell access.

Now, I create the administrative user, called sftpadmin. This user is a member of the sftp group and I set her home directory to /sftp-root.

sudo adduser --home /sftp-root --ingroup sftp sftpadmin

Since the /sftp-root directory is owned by root, Ubuntu gives a warning and doesn’t generate login profile files for this user, but that’s OK because she will probably only need to connect via SFTP. (I’m not explicitly ruling out shell access, though.)

Configuring SSH and PAM

Next, I set up SSH and PAM so that members of the sftp-only group can access the server only via SFTP. Furthermore, the root of the filesystem visible to them is their private directory (a “chroot jail”). To accomplish all this, I edit /etc/ssh/sshd_config as root and change the “Subsystem sftp” line to read:

Subsystem sftp internal-sftp -l INFO

This tells sshd to use an in-process SFTP server with log level set to INFO. Next, add the following section to the end of the file:

# SFTP Jailed users
Match group sftp-only
        X11Forwarding no
        AllowTcpForwarding no
        ChrootDirectory /sftp-root/%u
        ForceCommand internal-sftp -l INFO

For users of the sftp-only group, this enforces the chroot restriction and prohibits any kind of shell access. Members of this group include the low-privileged SFTP users, but not the sftpadmin user. After making these changes, I save and close the file.
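
Before going further, you can ask sshd to validate the edited file; it prints nothing if the syntax is OK:

sudo /usr/sbin/sshd -t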

Now, I want the SFTP users to be able to modify or delete any files that sftpadmin creates in their private directories. Similarly, sftpadmin should be able to modify or delete files in all the private directories. To implement this, all files created by SFTP users (including sftpadmin) must be “owned” and writable by a common user group. I use the sftp group for this, which has as its members all of the low-privilege accounts plus sftpadmin. Making their files group-writable by default requires a suitable umask. Normally, this can be set in a login script, but since the jailed users don’t have shell access, I can’t take that approach. Instead, I use a feature of PAM that lets me specify a default umask for users connecting through SSH. I edit /etc/pam.d/sshd as root and add the following line near the end:

# UMask for chrooted SFTP users
session optional pam_umask.so umask=002

Now, commit the SSH and PAM changes by restarting the service:

sudo service ssh restart

Done with SSH! On to logging.

Configure basic logging

I’ll set up basic SFTP logging first. Logging from the in-process SFTP server is tricky because it requires access to a shared logging socket located outside of the user’s chroot jail. First, I create a directory on the /sftp-root filesystem that will contain the special socket “file” and tell the system logger (rsyslog) to create a socket there. This command creates the directory:

sudo mkdir /sftp-root/dev

Next, create or edit /etc/rsyslog.d/sshd.conf as root and add these lines:

# Create an additional socket for some of the sshd chrooted users.
$AddUnixListenSocket /sftp-root/dev/log

# Log internal-sftp in a separate file
:programname, isequal, "internal-sftp" -/var/log/sftp.log
& @127.0.0.1:39276
:programname, isequal, "internal-sftp" ~

This sets up the additional input socket (creatively called “log”) to which the SFTP server processes write. Log output is sent to a new file called sftp.log and echoed to a local UDP port (127.0.0.1:39276). The UDP port will help generate the transfer logs to be emailed, but you can omit the line if you’re not interested in that.

Not done with rsyslog quite yet. Since the jailed users don’t have access to the file system outside of their home directories, I need to make the /sftp-root/dev/log socket visible within each jail. This requires creating a hard link in each jail that points to the socket file. As if that weren’t complicated enough, the hard links need to be regenerated each time the rsyslog service starts because it creates a new socket, invalidating the old links. To automate this, I wrote a small Python script that iterates through all the private jail directories and refreshes the hard links. The script is called /usr/local/bin/refresh-chroot-log-links and it looks like this:

#!/usr/bin/env python

# Refreshes hardlinks from SFTP users' chroot jail directories to the
# shared rsyslog socket directory. Takes no params. Must be run as root.

import os
import glob

#----- Constants for directories, group names, and other file system stuff.

SFTP_ROOT_DIR = '/sftp-root'
DEV_DIR = 'dev'
LOG_NODE = 'log'

#----- Start of script

if (os.geteuid() != 0):
        print 'Must run as root.'
        exit(1)

# Find the private dev dirs for each chrooted user
shared_log_node = SFTP_ROOT_DIR + '/' + DEV_DIR + '/' + LOG_NODE
dev_dirs = glob.glob(SFTP_ROOT_DIR + '/*/' + DEV_DIR)
for jail_dev_dir in dev_dirs:
        jail_log_node = jail_dev_dir + '/' + LOG_NODE
        # Remove old log link and recreate.
        if os.path.exists(jail_log_node):
                os.remove(jail_log_node)
        os.link(shared_log_node, jail_log_node)

After creating the file, make it executable with:

sudo chmod +x /usr/local/bin/refresh-chroot-log-links

If you run the script now (as root), it won’t do anything because no SFTP user directories have been created yet. We’ll rectify that soon. But for now, I set the script to run each time the rsyslog daemon starts by adding these lines to the end of /etc/init/rsyslog.conf:

post-start script
  refresh-chroot-log-links
end script

Save the file as root and close. Almost done. I just need to set up log rotation for the new sftp.log file so it doesn’t grow without limits. I create a new file as root called /etc/logrotate.d/sftp with this text. Customize to your liking:

/var/log/sftp.log {
        monthly
        missingok
        rotate 12
        compress
        delaycompress
        postrotate
                invoke-rc.d rsyslog reload > /dev/null
        endscript
}
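
To check the new rotation rules without actually rotating anything, you can run logrotate in debug mode:

sudo logrotate -d /etc/logrotate.d/sftp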

Finally, it’s time to apply all of the logging changes. Restart the rsyslog daemon with:

sudo service rsyslog restart
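
To confirm that rsyslog created the shared socket, list it and look for the leading “s” in the file type:

ls -l /sftp-root/dev/log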

The next step is to create the SFTP users.

Create users

As you may have guessed by now, the SFTP users require some special configuration. I created another Python script that wraps the basic Ubuntu adduser functionality and automates the process of adding a new SFTP user. As root, create /usr/local/bin/addsftpuser and paste in these contents:

#!/usr/bin/env python

# Create a chroot jailed SFTP user.
# Usage (as root): addsftpuser <username>

import os
import sys
from subprocess import check_call

#----- Constants for directories, group names, and other file system stuff

SFTP_ROOT_DIR='/sftp-root'
SFTP_JAILED_GROUP='sftp-only'
SFTP_OWNER_GROUP='sftp'
DATA_DIR='data'
DEV_DIR='dev'

#----- Start of script

if len(sys.argv) != 2:
        print 'Usage:', sys.argv[0], '<username>'
        exit(1)
if (os.geteuid() != 0):
        print 'Must run as root.'
        exit(1)

username = sys.argv[1]
jail_root = SFTP_ROOT_DIR + '/' + username
home_dir = jail_root + '/' + DATA_DIR
jail_dev_dir = jail_root + '/' + DEV_DIR

# Create the jail and home directories up front so adduser won't copy skeleton files into the home directory.
os.makedirs(home_dir)
# Create dev dir for logging
os.mkdir(jail_dev_dir)
# Create user.
check_call(['adduser', '--home', home_dir, '--shell', '/bin/false', '--ingroup', SFTP_OWNER_GROUP, username])
# Add user to jailed group.
check_call(['adduser', username, SFTP_JAILED_GROUP])
# Home directory should be owned by user and sftp group.
check_call(['chown', '-R', username + ':' + SFTP_OWNER_GROUP, home_dir])
# Child files should inherit the parent's group even when written by a different user (like sftpadmin).
check_call(['chmod', '-R', 'u+rwx,g+rwxs', home_dir])
# Reset home dir relative to jail root.
check_call(['usermod', '--home', DATA_DIR, username])
# Refresh hard links for logging under chroot jails
check_call(['refresh-chroot-log-links'])

The script takes one parameter (the username), creates the user and its directories, and adds the user to the proper groups. Notice that the last line calls the refresh-chroot-log-links script that I created earlier to set up the hard link to the logging socket. After saving this, make it executable by running:

sudo chmod +x /usr/local/bin/addsftpuser

You then execute it like this:

sudo addsftpuser <username>

Where <username> is the name of the user you want to create. Running the command generates the following directory structure under /sftp-root:

/sftp-root
   +--<username> (user's root directory)
      +--data (directory containing the shared files)
      +--dev
         +--log (hard link to logging socket)

You may see a warning about skeleton files not being created. You can safely ignore it because the user isn’t permitted shell access anyway. The /sftp-root/<username> directory is owned and writable only by root, which is a prerequisite for OpenSSH’s ChrootDirectory setting. That’s why the user’s home directory is set to the /sftp-root/<username>/data subdirectory, which she does own. All files created in the data subdirectory inherit the sftp group and are group-writable, which gives both the SFTP user and sftpadmin the ability to read and write files created by the other.
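
If everything worked, the ownership and permissions should look something like this (for a hypothetical user named client1; note the setgid bit on the data directory):

ls -ld /sftp-root/client1 /sftp-root/client1/data

drwxr-xr-x 4 root    root 4096 Sep 24 10:00 /sftp-root/client1
drwxrwsr-x 2 client1 sftp 4096 Sep 24 10:00 /sftp-root/client1/data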

At this point, you are ready to test out the configuration! Create an SFTP user and use your favorite SFTP client to connect from a different machine. You should also try logging in as the sftpadmin user and verify that you can see and modify the contents of all of the SFTP users’ directories.
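
A quick smoke test with the stock OpenSSH sftp client might look like this (hypothetical user and hostname):

sftp client1@files.example.com
sftp> put report.pdf
sftp> ls -l
sftp> quit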

Emailed transfer logs

The last feature I’ll write about is transfer logs. My wife would like to be notified through email about the activity that happens during each client session, including uploads, downloads, and file deletions. Sadly, OpenSSH doesn’t provide an easy-to-read transfer log, much less send it out through email, so I devised my own solution. I wrote a program that runs as a service and watches the SFTP log messages. It keeps track of open SFTP sessions, which can be identified by a process ID (PID). Then at the end of each session, it summarizes the uploads, downloads, and deletions and sends out an email.

In order for this to work, you need an SMTP server to relay the emails. You can configure a mail transfer agent (MTA) such as Postfix on the same box to handle this or use a remote SMTP server. The Ubuntu wiki contains some documentation on getting started with Postfix. Once you have the relay server worked out, create a new file as root called /usr/local/bin/sftp-xfers-mailer with the following contents:

#!/usr/bin/env python

# Listens to log traffic from OpenSSH's "internal-sftp" subsystem on a local UDP port
# and generates one transfer log per session, which it then sends as an email.
# This program is designed to run as a service.

import sys
import signal
import re
import string
import smtplib
import time
import SocketServer
from email.mime.text import MIMEText

#----- Local environment settings. Customize as needed.

LISTEN_PORT=39276
FROM = 'SFTP Daemon <sftp-daemon@example.com>'
TO = 'All <all@example.com>'
SMTP_SERVER = 'localhost'

#----- Start of script

class Session:
        """A simple session structure."""

        def __init__(self):
                # Upload messages
                self.uploads = []
                # Download messages
                self.downloads = []
                # File delete messages
                self.deletes = []

class SftpMailerRequestHandler(SocketServer.DatagramRequestHandler):
        """Implements the UDP listener."""

        # Dictionary of open self.sessions indexed by process ID (pid)
        sessions = {}

        def handle(self):
                """Consumes line input from datagrams until interrupted."""

                try:
                        while True:
                                next_line = self.rfile.readline()
                                if not next_line:
                                        break
                                self.process_line(next_line)
                except (KeyboardInterrupt, SystemExit):
                        sys.exit(0)

        def process_line(self, line):
                """Validate that a given line of log data has the expected format and parse the message.

                We parse the line to pull out common headers if we can, then pass the message on for processing.
                """

                # The following regex matches a log line of the form:
                #   <headers> internal-sftp[<pid>]: <message>
                # There are three capture groups that capture the headers, pid, and message, respectively.
                valid_line = re.search('(.*)\s+internal-sftp\[(\d+)\]:\s+(.*)', line)
                if valid_line is not None:
                        headers = valid_line.group(1)
                        pid = valid_line.group(2)
                        message = valid_line.group(3)
                        self.process_message(headers, pid, message)

        def process_message(self, headers, pid, message):
                """Given a valid log message, identify its type and dispatch to an appropriate handler
                for further processing.

                The headers are not currently used and can contain anything. The pid parameter identifies
                the session and is used as a key to group related log messages. The message is the bit
                we're trying to parse.
                """

                # The following regex matches a log line of the form:
                #   close <filename> bytes read <read_bytes> written <write_bytes>
                # There are three capture groups for the filename, bytes read, and bytes written, respectively.
                file_xfer_msg = re.search('close (.*) bytes read (\d+) written (\d+)', message)
                if file_xfer_msg is not None:
                        self.process_file_xfer_msg(pid, file_xfer_msg.group(1), file_xfer_msg.group(2), file_xfer_msg.group(3))
                # The following regex matches a log line of the form:
                #   remove name <filename>
                # There is one capture group for the filename.
                file_remove_msg = re.search('remove name (.*)', message)
                if file_remove_msg is not None:
                        self.process_file_remove_msg(pid, file_remove_msg.group(1))
                # The following regex matches a log line of the form:
                #   session closed [...] user <username> from [<host>]
                # There are two capture groups for the username and host, respectively.
                session_close_msg = re.search('session closed.*user (.*) from \[(.*)\]', message)
                if session_close_msg is not None:
                        self.process_session_close_msg(pid, session_close_msg.group(1), session_close_msg.group(2))

        def process_file_remove_msg(self, pid, filename):
                """Records a file delete action in the session object.

                The PID identifies the session and the filename is the file that the user deleted from the server.
                """

                session = self.create_or_get_session(pid)
                session.deletes += [filename]

        def process_session_close_msg(self, pid, username, host):
                """At the end of a session, generate report text from the actions recorded in the session data
                structure and send it in an email.

                The PID identifies the session and the username is the SFTP user. Host is the remote host IP
                address. Once the session closed message has been processed, the session object associated
                with the PID is removed.
                """

                # Get session (if any) from the local session dictionary, indexed by pid.
                if pid in self.sessions:
                        session = self.sessions[pid]
                        uploads_summary = ''
                        if session.uploads:
                                uploads_summary = 'UPLOADS\n  ' + string.join(session.uploads, '\n  ') + '\n'
                        downloads_summary = ''
                        if session.downloads:
                                downloads_summary = 'DOWNLOADS\n  ' + string.join(session.downloads, '\n  ') + '\n'
                        deletes_summary = ''
                        if session.deletes:
                                deletes_summary = 'REMOVED\n  ' + string.join(session.deletes, '\n  ') + '\n'

                        if session.uploads or session.downloads or session.deletes:
                                self.send_xfer_log(pid, username, host, uploads_summary, downloads_summary, deletes_summary)
                        del self.sessions[pid]

        def send_xfer_log(self, pid, username, host, uploads_summary, downloads_summary, deletes_summary):
                """Email report for a single session's activity.

                The PID identifies the session. Username and host are the remote SFTP user's details.
                The summary parameters provide text summaries of each kind of action the user took
                during the session.
                """

                # Generate the message body and envelope.
                body = 'SFTP activity of ' + username + ' connecting from ' + host + ':\n\n'
                body += uploads_summary + downloads_summary + deletes_summary
                body += '\nSession ' + pid + ' finished on ' + time.strftime('%a, %d %b %Y %I:%M %p')
                msg = MIMEText(body)
                msg['Subject'] = 'SFTP session ' + pid + ' with ' + username
                msg['From'] = FROM
                msg['To'] = TO
                try:
                        # Send it!
                        smtp = smtplib.SMTP(SMTP_SERVER)
                        smtp.sendmail(FROM, TO, msg.as_string())
                        smtp.quit()
                except:
                        sys.stderr.write('Cannot send email: %s\n' % sys.exc_info()[1])

        def process_file_xfer_msg(self, pid, filename, read_bytes, write_bytes):
                """Record a file upload or download action in the session object.

                The PID identifies the session. The filename is the name of the file (big surprise).
                Only one of bytes read or written should be greater than 0, the other should be 0.
                The non-zero value indicates whether the file was uploaded or downloaded.
                """

                # Create or update session from the local session dictionary, indexed by pid.
                session = self.create_or_get_session(pid)
                if int(read_bytes) > 0:
                        session.downloads += [filename + ' (' + read_bytes + ' bytes)']
                elif int(write_bytes) > 0:
                        session.uploads += [filename + ' (' + write_bytes + ' bytes)']

        def create_or_get_session(self, pid):
                """"Return a session object for the given PID or create a new one."""

                session = None
                if pid in self.sessions:
                        session = self.sessions[pid]
                else:
                        session = Session()
                        self.sessions[pid] = session
                return session

if __name__ == "__main__":
        host, port = "localhost", LISTEN_PORT
        server = SocketServer.UDPServer((host, port), SftpMailerRequestHandler)
        server.serve_forever()

At the top, replace the environment-specific email settings with ones appropriate for your site. The listener port is arbitrary, but it must match the one you specified earlier in /etc/rsyslog.d/sshd.conf. The email settings speak for themselves. However, if your SMTP server requires authentication or encryption, you will need to tweak the code in the send_xfer_log method. When you’re done customizing the program for your environment, make it executable with:

sudo chmod +x /usr/local/bin/sftp-xfers-mailer

You should test it now to make sure it works. Just run it from the command line:

sftp-xfers-mailer
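
For reference, the emails generated by this script look roughly like this (all names, paths, and numbers here are illustrative):

Subject: SFTP session 4242 with client1

SFTP activity of client1 connecting from 203.0.113.15:

UPLOADS
  "/data/report.pdf" (52113 bytes)

Session 4242 finished on Tue, 24 Sep 2013 02:15 PM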

Try logging in as one of the SFTP users, upload a file, and then log out. If your SMTP settings are correct, you should get an email. Once that’s verified, press CTRL-C to exit. The last step is to register the sftp-xfers-mailer program as a service and have it start and stop in tandem with rsyslog. To accomplish this, create a new file as root called /etc/init/sftp-xfers-mailer.conf with this text:

# sftp-xfers-mailer
#
# Listens for SFTP log activity and sends emails for each session.

description     "SFTP xfers mailer"

start on started rsyslog
stop on stopping rsyslog

respawn
respawn limit 10 5
umask 022

exec /usr/local/bin/sftp-xfers-mailer

Finally, start the service with:

sudo service sftp-xfers-mailer start

You should see a response indicating that the service is running.
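
You can check on it at any time with Upstart’s status command, which should report something like “sftp-xfers-mailer start/running”:

sudo status sftp-xfers-mailer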

Conclusion

This was a lot of work, but I’m happy with the result. Each user gets a private directory for sharing files, and my wife (the sftpadmin user) has access to them all. Whenever a user connects and transfers files, the activity is logged and a summary is sent by email. Adding new SFTP users is easy with the addsftpuser script.

I learned a lot in this process, but there’s always room for improvement. Share your thoughts and suggestions in the comments!

Categories: FLOSS

Linux LVM for Home PCs

I’ve been using GNU/Linux as my main OS at home and work for close to six years now and I’m still regularly surprised by all the cool stuff it can do. One of the distinctive characteristics of Linux’s decentralized developer community is that great features can be easy to overlook, getting better over time with little fanfare. Discovering these quiet gems is a treat and serves as a reminder of just how much awesome free software is out there.

IBM 2415 tape drives. My old server's storage was not too far removed from this. Photo by Dick107, Wikimedia Commons.

A case in point that recently caught my attention is Linux’s Logical Volume Manager (LVM). Now, I’ve been aware of LVM’s existence as an alternative to “plain old” hard disk partitions for some time, but not being a sysadmin, I’d always assumed it was only relevant for large servers with beefy RAIDs or SANs standing behind them. I didn’t pay it much attention. That changed earlier this summer when I decided it was time to replace my creaky home file server with something built in the 21st century. I went looking for an inexpensive box that could handle all of the following:

  • Store and play back my video and music libraries through home theater gear or over the LAN
  • Back up other PCs over the network
  • Off-site backups and storage trading
  • General file sharing for my wife and me

I chose System76’s Meerkat Ion NetTop upgraded to a 2 TB hard disk. I installed Kubuntu 11.04 (desktop, not server), along with XBMC for media, BackupPC for automated LAN backups, and Wuala for off-site backups. Within a weekend, I was in business with a big, beautiful KDE Plasma desktop lighting up my living room.

There were clouds on the horizon, however. It turns out 2 TB of storage is not quite the endless sea it seems once you factor in the demands of a respectable video/music collection, consolidated backups of other PCs on the LAN, and traded local storage space to keep off-site backups economical. To make matters worse, although I had the foresight to create a separate partition for all this data so I could easily reinstall the OS at will, I ended up making the OS partition much larger than it needed to be, resulting in a lot of wasted space. And finally, since I wanted an up-to-date desktop and applications, I opted for the latest release of Kubuntu over the more conservative LTS version, but I worried that major upgrades to the distro could jeopardize the stability of my server.

Enter the Logical Volume Manager. LVM has a lot of interesting features, but the most important ones for my home server use case are pretty clear and compelling.

Resizable “Partitions”

LVM works by enabling you to create a pool of storage called a Volume Group that can span conventional partitions and hard disks, and then dynamically allocate and reallocate portions of that pool to any number of Logical Volumes. Logical Volumes are similar to partitions in that they contain a file system and can be mounted in the OS, but they are abstracted from the hardware in such a way that they can be moved and resized at will. This solves my problem of having allocated too much space to the OS quite nicely. Using logical volumes instead of partitions, I can easily shift free space from the OS to the data file system or start them both out small and grow them as needed.
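
To make this concrete, here’s a minimal sketch, assuming an ext4 logical volume named data in a volume group named vg0. Growing the volume while it’s mounted takes two commands:

sudo lvextend -L +50G /dev/vg0/data
sudo resize2fs /dev/vg0/data

(Shrinking is also possible, but requires unmounting the filesystem first.)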

There are tools out there, like GNU Parted, that solve this problem without LVM. But tragedies too painful to recount from my Windows days have left me with a general aversion to anything that messes with the hard drive’s partition table. Even the makers of GParted Live, a bootable GUI for Parted, acknowledge that resizing partitions is a risky business, as evidenced by the scary warning label on their home page. I feel more comfortable with support baked into the OS.

Bottomless Storage

Another benefit of logical volumes is that they can grow beyond the capacity of a physical device. If I need to grow my data volume by two more terabytes, I need only buy another hard drive, add it to the volume group, and resize my volume to overflow into it. This feature is similar to RAID, but it is more flexible because hard disks of differing capacities can be added to a volume group. Conventional RAID requires all disks to be the same size unless you’re willing to sacrifice capacity on the larger drives. It is possible to run LVM on top of RAID for more resiliency when a drive fails, but I don’t require constant uptime from my home server, so expanding capacity over a chain of non-redundant hard disks suits me just fine.
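
For instance, assuming the new drive shows up as /dev/sdb with a single partition on it and my data volume lives in a volume group named vg0, the whole dance is:

sudo pvcreate /dev/sdb1
sudo vgextend vg0 /dev/sdb1
sudo lvextend -L +2T /dev/vg0/data
sudo resize2fs /dev/vg0/data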

Snapshots

Shrinking and stretching volumes is great, but what really sold me on LVM was its support for snapshots. A snapshot, as the name implies, is an instant backup of a logical volume at a point in time, which you can quickly restore or discard with no fuss. The LVM HOWTO touts it as an effective way to freeze the file system on an active server so that it can be backed up through conventional means without disrupting service. For my home server, though, I see it as a way to try out a major OS upgrade or configuration change without risk. If something goes south, turning back the clock to a time when things were running smoothly is simple and quick. The amount of extra disk space required depends solely on how much change occurs after the snapshot is taken.
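
As a sketch, assuming the OS lives on a logical volume called vg0/root, taking a snapshot before an upgrade looks like this:

sudo lvcreate --snapshot --size 10G --name root-snap /dev/vg0/root

If the upgrade works out, discard the snapshot with lvremove /dev/vg0/root-snap. If it goes badly, sudo lvconvert --merge /dev/vg0/root-snap rolls the volume back to its snapshotted state (for an in-use root volume, the merge completes on the next reboot).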

I’ve been using this kind of functionality on virtual machines for years and wished for something I could use on the host OS. I looked with silent envy at similar features in OS X, Windows, and Solaris, not realizing that Linux had an answer the whole time. Gone are the days of rummaging for a spare hard drive, booting up a CD to run partimage or CloneZilla, and kissing the next several hours goodbye as it meticulously cloned every byte of my main hard drive.

Conclusion

Now that my eyes have been opened to the things LVM can do for a small system, I see uses for it everywhere. At work, I’m going to try using snapshots to revert changes to my development database instead of rebuilding it from scratch (a lengthy process I’m forced to repeat several times a day). Similarly, I’ve avoided upgrading certain packages on my work PC because of a hodgepodge of openSUSE repositories I’ve added over the last year. Now I can forge ahead without fear of losing a day to video driver idiosyncrasies. On my home laptop, I can test-drive a new release of KDE before committing to it.

LVM is a good example of Linux’s many understated features, and it’s not just for the enterprise. You probably won’t see it mentioned in your favorite distro’s next release announcement, but it’s worth getting to know if you spend a lot of time working hard or hardly working on a Linux system.

The architect’s job after 1.0

I’m feeling energized and optimistic today, not only because it’s Friday, but because I just finished listening to Joe Wirtley’s excellent talk on the role of the architect in software development shops. Having recently acquired this title at my own company, I’ve been thinking about the question a lot. It’s one I get asked often, sometimes sheepishly, by job candidates interviewing for open developer spots. “So, umm, what do you do exactly?” A fair question, particularly because when I came to this company, the first version of the core product was near release, so the foundation was already hardening and many of the technical decisions about how it all hangs together had already been made.

Fortunately, though, the architect’s job doesn’t end when a product goes to release or a web site opens its virtual doors. Sure, as software matures, there are fewer conversations about what modules need to be built out and how they’ll interact and whether Spring or EJB is the right choice to handle cross-cutting concerns like database transactions. But the influences that drive decisions like those don’t disappear after 1.0. So what is my job? I could say that it is to help evolve the system in a cohesive way to meet new business challenges and performance, usability, and security objectives. (Cough.) But honestly, that’s not why I get up in the morning. My real purpose as I see it is to make developers’ lives easier—including my own. If I’m successful at that, the rest takes care of itself.

Now, by easier, I’m not talking about giving free license to cut corners on design, skimp on documentation, or get lazy in implementation. Just the opposite. It’s about making life easier over the long haul. Keeping an eye out for pitfalls as new features take shape and suggesting paths around them is part of it. Adopting tools that make it easier to push functionality forward is another. But above all, encouraging developers to avoid incurring new technical debt and to pay down the old is the best way to ensure the vitality of a project, and it gives everyone a sense of pride in what we’ve created. I don’t worry that it will take Ameer three extra days to make Feature A fit cleanly within our existing framework if it means that Sean will have an easier time adding Features A+1 and A+2 six months down the road after Ameer has moved on. In short, my job is to make sure that the system remains understandable as requirements change and programmers come and go.

When developers’ lives get easier, the organization and customers are the ultimate beneficiaries. Taking a long-term view of how the code evolves leads to better predictions about the time and effort needed to add new functionality. Paying down technical debt enables developers to spend less time fighting an unruly code base and more time solving a customer’s real problems. It also helps avoid new bugs caused by unintended side effects of fixing the old ones. Choosing the best tools and investing in change if the project outgrows them reduces friction when five new pages need to be built out in five days. Cutting down the time it takes to run a full build means the difference between a developer who’s able to stay focused on task and one whose eyes wander over to Facebook while her hard drive chugs away.

On its face, all this seems obvious. But the value of these investments often gets short shrift in an agilist culture when the benefit to the customer is not immediate or obvious. It’s up to the architect to fight the good fight even in the face of a management team feeling constant pressure to achieve short-term revenue and rapid-fire releases. Sometimes you make your point and sometimes you just need to get over it and ship the damn thing so everyone can get paid. But as the architect, you don’t stop championing the long-term view because after 1.0, that’s your job.

Categories: Programming

Static initializer block

Holy cow, you found your way to my brand new blog! Welcome. First, a little bit about myself: I’m a developer working mainly on Java EE stuff at a health care software company. After hours, I’m a free and open source software enthusiast and contributor. Linux is my OS of choice at home and work. (Credit turnover in IT for letting that one fly.) I consume tech news in unhealthy doses, and I’m hoping this blog will give me a better outlet for spewing commentary than muttering to myself on Twitter. I’m particularly interested in security and privacy to the extent they still exist. I like cats and the number 6.

My plan is to offer tips and opinions on the topics that interest me and hopefully some of you. You can expect to see posts on programming; the Java, Linux, and open source ecosystems; current tech news; and security. As I get more experienced at this, maybe I’ll gain more focus, maybe not. But I’ll try my best to offer something interesting, entertaining, and occasionally thought-provoking without letting my ignorance shine through too much. If I’ve done my job, maybe you’ll come back again some day! And that would be great.

Categories: Uncategorized