
The SmartLogic Blog


Subversion Timestamps + Capistrano finalize_update

June 7th, 2008

Update 2008/06/13: Jamis released Capistrano 2.4.0, and it includes the :normalize_asset_timestamps patch that I submitted!

Update 2008/06/11: Here’s a link back to the Google Groups Discussion regarding this topic.

Subversion has a lesser-known feature that allows you to specify that checkouts/exports/switches/reverts should timestamp files with the last committed timestamp. By default, this setting is turned off. As such, when you checkout a repository, every file is timestamped with the current time on your local machine.

To be honest, I’m not quite sure why this is the default. The pertinent section of the Subversion manual (you have to scroll to the bottom) describes the setting as follows:

———-
use-commit-times

Normally your working copy files have timestamps that reflect the last time they were touched by any process, whether that be your own editor or by some svn subcommand. This is generally convenient for people developing software, because build systems often look at timestamps as a way of deciding which files need to be recompiled.

In other situations, however, it’s sometimes nice for the working copy files to have timestamps that reflect the last time they were changed in the repository. The svn export command always places these “last-commit timestamps” on trees that it produces. By setting this config variable to yes, the svn checkout, svn update, svn switch, and svn revert commands will also set last-commit timestamps on files that they touch.

———-
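For reference, turning this on is a one-line change in your Subversion runtime config (~/.svn is not involved; it lives in ~/.subversion/config on Unix), under the [miscellany] section:

```ini
[miscellany]
use-commit-times = yes
```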

It’s not clear to me how this default actually helps with Makefiles. What is clear to me, though, is that Jamis Buck has taken this default behavior into account in Capistrano, the wonderful deployment tool we use at SLS.

The following code snippets will require a bit of understanding of the built-in Capistrano deployment recipes. Let’s take a look at the code for the :finalize_update task. This task is invoked after the code has been updated on the server (for Subversion, either by an export or update).

  task :finalize_update, :except => { :no_release => true } do
    # ... other details omitted

    stamp = Time.now.utc.strftime("%Y%m%d%H%M.%S")
    asset_paths = %w(images stylesheets javascripts).map { |p| "#{latest_release}/public/#{p}" }.join(" ")
    run "find #{asset_paths} -exec touch -t #{stamp} {} ';'; true", :env => { "TZ" => "UTC" }
  end

This snippet computes a timestamp and then touches each asset file on the server with it (-t #{stamp}). The intent is to handle the scenario where you have multiple asset servers. Since an export/checkout stamps files with the local machine’s current time, it’s possible for the same asset to have different timestamps on separate asset servers.

So what’s the big deal? First, Rails serves up images (when using the image_tag helper) with a query string appended to the URL. That query string is simply the asset’s last-modified timestamp, and it exists to support client-side caching (you’re doing this, right?). It allows you to set the “expires” attribute of that file years (decades or millennia, in fact) into the future: if the file ever changes, its last-modified attribute changes too, which changes the query string Rails appends automatically and causes your browser to download a ‘new’ asset. So, when the finalize_update task is invoked (which happens every time you re-deploy), all of these last-modified timestamps are reset, causing repeat visitors to re-download these very same assets.
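That mechanism can be sketched in a few lines of plain Ruby (a deliberate simplification of ActionView’s asset-id behavior; the real helper also caches its results, so this is an illustration, not Rails’ actual code):

```ruby
require 'fileutils'
require 'tmpdir'

# The query string Rails appends is just the asset's mtime as a Unix
# timestamp; a missing file yields an empty id.
def rails_asset_id(path)
  File.exist?(path) ? File.mtime(path).to_i.to_s : ''
end

Dir.mktmpdir do |dir|
  asset = File.join(dir, 'logo.png')
  FileUtils.touch(asset)
  # Re-touching the file (as finalize_update does) changes this URL,
  # which is exactly what busts every visitor's cache on each deploy.
  puts "/images/logo.png?#{rails_asset_id(asset)}"
end
```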

I have submitted a patch to Jamis (which I hope he’ll apply soon!) that exposes an extra Capistrano parameter (:normalize_asset_timestamps), set to true by default, leaving the original behavior intact. The new :finalize_update task looks like:

  task :finalize_update, :except => { :no_release => true } do
    # ... other details omitted

    if fetch(:normalize_asset_timestamps, true)
      stamp = Time.now.utc.strftime("%Y%m%d%H%M.%S")
      asset_paths = %w(images stylesheets javascripts).map { |p| "#{latest_release}/public/#{p}" }.join(" ")
      run "find #{asset_paths} -exec touch -t #{stamp} {} ';'; true", :env => { "TZ" => "UTC" }
    end
  end

I’ll follow up when/if Jamis accepts the patch. Hopefully it can make it into version 2.4!

Deploying Rails Apps with Capistrano without root or sudo Privileges

June 6th, 2008

In an effort to prepare for my presentation on Rails Deployment with Capistrano and Phusion Passenger at the next Bmore on Rails Ruby users meetup, I’m writing a series of blog posts to help illustrate some concepts. This is the second installment of the series. Better setup for environments in Rails addressed the set of structural changes that I make to any fresh Rails app. This post focuses on some general principles and useful security considerations to take into account when deploying Rails apps with Capistrano.

The primary point of this post is this: You don’t need to deploy using root. And you don’t need to grant sudo access to the user used for deployment.

Our primary deployment setup is either a single-box or two-box solution (web server, asset server, and database server spread across the two). We generally use MySQL for the backend and Phusion Passenger to serve Rails. We deploy to either Ubuntu Server (8.04, Hardy) or CentOS 5. We also generally disallow root ssh access.

First of all, it’s important to categorize tasks into two types: privileged tasks and unprivileged tasks. The nice part about a Rails app is that, for the most part, it’s pretty self-sufficient and rarely needs to venture outside of its own tree in the filesystem. This means we can get away with deploying with an unprivileged account. There are, however, certain ‘setup’ tasks that need to be executed with root/privileged access. Fortunately, all of our privileged tasks can be performed before we ever deploy the app!

Privileged Tasks
For our baseline deployment, this includes the following:
1) Install any necessary software (Ruby, RubyGems, ImageMagick, MySQL, etc.)

2) Create the Rails app structure. For us, we create the following structure:

/var/vhosts/myapp
  /releases
  /shared
    /content

The default Capistrano setup task performs similar functionality. The only additional folder here is /shared/content. We use this folder to hold all of our uploaded assets (mostly via the File Column plugin). Then, on successive deployments, we set up symbolic links from the public directory up to this shared folder. This allows these assets to live outside of the context of a specific release.

3) Create a log directory at the system level: /var/log/myapp.

4) Create a symlink from the apache config directory (generally /etc/apache2/sites-enabled on Ubuntu, /etc/httpd/conf.d on CentOS) to /var/vhosts/myapp/shared/passenger.conf. Note that at this stage, passenger.conf does not exist. This is OK, as our cold deployment will address it, and each successive deployment will exploit it. This passenger.conf file will itself just be another symlink out to the current release, which lets us alter our apache/passenger configuration for this app on successive deployments. The apache config directories are not writable by non-privileged users, so we couldn’t update a symlink living there with our deploy account; the one under shared/ we can.
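Steps 2 through 4 boil down to a handful of shell commands. Here’s a sketch using the Ubuntu paths; the PREFIX variable is an assumption added here so the commands can be rehearsed without root (set it empty and run as root on a real box):

```shell
PREFIX="${PREFIX:-/tmp/myapp-demo}"

# Step 2: the app root, with releases/ and shared/content for uploads
mkdir -p "$PREFIX/var/vhosts/myapp/releases" \
         "$PREFIX/var/vhosts/myapp/shared/content"

# Step 3: a system-level log directory
mkdir -p "$PREFIX/var/log/myapp"

# Step 4: apache points at shared/passenger.conf; the link dangles
# until the first (cold) deployment creates its target
mkdir -p "$PREFIX/etc/apache2/sites-enabled"
ln -sf "$PREFIX/var/vhosts/myapp/shared/passenger.conf" \
       "$PREFIX/etc/apache2/sites-enabled/myapp.conf"
```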

5) Create (if it doesn’t already exist) a deploy user, and assign it to the same group that apache runs as (www-data on Ubuntu, apache on CentOS).

$> adduser --ingroup www-data deploy

6) chown the app root (/var/vhosts/myapp) and log directory (/var/log/myapp), and give both user/group write permissions (775, 774, or 770). Apache’s user will be able to write to these directories by virtue of them being owned by the group.

$> chown -R deploy:www-data /var/vhosts/myapp /var/log/myapp
$> chmod -R 775 /var/vhosts/myapp /var/log/myapp

7) Create your database, and grant all privileges to a non-root user.

mysql> CREATE DATABASE myapp;
mysql> GRANT ALL PRIVILEGES ON myapp.* TO 'myappuser'@localhost IDENTIFIED BY 'asecretpassword';

8) Install Passenger and the GemInstaller gem (as long as we keep our geminstaller.yml file up to date, we can ensure that our server will use those exact gem versions).

In order to execute these tasks, you’re going to need root access. Note, however, that it is ill-advised to ever put your root password in your deploy script (lest someone accidentally commit it to the repo). One way to handle this is to implement some or all of this functionality in cap deploy:setup. You can then run it as the root user without storing a password (this only works if root can ssh in), which forces Capistrano to prompt you for it. I like this approach because, from this point on, we’ll never need the root password again. Put it in a lockbox and leave it alone! Your other option (if root can’t log in) is to log in with the unprivileged account, su -, and perform these steps manually. Similarly, you can temporarily grant your deploy user unrestricted sudo access in the sudoers file (but don’t forget to undo this!). Either way, these are one-time steps, and it’s really not much of a hassle if you know what you’re doing.

Unprivileged Tasks
Everything else you’ll ever have to do (unless you’re adding a new feature that requires installing other software, etc.) with Capistrano can now be completed with the unprivileged deploy user that we created in step 5 above.

At this stage, the only difference between a cold deployment and a successive deployment is the fact that your app was never running in the first place. In essence, it’s the difference between a hard restart of apache and a soft restart. All other steps are the same:

1) create a new release (with the updated code)

2) update the current and filecolumn symlinks (these were the symlinks I mentioned above that point out to the shared/content directory)

3) ensure that all of our necessary gems are installed (reading geminstaller.yml and executing geminstaller if necessary) — more on this in the next post

4) run any pending migrations

5) update /var/vhosts/myapp/shared/passenger.conf to point to the config snippet in the latest release

6) restart apache (hard if cold deploy or if the apache config is updated, soft otherwise)
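Step 5 above reduces to a single shell command. Here it is wrapped in a plain-Ruby helper rather than the actual Capistrano task (which would pass the same string to run); the helper name is hypothetical, and the config/production/apache.conf location assumes the per-environment layout from the Better setup for environments in Rails post:

```ruby
# Build the command that repoints shared/passenger.conf at the apache
# config snippet shipped inside the latest release. (Hypothetical
# helper; a Capistrano task would supply latest_release/shared_path.)
def passenger_conf_command(latest_release, shared_path)
  "ln -sf #{latest_release}/config/production/apache.conf " \
    "#{shared_path}/passenger.conf"
end

puts passenger_conf_command("/var/vhosts/myapp/releases/20080607120000",
                            "/var/vhosts/myapp/shared")
```

Because shared/passenger.conf is only a symlink, updating it is atomic per deploy, and the privileged link created in setup step 4 never needs to change.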

We overwrite more or less all of the default Capistrano recipes. Actual code will be released probably just after the presentation (Tuesday, Jun 10, 2008, 7:00 PM at Medical Decision Logic, 1216 E. Baltimore Street, Baltimore, MD 21202), as I’m still tweaking things here and there.

The important take-home note is this: yeah, it’s nice to be able to pull out your Capistrano recipes and build an app on a brand-new server from scratch with a few command-line calls. However, it is borderline impossible to pull this off securely. The line between server setup and application deployment blurs when you try to put this together. Server setup, by nature, requires root/privileged access; incremental deployment does not. Capistrano was not designed to build a server from scratch. A better approach is to develop a server image that you use for all of your client app servers.

My next post will elaborate (with more code samples, particularly recipe snippets and some capistrano/rails extensions I’ve been working on) on much of what was covered here. Additionally, I’ll go into further detail regarding other ways to make maintaining your production apps easier with Capistrano.

Better setup for environments in Rails

June 2nd, 2008

Update: I have created a gem (environmentalist) to create this configuration structure for you. Read Introducing environmentalist for an introduction.

I will be presenting at the next Baltimore Ruby Meetup (Tuesday, 6/10/08) on deploying applications with Capistrano and Phusion Passenger. In an effort to prepare (and perhaps induce a little bit of interest), I am writing a series of blog posts that help set the stage for the presentation.

In this first post, I discuss a common set of changes I make to the config structure of a fresh Rails app. This is pertinent because it has some (minor) effects on our deployment procedure, namely within my core capistrano recipes.

$> rails test -d mysql

One of the things that bothered me about the default config structure is the database.yml file. The file contains the database credentials for all of our environments. As you should know, the default file looks like:

# ...
# And be sure to use new-style password hashing:
#   http://dev.mysql.com/doc/refman/5.0/en/old-client.html
development:
  adapter: mysql
  encoding: utf8
  database: test_development
  username: root
  password:
  host: localhost

# Warning: The database defined as "test" will be erased and
# re-generated from your development database when you run "rake".
# Do not set this db to the same as development or production.
test:
  adapter: mysql
  encoding: utf8
  database: test_test
  username: root
  password:
  host: localhost

production:
  adapter: mysql
  encoding: utf8
  database: test_production
  username: root
  password: 
  host: localhost

We’ve got development, test and production credentials in this file. So…should this file be added to your repository? Well, localized settings files are generally not suitable for the repo…but what about that production block? That sure looks like it belongs in the repo considering there’s no reason it should differ amongst developers. Your production environment is, after all, the same as everyone else’s on your team. What if we wanted to add another environment (e.g. a staging environment)? That should probably go into the repo as well.

Some argue against putting the file into the repo at all. I’ve seen several Capistrano scripts that echo the contents of this local file out onto the production server. This is fine (particularly if your recipe reads from the local copy), but then every developer needs to make sure their local copy has the exact same production credentials. Another method I’ve seen is copying those credentials directly into your cap recipe, but that isn’t very DRY, and forces developers to remember that the attributes are repeated in multiple files.

Furthermore, how would we handle other pieces of configuration? For instance, with Passenger, each app stores an apache config locally:

<VirtualHost *:80>
  ServerName test.smartlogicsolutions.com
  RewriteEngine on
  RewriteCond %{SERVER_PORT} !^443$
  RewriteRule ^.*$ https://%{SERVER_NAME}%{REQUEST_URI} [L,R]
</VirtualHost>

<VirtualHost *:443>
  ServerName test.smartlogicsolutions.com
  DocumentRoot /var/vhosts/test/current/public

  PassengerRoot /usr/local/lib/ruby/gems/1.8/gems/passenger-2.0.0
  RailsRuby /usr/local/bin/ruby
  RailsEnv production

  SSLEngine on
  SSLCertificateFile /etc/apache2/ssl/apache.pem
  CustomLog /var/log/apache2/test.log combined
</VirtualHost>

This is not a problem if we only ever deploy to a single environment (production). But what about that staging environment? The naive solution is to pollute the config directory with several apache config files (one for each deployable environment). Prior to Passenger, we used mongrel_cluster, which causes basically the same problem when you need to keep distinct copies for separate environments. Several other plugins/gems require configuration as well that will not necessarily be the same for all of your deployable environments.

The default config directory of a rails app looks like:

config/
  boot.rb
  database.yml
  environment.rb
  environments/
    development.rb
    production.rb
    staging.rb
    test.rb
  initializers/
  routes.rb

One option would be to add more top-level folders (similar to environments/) for each of these pieces of configuration:

config/
  boot.rb
  environment.rb
  apaches/
    production.conf
    staging.conf
  databases/
    development.yml
    production.yml
    staging.yml
    test.yml
  environments/
    development.rb
    production.rb
    staging.rb
    test.rb    
  initializers/
  routes.rb

I don’t know why, but this just feels wrong to me. I like to rearrange my directory structure such that I have a directory for each of my environments:

config/
  boot.rb
  development/
    database.yml
    environment.rb
  environment.rb
  initializers/
  production/
    apache.conf
    database.yml
    environment.rb
  routes.rb
  staging/
    apache.conf
    database.yml
    environment.rb
  test/
    database.yml
    environment.rb

Now, each of my environments has its specific settings grouped together. It’s cleanly organized and obvious when looking at the contents of the config directory which environments exist and where their configurations are stored. There are two things in particular to note:


1) I now have 4 database.yml files. You might argue that this isn’t DRY, but in reality I have exactly the same number of lines of configuration as I would have with a single database.yml file. For example, config/production/database.yml looks like:

production:
  adapter: mysql
  database: myapp
  username: myapp_user
  password: supersecret
  socket:  /var/run/mysqld/mysqld.sock

Furthermore, this file can (and should) be checked into Subversion! And developers won’t have to mess with svn:ignores et al. (Note that config/test/database.yml and config/development/database.yml do not belong in the repo.)

2) I have changed the name and location of each of my environment-specific configuration files. e.g. config/environments/development.rb is now located at config/development/environment.rb.
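The rearrangement itself is just a few file moves. A sketch (plain mv shown; in a real Subversion working copy you’d use svn mv so history is preserved, and the scratch app root built here is only for demonstration):

```shell
# Build a throwaway stand-in for a stock Rails config/ directory
APP="$(mktemp -d)"; cd "$APP"
mkdir -p config/environments
touch config/database.yml
for env in development test production staging; do
  touch "config/environments/$env.rb"
done

# Convert to the per-environment layout described above
for env in development test production staging; do
  mkdir -p "config/$env"
  mv "config/environments/$env.rb" "config/$env/environment.rb"
  cp config/database.yml "config/$env/database.yml"
done
rm config/database.yml
rmdir config/environments
```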

So, now our config directory is a little better organized. Unfortunately, Rails is going to complain about not being able to find config/database.yml and config/environments/development.rb. All we need to do is override where Rails looks for them, and we’ll be good to go.

I create a new file config/postboot.rb:

# Be sure to restart your server when you modify this file.

rails_env = ENV['RAILS_ENV'] || 'development'

env_dir  = File.join(RAILS_ROOT, 'config', rails_env)
db_file  = File.join(env_dir, 'database.yml')
env_file = File.join(env_dir, 'environment.rb')

raise "#{env_dir} environment directory cannot be found." unless File.exists?(env_dir)
raise "#{db_file} is missing.  You cannot continue without this." unless File.exists?(db_file)
raise "#{env_file} environment file is missing." unless File.exists?(env_file)

# Now, let's open up Rails and tell it to find our environment files elsewhere.
module Rails
  class Configuration
    
    # Tell rails our database.yml file is elsewhere
    def database_configuration_file
      File.join(root_path, 'config', environment, 'database.yml')
    end
    
    # Tell rails our environment file is elsewhere
    def environment_path
      "#{root_path}/config/#{environment}/environment.rb"
    end
  end
end
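Outside of Rails, the effect of those two overrides is easy to see with a stand-in class (illustration only; Rails::Configuration itself isn’t loaded here, and the method bodies mirror the patch above):

```ruby
# Stand-in for Rails::Configuration, showing where the overridden
# methods point for a given root and environment.
class FakeConfiguration
  attr_reader :root_path, :environment

  def initialize(root_path, environment)
    @root_path, @environment = root_path, environment
  end

  def database_configuration_file
    File.join(root_path, 'config', environment, 'database.yml')
  end

  def environment_path
    "#{root_path}/config/#{environment}/environment.rb"
  end
end

c = FakeConfiguration.new('/var/vhosts/myapp/current', 'production')
puts c.database_configuration_file
# => /var/vhosts/myapp/current/config/production/database.yml
```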

Lastly, I need to hook it in right after boot.rb is run. As such, I add a require for the file just after boot.rb is required in config/environment.rb:

# Be sure to restart your web server when you modify this file.

# Uncomment below to force Rails into production mode when 
# you don't control web/app server and can't set it the proper way
# ENV['RAILS_ENV'] ||= 'production'

# Specifies gem version of Rails to use when vendor/rails is not present
RAILS_GEM_VERSION = '2.0.2' unless defined? RAILS_GEM_VERSION

# Bootstrap the Rails environment, frameworks, and default configuration
require File.join(File.dirname(__FILE__), 'boot')

# Pull in the postboot initializer
require File.join(File.dirname(__FILE__), 'postboot')

Rails::Initializer.run do |config|
...

Now, our Rails app is ready to go! I have not yet created a script to automate all of this. As I compile resources for my presentation next week, I will include a script to do just this, and obviously share it with you then!
