Ruby is a pure object-oriented language with single inheritance, duck-typing, mixins, code-blocks as first-class objects, and classes that are open by default. This combination of attributes gives the programmer the ability to write concise code that stays close to the problem domain: the language doesn’t get in the way. I’ve had a chance to read a lot of ruby code recently, much of it written by people new to ruby, and I’ve observed a number of ways in which the code could be improved by using the ruby language in a more idiomatic way.

Using the methods from Enumerable

Enumerable is a module in the ruby standard library. It is used as a mixin to provide additional methods to any class that implements the each() method. These additional methods allow us to express common iteration patterns in a concise way.
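
To make the mixin idea concrete, here is a minimal sketch. The Countdown class is made up for this illustration; the only thing it has to supply is each(), and Enumerable provides the rest.

```ruby
# Hypothetical class: defining each and mixing in Enumerable is all that is needed.
class Countdown
  include Enumerable

  def initialize(from)
    @from = from
  end

  # Enumerable builds map, select, inject and friends on top of this method.
  def each
    @from.downto(1) {|n| yield n }
  end
end

c = Countdown.new(3)
c.to_a                      # the elements in iteration order
c.map {|n| n * 2 }          # map comes from Enumerable
c.select {|n| n.odd? }      # so does select
```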

Creating one array from another, using map()

A common pattern I’ve seen is the following:

def convert(a)
    b = Array.new
    a.each {|x| b << some_function_on(x) }
    return b
end

This works, of course, but can be expressed more elegantly as:

def convert(a)
    a.map {|x| some_function_on(x) }
end

The map() method returns a new Array whose elements are the values returned by the block, one for each element of the receiver. With this approach the wrapper method (convert() in this case) can generally be done away with: you might as well just call map directly. This is true of most of the Enumerable methods: they reduce the need for auxiliary methods by allowing you to express common operations in a short and elegant way.
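
As a small illustration of calling map directly, here is a sketch with made-up sample data and a made-up conversion (squaring) standing in for some_function_on().

```ruby
numbers = [1, 2, 3]   # hypothetical sample data

# map returns a new Array and leaves the receiver untouched.
squares = numbers.map {|x| x * x }

# Even when the receiver is a Range, the result is an Array.
doubled = (1..3).map {|x| x * 2 }
```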

Operating on a subset of an array, using select()

Another common operation is to create one array from another, where not all elements of the original array qualify for inclusion in the target array.

def get_stuff(a)
    b = Array.new
    a.each {|x|
        if some_predicate(x)
            b << some_function_on(x)
        end
    }
    return b
end

This can be rewritten as:

def get_stuff(a)
    a.select {|x| some_predicate(x) }.map {|x| some_function_on(x) }
end
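
Here is the same select-then-map shape with concrete stand-ins: the word list, the length test and the upcasing are all invented for the example. The second form assumes Ruby 2.7 or later, where filter_map is available.

```ruby
words = %w(apple fig banana kiwi)   # hypothetical sample data

# Keep only the words longer than three letters, then upcase them.
long_words = words.select {|w| w.length > 3 }.map {|w| w.upcase }

# On Ruby 2.7 or later, filter_map does both steps in one pass:
# falsy block results are dropped, truthy results are kept.
same_result = words.filter_map {|w| w.upcase if w.length > 3 }
```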

Operating on parallel arrays

Sometimes we have parallel arrays and we want to combine them element-wise. Here’s one way of doing it:

def combine(a, b)
    i = 0
    c = []
    a.each {|x|
        c << some_combiner(x, b[i]) # where x = a[i] of course
        i += 1
    }
    return c
end

We can make a slight improvement:

def combine(a, b)
    c = []
    a.each_with_index {|x, i| c << some_combiner(x, b[i]) }
    return c
end

This could be made even more compact with a map_with_index method, but Enumerable does not provide one (we will define our own in a later section).

Finally, we can use the zip method:

def combine(a, b)
    a.zip(b).map {|x, y| some_combiner(x, y) }
end
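
To see what zip actually produces, here is a short sketch with invented data, using multiplication as a stand-in for some_combiner().

```ruby
widths  = [2, 3, 4]   # hypothetical sample data
heights = [5, 6, 7]

# zip pairs the elements up position by position.
pairs = widths.zip(heights)

# The block can destructure each pair into two parameters.
areas = widths.zip(heights).map {|w, h| w * h }
```

One thing to keep in mind: if the argument is shorter than the receiver, zip fills the missing positions with nil, so the combiner has to cope with that.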

Accumulating the values of an array, using inject()

An obvious, if trivial, example of this is summing an array:

def sum(a)
    n = 0
    a.each {|x| n += x }
    return n
end

This can be rewritten as:

def sum(a)
    a.inject(0) {|t, x| t + x }
end

Similarly:

def product(a)
    a.inject(1) {|t, x| t * x }
end

But here’s a more interesting case. Suppose we want to map an array to a hash. For example, we have an array of structs where each struct consists of (name, age), and we want to build a hash from age -> [list of names]. We can accomplish this as:

a.inject( Hash.new {|h,k| h[k] = [] }) {|h,p| h[p.age] << p.name; h }

This relies on new hash entries being initialized with a new array instead of with nil - that is accomplished in the constructor for the Hash. Then the code block appends the person’s name to the list of names all of the same age, and returns the hash as the block’s value. Here is how this would be written in a less idiomatic way:

h = Hash.new
a.each {|p|
    h[p.age] ||= []
    h[p.age] << p.name
}
return h
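
Here is a runnable version of the inject approach. The struct and the three people are invented for the example. It also shows each_with_object, which I believe reads a little better here because the block does not have to return the hash.

```ruby
person = Struct.new(:name, :age)   # hypothetical struct for illustration
people = [person.new("Ann", 30), person.new("Bob", 25), person.new("Cy", 30)]

# inject: the block must return the accumulator (the trailing "; h").
by_age = people.inject(Hash.new {|h, k| h[k] = [] }) {|h, p| h[p.age] << p.name; h }

# each_with_object: the accumulator is passed along automatically.
by_age2 = people.each_with_object(Hash.new {|h, k| h[k] = [] }) {|p, h| h[p.age] << p.name }
```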

Using *args

Methods can be defined as accepting a variable number of arguments by applying the unary ‘*’ operator to the final parameter. For example:

def f(fixed, *args)

Then, in the body of the method, args is an Array containing the zero or more variable arguments.

def do_stuff(*args, &block)
    args.each(&block)
end
do_stuff(1,2,3) {|n| puts n * 2}

Besides the usual reasons for wanting variable argument lists, there are at least two uses for unary splat: ensuring that we have an array, and delegating to another method when we don’t know what that method’s arity is.

Consider the following:

def f(x, a)
    a = [a] if !a.is_a?(Array)
    # do stuff with the array
end

For most use-cases we could say:

def f(x, *a)
    a = a.flatten   # *a is already an Array; flatten also copes with a single array argument
    # do stuff with the array
end

The latter approach lets us call f in any of the following ways:

f(1)        >> a = []
f(1,2)      >> a = [2]
f(1,2,3)    >> a = [2,3]
f(1,[2,3])  >> a = [2,3]
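
The four calls above can be checked with a small sketch. The method name tags_for and its arguments are made up. The last three lines show Kernel#Array, another way to make sure you are holding an array.

```ruby
# Hypothetical method: collects everything after the first argument into one flat array.
def tags_for(name, *tags)
  tags.flatten
end

tags_for("post")                     # no extra arguments
tags_for("post", "ruby")             # one extra argument
tags_for("post", "ruby", "tips")     # several extra arguments
tags_for("post", ["ruby", "tips"])   # a single array argument

# Kernel#Array wraps a single value and leaves an array alone.
Array(2)
Array([2, 3])
Array(nil)
```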

For an example of using *args for delegating to a method whose arity we don’t know, see method_missing in the next section.

Extending the Standard Library

Classes and modules in ruby are open by default, in the sense that we can add or modify their methods. This allows us to add functionality to existing classes, potentially leading to shorter and more readable code.

Let’s start by adding a method to the Enumerable module.

module Enumerable
    def map_with_index
        ret = []   # like map, always build an Array (self.class.new fails for receivers such as Range)
        each_with_index {|x, i| ret << yield(x, i) }
        ret
    end
end

Now we can rewrite our earlier parallel array code as:

def combine(a, b)
    a.map_with_index {|x, i| some_combiner(x, b[i]) }
end
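
If you would rather not reopen Enumerable, the same result can be had with what Ruby already ships (1.9 onwards, as far as I know), because the iterator methods return an Enumerator when called without a block. A sketch with invented data, using addition as a stand-in for some_combiner():

```ruby
a = [10, 20, 30]   # hypothetical sample data
b = [1, 2, 3]

# each_with_index without a block returns an Enumerator, which can be mapped.
sums = a.each_with_index.map {|x, i| x + b[i] }

# map without a block also returns an Enumerator; with_index supplies the index.
same = a.map.with_index {|x, i| x + b[i] }
```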

If you find yourself writing a helper function that operates on a standard ruby class, consider adding the method to the class instead. For example, instead of:

def even?(n)
    n.modulo(2) == 0
end

consider

class Numeric
    def even?
        self.modulo(2) == 0
    end
end

Then you can use expressions such as 3.even?, n.even?, etc.

As a larger, and more dangerous, example, we can extend the Hash class so that hash keys are treated as accessors.

class Hash
  def method_missing(sym, *args)
    s = sym.to_s
    if s =~ /=$/
      self[s.chop] = args.first   # args is always an Array; store its single element
    elsif s =~ /^\w+$/
      self[s]
    else
      super
    end
  end
end

This allows us to do things such as:

h = {}
h.foo = 'bar'
h.woop = 'te do'
puts h.foo

Using different quoting mechanisms

Ruby provides a number of ways of expressing literal strings and regular expressions. Appropriate use of the right quoting convention can lead to more readable code and less typing.

A literal string can be enclosed in double or single quotes, where double quotes allow variable interpolation, as in

t = "foo"
s = "Stuff and #{t} and more stuff"

In this case #{t} is expanded to have the value of t; and of course any expression can be included within the curly braces.

Occasionally you want to have quotes embedded within the quoted string. You can always backslash-escape them, but another way is to use the %q or %Q mechanism (%q behaves like single quotes, %Q like double quotes):

s = %Q<This is a "string" with quotes inside and interpolation : #{t} as you can see.>
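
The difference between the two forms is easiest to see side by side. In this sketch the variable name and the text are made up:

```ruby
name = "world"   # hypothetical variable for the interpolation

# %q behaves like single quotes: no interpolation, so #{name} stays literal.
literal = %q<He said "hello, #{name}">

# %Q behaves like double quotes: #{name} is replaced by the variable's value.
expanded = %Q<He said "hello, #{name}">
```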

Regular expressions are normally written as

p = /stuff/

But often we want slashes inside the regular expression, as in

p = /stuff\/ and more\/ stuff/

The backslashes are hard to read, especially in a complex expression. We can rewrite this as:

p = %r<stuff/ and more/ stuff>

Finally, sometimes we want multi-line strings. We can accomplish this easily with:

s = <<EOD
This is a
multiline
string.
EOD
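
When the multi-line string lives inside an indented method, the squiggly heredoc may be handy. It is available from Ruby 2.3 and strips the common leading whitespace. A small sketch, with a made-up method name:

```ruby
def usage_text
  # <<~ removes the shared indentation, so the text can be indented with the code.
  <<~EOD
    This is a
    multiline
    string.
  EOD
end
```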

Using hash arguments

The final parameter of a method can be treated as an option hash. This provides a convenient way of allowing the caller to override default options.

def do_stuff(fixed, opts = {})
    options = @default_options.merge(opts)
    # do stuff
end
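
Here is a runnable sketch of the same idea. The option names and the DEFAULT_OPTIONS constant are assumptions made for the example; a constant is used in place of @default_options only so that the snippet stands on its own.

```ruby
# Hypothetical defaults, frozen so that no caller can modify them by accident.
DEFAULT_OPTIONS = { :verbose => false, :retries => 3 }.freeze

def do_stuff(fixed, opts = {})
  options = DEFAULT_OPTIONS.merge(opts)   # the caller's values win over the defaults
  "#{fixed}: retries=#{options[:retries]}, verbose=#{options[:verbose]}"
end

do_stuff("job")                  # all defaults
do_stuff("job", :retries => 5)   # braces may be omitted for a trailing hash
```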

Metaprogramming

Ruby provides ways of dynamically rewriting code, thus allowing us to inject behavior into existing methods. Good examples of this are the attr_reader and attr_accessor methods: these inject getters and setters into a class, and thus make it much easier to express those common operations.

Here I show three examples of this type of metaprogramming. The memoize method takes a list of method names (as symbols), and ensures that each of those methods will be called only once for a given set of arguments. The return value is stored in a hash table and looked up on subsequent calls. The eval_once method is similar except that it only works on methods that do not take any parameters, so instead of a hash table it uses a simple instance variable. Finally, log_method wraps specified methods with logging, reporting on the arguments passed to a method and the return value.

class Module
 private
  def memoize(*ids)
    ids.each {|id|
      module_eval <<-"EOD"
        alias_method :__memoized_#{id}__, :#{id.to_s}
        private :__memoized_#{id}__
        def #{id.to_s}(*args)
          @__memo_of_#{id}__ ||= {}
          @__memo_of_#{id}__[args] ||= __memoized_#{id}__(*args)
        end
EOD
    }
  end

  def eval_once(*ids)
    ids.each {|id|
      module_eval <<-"EOD"
        alias_method :__evalonce_#{id}__, :#{id}
        private :__evalonce_#{id}__
        def #{id}(*args)
          raise "eval_once can only be used on zero-parameter methods" if method(:__evalonce_#{id}__).arity > 0
          @__eval_of_#{id}__ ||=  __evalonce_#{id}__()
        end
EOD
    }
  end

  def log_method(*ids)
    ids.each {|id|
      module_eval <<-"EOD"
        alias_method :__logged_#{id}__, :#{id.to_s}
        private :__logged_#{id}__
        def #{id.to_s}(*args)
          ret = __logged_#{id}__(*args)
          $logger.debug "#{id.to_s} (" +
              args.map {|x| x.inspect}.join(',') +
              ") returns " +
              ret.inspect
          ret
        end
EOD
    }
  end
end
class Test
  def foo(n)
    puts "Really calling foo"
    return n * 2
  end
  def bar(n,m)
    puts "Really calling bar"
    return n * m
  end
  def ev1()
    puts "Really calling ev1"
    fib(20)
  end
  def ev2(x)
    puts "Should never get here"
    x
  end
  def fib(n) # normally the slowest possible way of implementing this, but memoize helps a lot
    n < 2 ? 1 : fib(n - 1) + fib(n - 2)
  end
  def baz(x, *args)
    [x.to_s.upcase, args.map {|x| x.to_s.upcase}].flatten
  end
  memoize :foo, :bar, :fib
  eval_once :ev1, :ev2
  log_method :baz
end

if __FILE__ == $0

    t = Test.new
    2.times {|i| 3.times {|j| puts t.foo(j) }}
    
    2.times {|i| (2..4).each {|j| (5..7).each {|k| puts t.bar(j,k) } } }
    puts t.fib(80)
    3.times {|i| puts t.ev1 }
    
    require 'logger'
    $logger = Logger.new(STDOUT)

    t.baz('baz argument')
    t.baz('stuff', 'and more stuff')

    # This will generate a runtime exception because eval_once can only be used on zero-parameter methods
    t.ev2(0)
end
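
For comparison, here is a sketch of memoize that avoids evaluating strings by using instance_method and define_method. It is an alternative to the listing above, not a replacement for it. The Memoizer module and the Calculator class are invented for the example, and classes opt in with extend so that Module itself is left alone.

```ruby
# Hypothetical module: a memoize built on define_method instead of module_eval.
module Memoizer
  def memoize(*ids)
    ids.each do |id|
      original = instance_method(id)   # grab the unbound original method
      define_method(id) do |*args|
        @__memo__ ||= Hash.new {|h, k| h[k] = {} }
        cache = @__memo__[id]
        # key? (rather than ||=) means nil and false results are cached too
        cache[args] = original.bind(self).call(*args) unless cache.key?(args)
        cache[args]
      end
    end
  end
end

class Calculator   # hypothetical class for illustration
  extend Memoizer
  attr_reader :calls

  def fib(n)
    @calls = (@calls || 0) + 1   # count how often the real method runs
    n < 2 ? 1 : fib(n - 1) + fib(n - 2)
  end
  memoize :fib
end

c = Calculator.new
c.fib(30)
```

Because the recursive calls inside fib go through the redefined method, the original body runs only once for each distinct argument.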

Lambda and Proc

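A lambda packages a block of code as an object that can be stored in a variable and called later. In this example a Timer accepts lambdas as error handlers: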

class Timer
  attr_accessor :error_handler
  def time(err_proc = nil)
    t, ret = Time.now, nil
    begin
      yield
    rescue StandardError => ex   # StandardError, so that exit and signals are not swallowed
      handler = err_proc || @error_handler
      handler.call(ex) if handler
    ensure
      ret = Time.now - t
    end
    ret
  end
end
if __FILE__ == $0
  def fib(n)
    n < 2 ? 1 : fib(n - 1) + fib(n - 2)
  end
  was_error = nil
  t = Timer.new
  t.error_handler = lambda {|e| was_error = e }
  puts t.time {
    fib(25)
    raise "Foo"
  }
  puts was_error.inspect
  puts t.time(lambda {|e| puts e.to_s }) {
    fib(25)
    raise "Foo"
  }
end
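
Two properties of lambdas are worth seeing in isolation: they close over local variables, and (unlike a plain Proc) they check their argument count. A small sketch with made-up names:

```ruby
errors = []                                   # hypothetical collector
handler = lambda {|e| errors << e.message }   # closes over the local variable

begin
  raise "Foo"
rescue StandardError => ex
  handler.call(ex)   # the lambda appends to the array defined above
end

# A lambda checks its argument count; a plain Proc does not.
strict  = lambda {|x| x }
lenient = Proc.new {|x| x }
lenient.call   # returns nil: the missing argument is filled in with nil
# strict.call would raise ArgumentError
```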

Using blocks and yield for more than iteration

Blocks are good for more than just iteration. For example, if you want to write to a file you could do the following:

f = File.open('somefile', 'w')
begin
  f.puts("some text")
ensure
  f.close
end

An easier way:

File.open('somefile', 'w') { |f| f.puts("some text") }

If you pass a block to File.open (or IO.open), the block is called with the file handle, and the file is guaranteed to be closed when the block exits, even if the block raises an exception. You can implement this same approach in any class that returns a resource that must later be disposed of. For example, let’s extend the OCI8 class to automatically close cursors:

require 'oci8'
class OCI8
  def cursor_exec(*args)
    cursor = exec(*args)
    return cursor if !block_given?
    begin
      yield cursor
    ensure
      cursor.close
      puts "Closed the cursor"  # just for illustration
    end
  end
end

if __FILE__ == $0
  oc = OCI8.new("username", "pw", "dbname")
  oc.cursor_exec("select * from SOME_TABLE_NAME") {|cursor|
    while a = cursor.fetch
      raise "some exception" if cursor.row_count == 3
      print "#{cursor.row_count} "
      puts a.map {|x| x.to_s}.join(",")
    end
  }
end

If you change the user, pw, dbname, and table name appropriately, you can run this and see that although an exception occurs during processing of the retrieved rows, the cursor is closed. Naturally, as shown earlier, you could simply alias the original exec method and add this functionality without exposing the new cursor_exec() method.
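
If you don’t have an Oracle database to hand, the same open/yield/ensure shape can be tried with a stand-in resource. The Connection class below is entirely made up and only records whether it has been closed.

```ruby
# Hypothetical resource class, used only to show the open/yield/ensure-close shape.
class Connection
  attr_reader :closed

  def initialize
    @closed = false
  end

  def close
    @closed = true
  end

  # With a block: yield the connection and always close it afterwards.
  # Without a block: hand the open connection to the caller.
  def self.open
    conn = new
    return conn unless block_given?
    begin
      yield conn
    ensure
      conn.close
    end
  end
end

seen = nil
begin
  Connection.open {|c| seen = c; raise "some exception" }
rescue RuntimeError
  # the exception still propagates, but only after the ensure clause has run
end
```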

.irbrc is your friend

irb is a great tool for exploration. You can make it even more useful by creating a .irbrc file in your home directory. irb executes the code in .irbrc when it starts up. It is just ruby code, executed as if you had typed it into irb, so you can do nearly anything you want.

The first thing you will probably want to do is require the set of modules that you are likely to use in irb. For example, here are some of the modules that I require:

%w(pp fileutils open-uri net/telnet observer timeout).each {|f| require f }

When you are using irb you frequently evaluate some lengthy expression to ensure that it does what you want. But, if you’re like me, you might forget to store the result in a variable where you can then use that value. Here’s a method you can add to .irbrc to let you retrieve that value:

def lv  # last value
    conf.last_value
end

Often you use irb to test code that you are currently working on. Here’s a method that will automatically require all of the ruby code in the current working directory:

def require_local
    Dir.glob('*.rb') {|fname| require File.expand_path(fname) }   # absolute path: the current directory is not on the load path in Ruby 1.9.2+
end

Unit Testing

There are a number of unit testing packages available for ruby. I’m kind of old-school, so the one I use is test/unit. And, at least for initial development, I embed the unit tests in the same file as the classes being tested. The usual idiom for doing this is to put the following at the end of your source file:

if __FILE__ == $0
# your code here
end

The if statement evaluates to true only if the current file is the top-level file being run by ruby.

To use test/unit you do the following:

require 'test/unit'

class MyTesterClass < Test::Unit::TestCase
  def setup
      # code that should run before each test
  end
  def teardown
      # code to undo what you did in setup
  end
  def test_some_case
      # your unit test code, for example:
      assert_equal(4, 2 + 2)
  end
  def test_another_case
      # your unit test code
  end
end

Within a test method there is a raft of assertions available; see the test/unit documentation at ruby doc.