Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.
- Doug McIlroy, inventor of the Unix pipe
This post illustrates some of the ideas behind the Unix philosophy by composing utilities together to convert Markdown to HTML. I chose Ruby because it's a language I'm familiar with, but also to show that shell scripting does not require bash. The main idea is to write small filter scripts that can be composed together with pipes to create flexible tools.
Typically, Ruby files end with a .rb extension and are executed by invoking the Ruby interpreter explicitly. But instead of typing out `ruby markdown.rb` each time, we can use a shebang at the top of our script to indicate which interpreter to use.
#!/usr/bin/env ruby
# This first line allows us to name our script `markdown` without the .rb extension
To skip invoking `ruby` explicitly, we use a terminal to change the permissions of our script and make it executable:
$ chmod +x markdown
$ ./markdown # no output and no errors
For the first iteration, our script will take a single argument -- a filename of a markdown document to convert to HTML. Command line arguments are available through an array named `ARGV`.
#!/usr/bin/env ruby
markdown_file = ARGV.first # ARGV is ['foo.md']
puts markdown_file
If we run our script, we see that ARGV is whatever arguments we pass in:
$ ./markdown foo.md
foo.md
Other languages sometimes provide a convenience value named `ARGC` to indicate the number of command line arguments ((ARG)ument (C)ount), but since `ARGV` is an array, it's easy to grab the number of arguments by calling `ARGV.length`.
For a simple script like ours, it's sufficient to work with `ARGV` directly. If there are complex options and flags to parse, the standard library's `optparse` module is a good starting point. This gist is a template and an example for parsing conventional POSIX style flags.
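We won't need it here, but as a rough sketch of what `optparse` usage can look like (the -o/--output flag below is purely hypothetical):
#!/usr/bin/env ruby
require "optparse"

options = {}
OptionParser.new do |opts|
  opts.banner = "Usage: markdown [options] [file ...]"
  # Hypothetical flag: write the HTML to a file instead of stdout
  opts.on("-o", "--output FILE", "Write output to FILE") { |file| options[:output] = file }
  opts.on("-h", "--help", "Print this help") { puts opts; exit }
end.parse!(ARGV)

# After parse!, ARGV holds only the remaining positional arguments
puts options.inspect
puts ARGV.inspect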
We have a markdown filename. The next step is to convert it to HTML. There are many gems that will accomplish this, but for the sake of the exercise, we don't install any gems. Instead, we can use GitHub's markdown API with curl. For example:
$ echo '{"text": "# header"}' | curl -s -H 'Content-Type: text/plain' https://api.github.com/markdown -d'@-'
The first half of the pipeline uses `echo` to print a JSON snippet to stdout. We pipe this output to `curl`. The last argument, `-d'@-'`, is how we send the stdin data to GitHub. From `curl`'s man page:
-d, --data If you start the data with the letter @, the rest should be a file name to read the data from, or - if you want curl to read the data from stdin.
`-s` makes `curl`'s output less noisy, and `-H` lets us pass in custom HTTP headers required by the API.
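For readability, here is the same invocation spelled out with curl's long option names; the behavior should be identical to the short flags above:
$ echo '{"text": "# header"}' | curl --silent --header 'Content-Type: text/plain' https://api.github.com/markdown --data '@-'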
Our script's immediate goal is to produce JSON similar to the `echo` command in our example above. Again, we do not need an external library because `json` is available through the standard library:
#!/usr/bin/env ruby
require "json"
markdown_file = ARGV.first
payload = { :text => File.read(markdown_file) }.to_json
puts payload
We can test that this works as expected by executing:
$ echo "# header" > foo.md
$ ./markdown foo.md | curl -s -H 'Content-Type: text/plain' https://api.github.com/markdown -d'@-'
<h1>header</h1>
With our prepared JSON payload, our next step is to handle the pipe within our script. Ruby's `IO.pipe` is a general way of creating a connected pair of read and write pipes, but for our purposes, we're interested in the higher-level abstraction of the `open3` module:
#!/usr/bin/env ruby
require "json"
require "open3"
markdown_file = ARGV.first
payload = { :text => File.read(markdown_file) }.to_json
curl_cmd = %Q(curl -s -H 'Content-Type: text/plain' https://api.github.com/markdown -d'@-')
# `stdin`, `stdout`, `stderr` are the respective file handles for the curl command
stdin, stdout, stderr = Open3.popen3(curl_cmd)
# Take the json we built and give it to curl via its stdin. We have to
# explicitly close its stdin before we can access its stdout. Otherwise, curl
# will keep waiting for more input.
stdin.puts payload
stdin.close
# Print out curl's output to stdout and stderr
puts stdout.read
$stderr.puts stderr.read
Let's try it out:
$ echo "# header" > foo.md
$ ./markdown foo.md
<h1>header</h1>
In our last example, we had to create a markdown file because our script expects a filename as an argument. The great thing about most Unix utilities is that they default to operating on stdin and stdout. This makes it easy to chain commands together so that the stdout of one command flows into the stdin of the next. For example, both of the following commands will print the number of lines in a file:
# count the number of lines for a given filename
$ wc -l README.md
# print out the file, then count the number of the lines on stdin
$ cat README.md | wc -l
We can improve our markdown script with this behavior: convert a file if a filename is given, or convert whatever is passed in via stdin if there are no arguments.
#!/usr/bin/env ruby
require "json"
markdown_file = ARGV.first
markdown_content = markdown_file ? File.read(markdown_file) : $stdin.read
payload = { :text => markdown_content }.to_json
# snip
We can simplify this code further by using Ruby's `ARGF` object. From the docs:
ARGF is a stream designed for use in scripts that process files given as command-line arguments or passed in via STDIN.
The same code rewritten with `ARGF` would look like:
#!/usr/bin/env ruby
require "json"
markdown_content = ARGF.read
payload = { :text => markdown_content }.to_json
# snip
As an added bonus, `ARGF` works on multiple filenames, allowing our script to process multiple files at once:
# ARGF.read will read in all 3 files
$ ./markdown foo.md bar.md baz.md
The complement to processing stdin by default is to keep stdout on topic. Stdout should be reserved for the script's intended output. In our script, the only output should be HTML because the script is used to convert markdown to HTML. We avoid logging diagnostic messages or printing progress counters unless we decide to add flags to our script that support these features. This allows us to easily chain more commands after our own script. For example, if we want to indent the output from our command, we can use the `tidy` command and know that everything we output is HTML:
$ ./markdown foo.md | tidy -i
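If we did decide to add, say, a verbose flag later, one way to keep stdout clean is to route diagnostics to stderr. A rough sketch, with a hypothetical --verbose flag:
# Hypothetical --verbose flag: progress chatter goes to stderr so that
# stdout stays pure HTML and remains safe to pipe into other commands
verbose = ARGV.delete("--verbose")
$stderr.puts "converting markdown to HTML..." if verbose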
Script exit codes allow us to control the flow of our pipelines. For example, we may want to publish the results of running `markdown`, but only if the command succeeded:
# Generate blog-post.html. If successful, run `./publish`, otherwise, print an error message
cat blog-post.md \
| ./markdown > blog-post.html && ./publish || echo "failed to publish blog-post.html"
A successful exit is 0, with any other number indicating something went wrong. Unfortunately, if we ran the above, we would always publish and never see the error message, even if curl failed. Our script doesn't call `exit` explicitly, so we always exit 0 -- even when GitHub's API isn't accessible.
Since the last thing our script runs is `curl`, our script should exit with `curl`'s exit code. Our current call to `Open3.popen3` saves three of the return values, but the method actually returns four. We will use this fourth value to query for `curl`'s exit code.
# snip
# We add a -f flag so that non-200 responses fail with an exit code of 22
curl_cmd = %Q(curl -f -s -H 'Content-Type: text/plain' https://api.github.com/markdown -d'@-')
# `stdin`, `stdout`, `stderr` are the respective file handles for the curl command
# `thread` is a `Thread` instance representing the command we're running
stdin, stdout, stderr, thread = Open3.popen3(curl_cmd)
# snip
# thread.value is a `Process::Status`. We exit our script with whatever curl's
# exit code was.
exit thread.value.exitstatus
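With the -f flag and the explicit exit, the earlier publish pipeline behaves as intended: if GitHub's API is unreachable, curl exits non-zero, our script propagates that exit code, ./publish is skipped, and the error message is printed instead.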
By following these simple conventions, our scripts work well with other existing Unix tools. This allows us to quickly prototype things that would otherwise require a lot of setup and dependencies. Since each script aims to handle a single task, we are more likely to reuse it for tasks we didn't originally intend it to handle.
Our final source listing is:
#!/usr/bin/env ruby
require "json"
require "open3"
markdown_file = ARGV.first
payload = { :text => File.read(markdown_file) }.to_json
# Curl command to post STDIN to GitHub's markdown API
#
# -f: non-200 responses fail with an exit code of 22
# -s: silent
# -H: http headers
curl_cmd = %Q(curl -f -s -H 'Content-Type: text/plain' https://api.github.com/markdown -d'@-')
# `stdin`, `stdout`, `stderr` are the respective file handles for the curl command
# `thread` is a `Thread` instance representing the command we're running
stdin, stdout, stderr, thread = Open3.popen3(curl_cmd)
# Take the json we built and give it to curl via its stdin. We have to
# explicitly close its stdin before we can access its stdout. Otherwise, curl
# will keep waiting for more input.
stdin.puts payload
stdin.close
# Print out curl's output to stdout and stderr
puts stdout.read
$stderr.puts stderr.read
# thread.value is a `Process::Status`. We exit our script with whatever curl's
# exit code was.
exit thread.value.exitstatus
I use this to generate the markup for this blog. The full source is available at jch/jsg.