a t e v a n s . c o m

(╯°□°)╯︵ <ǝlqɐʇ/>

Curses is a C library for terminal-based apps. If you are writing a screen-based app that runs in the terminal, curses (or the "newer" version, ncurses) can be a huge help. There used to be an adapter for Ruby in the standard library, but since Ruby 2.1.0 it's been moved into its own gem.

I took a crack at writing a small app with curses, and found the documentation and tutorials somewhat lacking. But after a bit of learning, and combining with the Verse and TTY gems, I think it came out kinda nice.

Here's a screenshot of the app, which basically stays open and monitors a logfile:

logwatch demo gif

There are three sections - the left side is a messages pane, where the app will post "traffic alert" and "alert cleared" messages. The user can scroll that pane up and down with the arrow keys (or j/k if they're a vim addict). On the right are two tables - the top one shows which sections of a web site are being hit most frequently. The bottom shows overall stats from the logs.

Here's the code for it, and I'll step through below and explain what does what:

require "curses"
require "tty-table"
require "verse"
require "logger"

module Logwatch
  class Window

    attr_reader :main, :messages, :top_sections, :stats

    def initialize
      Curses.init_screen # put the terminal into curses mode
      Curses.curs_set 0 # invisible cursor
      Curses.noecho # don't echo keys entered

      @lines = []
      @pos = 0

      half_height = Curses.lines / 2 - 2
      half_width = Curses.cols / 2 - 3

      @messages = Curses::Window.new(Curses.lines, half_width, 0, 0)
      @messages.keypad true # translate function keys to Curses::Key constants
      @messages.nodelay = true # don't block waiting for keyboard input with getch
      @messages.refresh

      @top_sections = Curses::Window.new(half_height, half_width, 0, half_width)
      @top_sections.refresh

      @stats = Curses::Window.new(half_height, half_width, half_height, half_width)
      @stats << "Stats:"
      @stats.refresh
    end

    def handle_keyboard_input
      case @messages.getch
      when Curses::Key::UP, 'k'
        @pos -= 1 unless @pos <= 0
        paint_messages!
      when Curses::Key::DOWN, 'j'
        @pos += 1 unless @pos >= @lines.count - 1
        paint_messages!
      when 'q'
        exit(0)
      end
    end

    def print_msg(msg)
      # wrap to the pane width so we never write past the window boundary
      @lines += Verse::Wrapping.new(msg).wrap(@messages.maxx - 10).split("\n")
      paint_messages!
    end

    def paint_messages!
      @pos ||= 0
      @messages.clear
      @messages.setpos(0, 0)
      @lines.slice(@pos, Curses.lines - 1).each { |line| @messages << "#{line}\n" }
      @messages.refresh
    end

    def update_top_sections(sections)
      table = TTY::Table.new header: ['Top Section', 'Hits'], rows: sections.to_a
      @top_sections.setpos(0, 0)
      @top_sections.addstr(table.render(:ascii, width: @top_sections.maxx - 2, resize: true))
      @top_sections.addstr("\nLast refresh: #{Time.now.strftime('%b %d %H:%M:%S')}")
      @top_sections.refresh
    end

    def update_stats(stats)
      table = TTY::Table.new header: ['Stats', ''], rows: stats.to_a
      @stats.setpos(0, 0)
      @stats.addstr(table.render(:ascii, width: @stats.maxx - 2, resize: true))
      @stats.addstr("\nLast refresh: #{Time.now.strftime('%b %d %H:%M:%S')}")
      @stats.refresh
    end

    def teardown
      Curses.close_screen # leave curses mode and restore the terminal
    end
  end
end


On initialize, we do some basic initialization of the curses gem - this will set up curses to handle all rendering to the terminal window.

Curses sets up a default Curses::Window object to handle rendering and listening for keyboard input, accessible from the stdscr method. This is where Curses.lines and Curses.cols come from; they represent the size of the whole terminal.

I initially tried using the default window's subwin method to set up the panes used by the app, but that proved to add a whole bunch of complication for no actual benefit. Long ago it may have provided a performance boost, but we're well past that, I think.

Also tried using the Curses::Pad class so I wouldn't have to handle scrolling myself, but that also had tons of wonky behavior. Rendering yourself isn't that hard; save the trouble.

To handle keyboard input, we set keypad(true) on the messages window. We also set nodelay = true (yes, one is a method call, the other is assignment, no idea why) so we can call .getch but still update the screen while waiting for input.

The two stats windows, we initialize mostly empty. Then call refresh on all three to get them set up on the active terminal.

Main Render Loop

The class that loops and takes actions is not the window manager; but the interface is pretty simple. There's a loop that checks for updates from the log file, updates the stats data store, then calls the two render methods for the stat windows. It also tells the window manager to handle any keyboard input, and will call print_msg() if it needs to add an alert or anything to the main panel.
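Here's a rough sketch of what such a loop could look like. The LogSource class, the run_loop method, the alert threshold, and the one-path-per-line log format are all made up for illustration; the only thing taken from the real app is the set of methods called on the window.

```ruby
# Hypothetical stand-in for the log-reading side of the app.
class LogSource
  def initialize(lines)
    @lines = lines
  end

  # Returns the log lines that arrived since the last poll (here: one at a time).
  def poll
    line = @lines.shift
    line ? [line] : []
  end
end

# Drives any object that responds to the Logwatch::Window methods.
# ticks bounds the loop so the sketch terminates; a real app would loop forever.
def run_loop(window, source, ticks:, alert_threshold: 2)
  hits = Hash.new(0)
  total = 0
  alerted = false

  ticks.times do
    # 1. check for updates from the log and update the stats store
    source.poll.each do |line|
      section = line.split("/")[1].to_s # "/api/users" -> "api"
      hits["/#{section}"] += 1
      total += 1
    end

    # 2. post an alert to the messages pane if needed
    if total >= alert_threshold && !alerted
      window.print_msg("traffic alert: #{total} hits")
      alerted = true
    end

    # 3. re-render both stat windows, busiest section first
    window.update_top_sections(hits.sort_by { |_, count| -count })
    window.update_stats("total hits" => total)

    # 4. let the window manager react to any pending keypress
    window.handle_keyboard_input
  end
end
```

A real loop would also sleep briefly on each pass so it doesn't spin the CPU while waiting for new log lines.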

The main way to get text onto the screen is to call addstr() or << on a Curses::Window, then call refresh() to paint the buffer to the screen.

The Window has a cursor, and it will add each character from the string and advance that, just like in a text editor. It tries to do a lot of other stuff; if you add characters beyond what the screen can show, it will scroll right and hide the first n columns. If you draw too many lines it will scroll down and provide no way to scroll back up. I tried dealing with scrl() and scroll() methods and such, but could never get the behavior working well. In the end, I did it manually.

I used the verse gem to wrap lines of text so that we never wrote past the window boundaries. The window manager keeps an array of all lines that have been printed during the program, and a position variable representing how far we've scrolled down in the buffer. On each update it:

  1. clears the Curses::Window buffer
  2. moves the cursor back to (0,0)
  3. prints the lines within range to the Curses::Window
  4. calls refresh() to paint the Curses::Window buffer to the screen
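The bookkeeping behind those steps doesn't need curses at all, so it can be sketched on its own. ScrollBuffer is a hypothetical name, and the word wrapping here is a naive stand-in for what the verse gem does in the real app.

```ruby
# Keeps every wrapped line ever printed, plus a scroll position.
class ScrollBuffer
  attr_reader :pos

  def initialize(width)
    @width = width
    @lines = []
    @pos = 0
  end

  # Wrap on word boundaries so no stored line is wider than the pane.
  # A single word longer than the width is left unbroken.
  def add(msg)
    msg.split("\n").each do |paragraph|
      line = +""
      paragraph.split(" ").each do |word|
        if !line.empty? && line.length + 1 + word.length > @width
          @lines << line
          line = +""
        end
        line << " " unless line.empty?
        line << word
      end
      @lines << line
    end
  end

  def scroll_up
    @pos -= 1 if @pos > 0
  end

  def scroll_down
    @pos += 1 if @pos < @lines.count - 1
  end

  # The slice of lines that should be painted for a pane of the given height.
  def visible(height)
    @lines.slice(@pos, height) || []
  end
end

buf = ScrollBuffer.new(10)
buf.add("traffic alert on /api cleared")
buf.visible(2) # => ["traffic", "alert on"]
buf.scroll_down
buf.visible(2) # => ["alert on", "/api"]
```

Painting is then just: clear the window, move to (0,0), write each line of visible(height), refresh.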

The stats windows are basically the same. I used the TTY::Table gem from the tty-gems collection to handle rendering the calculated stats into pretty ASCII tables.


The teardown method clears the screen, which resets the terminal to non-visual mode. The handle_keyboard_input method calls exit(0) when a user wants to quit, but the larger program handles the interrupt signal and ensures the teardown method gets called.
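One way the larger program could guarantee that, sketched with a hypothetical with_window helper (not the app's actual code). It relies on two things Ruby does for us: Ctrl-C raises Interrupt, and exit(0) raises SystemExit, so an ensure block runs in both cases.

```ruby
# Runs the block and always tears the window down afterwards,
# whether the block finishes, the user hits Ctrl-C, or exit is called.
def with_window(window)
  yield window
rescue Interrupt
  # Ctrl-C: swallow the exception so we quit without a stack trace
ensure
  window.teardown
end

# Usage with the real window would look something like:
#   with_window(Logwatch::Window.new) do |w|
#     loop { w.handle_keyboard_input; sleep 0.1 }
#   end
```

Without this, a crash or Ctrl-C can leave the terminal stuck in curses mode with no echo and no cursor.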


Hope that's helpful! I had the wrong model of how all this stuff worked in my head for most of the development of this simple app. Maybe having what I came to laid out here will be useful.

I wanted to make a simple key-value server in Elixir - json in, json out, GET, POST, with an in-memory map. The point is to reinvent the wheel, and learn me some Elixir. My questions were: a) how do I build this without Phoenix and b) how do I persist state between requests in a functional language?

Learning new stuff is always painful, so this was frustrating at points and harder than I expected. But I want to emphasize that I did get it working, and do understand a lot more about how Elixir does things - the community posts and extensive documentation were great, and I didn't have to bug anyone on StackOverflow or IRC or anything to figure all this out.

Here's what the learning & development process sounded like from inside my head.

  1. First, how do I make a simple JSON API without Phoenix? I tried several tutorials using Plug alone. Several of them were out of date / didn't work. Finally found this one, which was up-to-date and got me going. Ferret was born!

  2. How do I reload code when I change things without manually restarting? Poked around and found the remix app.

  3. Now I can take JSON in, but how do I persist across requests? I think we need a subprocess or something? That's what CodeShip says, anyhow.

  4. Okay, I've got an Agent. So, where do I keep the agent PID so it's reusable across requests?

  5. Well, where the heck does Plug keep session data? That should be in-memory by default, right? Quickly, to the source code!

  6. Hrm, well, that doesn't tell me a lot. Guess it's abstracted out, and in a language I'm still learning.

  7. Maybe I'll make a separate plug to initialize the agent, then dump it into the request bag-of-data?

    Pretty sure plug MyPlug, agent_thing: MyAgent.start_link will work. Can store that in my Plug's options, then add it to Conn so it's accessible inside requests

  8. Does a plug's init/1 get called on every request, or just once? What about my Router's init/1? Are things there memoized?

  9. Guess I'll assume the results are stored and passed in as the 2nd arg to call/2 in my plug.

  10. Wait, what does start_link return?

    14:15:15.422 [error] Ranch listener Ferret.Router.HTTP had connection process started with :cowboy_protocol:start_link/4 at #PID<0.335.0> exit with reason: {{%MatchError{term: [iix: {:ok, #PID<0.328.0>}]}

    ** (MatchError) no match of right hand side value: {:ok, #PID<0.521.0>}
    (ferret) lib/plug/load_index.ex:10: Ferret.Plug.LoadIndex.init/1

  12. figures out how to assign arguments

    turns out [:ok, pid] and {:ok, pid} and %{"ok" => pid} are different things

  13. futzes about trying various things to make that work

  14. How do I log stuff, anyway? Time to learn Logger.


    14:29:45.127 [info]  GET /put
    14:29:45.129 [error] Ranch listener Ferret.Router.HTTP had connection process started with :cowboy_protocol:start_link/4 at #PID<0.716.0> exit with reason: {{%FunctionClauseError{arity: 4,

  16. half an hour later - Oh, I'm doing a GET request when I routed it as POST. I'm good at programmering! I swear! I'm smrt!

  17. Turns out Conn.assign/3 and conn.assigns are how you put things in a request - not Conn.put_private/3 like plug/session uses.

  18. Okay, I've got my module in the request, and the pid going into my KV calls

  19. WTF does this mean?!?!

    Ranch listener Ferret.Router.HTTP had connection process started with :cowboy_protocol:start_link/4 at #PID<0.298.0> exit with reason: {{:noproc, {GenServer, :call, [#PID<0.292.0>,

  20. bloody hours pass

  21. The pid is right bloody there! Logger.debug shows it's passing in the same pid for every request!

  22. Maybe it's keeping the pid around, but the process is actually dead? How do I figure that out? tries various things

  23. Know what'd be cool? Agent.is_alive?. Things that definitely don't work:

    1. Process.get(pid_of_dead_process)
    2. Process.get(__MODULE__)
    3. Process.alive?(__MODULE__)

    Which is weird, since an Agent is a GenServer is a Process (as far as I can tell). This article on "Process ping-pong" was helpful.

  24. Finally figured out to use GenServer.whereis/1, passing in __MODULE__, and that will return nil if the proc is dead, and the pid if it's alive.

  25. Turns out I don't need my own plug at all: just init the Agent with the __MODULE__ name, and I can reference it by that, just like a GenServer.

  26. IT'S STILL SAYING :noproc! JEEBUS!

  27. Okay, I guess remix doesn't re-run Ferret.Router.init/1 when it reloads the code for the server. So when my Agent dies due to an ArgumentError or whatever, it never restarts and I get this :noproc crap.

  28. I'll just manually restart the server - I don't want to figure out supervisors right now.

  29. This seems like it should work, why doesn't it work? Agent.get_and_update __MODULE__, &Map.merge(&1, dict)

  30. Is it doing a javascript async thing? Do I need to tell Plug to wait on the response to get_and_update?

  31. Would using Agent.update and then Agent.get work? Frick, I dunno, how async are we getting here? All. the. examples. use a pid instead of a module name to reference the agent.

  32. How would I even tell plug to wait on an async call?

  33. Oh, frickin'! get_and_update/3 has to return a tuple, and there's no function that does single-return-value-equals-new-state.

    I need a function that takes the new map, merges it with the existing state, then duplicates the new map to return, but get_and_update/3's function argument only receives the current state and doesn't get the arguments.

    get_and_update/4 supposedly passes args, but you have to pass a Module & atom instead of a function. I couldn't make that work, either.

  34. Does Elixir have closures? I mean, that wouldn't make a lot of sense from a "pure functions only" perspective, but in Ruby it'd be like

    new_params = conn.body_params
    Agent.get_and_update do |state|
      new_state = Map.merge(state, new_params)
      [new_state, new_state]
    end

    ...errr, whelp, no, that doesn't work.

  35. The Elixir crash-course guide doesn't mention closures, and I'm not getting how to do this from the examples.

  36. hours of fiddling

  37. uuuuugggghhhhhhhh functional currying monad closure pipe recursions are breaking my effing brain. You have to make your own curry, or use a library. This seems unnecessary for such a simple dang thing.

  38. Is there a difference between Tuple.duplicate(Map.merge(&1, dict), 2) and Map.merge(&1, dict) |> Tuple.duplicate(2)? I dunno, neither one of those is working.

  39. What's the difference between?????

    1. def myfunc do ... end ; &myfunc
    2. f = fn args -> stuff end ; &f
    3. &(do_stuff)
  40. Okay, this is what I want: &(Map.merge(&1, dict) |> Tuple.duplicate 2)

    Why is dict available inside this captured function definition? I dunno.

  41. BOOM OMG IT'S WORKING! Programming is so cool and I'm awesome at it and this is the best!

  42. Let's git commit!

  43. Jeebus, I better write this crap down so I don't forget it. Maybe someone else will find it useful. Wish I coulda Google'd this while I was futzing around.

  44. I'm gonna go murder lots of monsters with my necromancer while my brain cools off. Then hopefully come back and figure out:

    1. functions and captures
    2. pipe operator's inner workings
    3. closures???
    4. supervisors
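For what it's worth, the Ruby analogy from item 34 can be made runnable with a tiny stand-in for an Agent. TinyAgent is entirely hypothetical and is not how Elixir's Agent works internally; it only mimics the shape of get_and_update, where the function gets the current state and hands back a reply plus the new state. The block picks up new_params from the surrounding scope, which is the same reason dict is visible inside the capture in item 40: Elixir's anonymous functions are closures too.

```ruby
# Hypothetical stand-in: holds some state and mimics get_and_update.
class TinyAgent
  def initialize(state)
    @state = state
    @lock = Mutex.new
  end

  # The block receives the current state and must return [reply, new_state].
  def get_and_update
    @lock.synchronize do
      reply, new_state = yield(@state)
      @state = new_state
      reply
    end
  end

  def get
    @lock.synchronize { @state }
  end
end

agent = TinyAgent.new({ "a" => 1 })
new_params = { "b" => 2 } # stands in for conn.body_params

# The block closes over new_params; nothing has to be passed in as an argument.
result = agent.get_and_update do |state|
  merged = state.merge(new_params)
  [merged, merged]
end

result    # => {"a"=>1, "b"=>2}
agent.get # => {"a"=>1, "b"=>2}
```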

Links I used:

Elixir Getting-Started Guide

Maps: elixir-lang

Logging with Logger

Processes & State

Statefulness in a Stateful Language (CodeShip)

Processes to Hold State

When to use Processes in Elixir

Elixir Process Ping-Pong

Using Agents in Elixir

Agent - elixir-lang

Concurrency Abstractions in Elixir (CodeShip)

GenServer name registration (hexdocs)

GenServer.whereis - for named processes

Agent.get_and_update (hexdocs) - hope you are good with currying: no way to pass args into the update function unless you can pass a module & atom (and that didn't work for me)


How to build a lightweight webhook endpoint with Elixir

Plug (Elixir School) - intro / overview

Plug body_params - StackOverflow

plug/session.ex - how do they get / store session state?

Plug.Conn.assign/3 (hexdocs)

Plug repo on Github

Function Composition

Currying and Partial Application in Elixir

Composing Elixir Functions

Breaking Up is Hard To Do

Function Currying in Elixir

Elixir Crash Course - partial function applications

Partial Function Application in Elixir

Elixir vs Ruby vs JS: closures

Was looking at our AWS configuration audit in Threat Stack today. One issue it highlighted was that some of our security groups had too many ports open. My first guess was that there were vestigial "default" groups created from hackier days of adding things from the console.

But before I could go deleting them all, I wanted to see if any were in use. I'm a lazy, lazy man, so I'm not going to click around and read stuff to figure it out. Scripting to the rescue!

#!/usr/bin/env ruby

require 'bundler/setup'
require 'aws-sdk'
require 'json'

client = Aws::EC2::Client.new

groups = client.describe_security_groups

SecGroup = Struct.new(:open_count, :group_name, :group_id) do
  # serialize as a plain hash so JSON.pretty_generate prints the fields
  def to_json(*a)
    to_h.to_json(*a)
  end
end

open_counts = groups.security_groups.map do |group|
  # number of ports opened by each rule in the group
  counts = group.ip_permissions.map {|ip| ip.to_port.to_i - ip.from_port.to_i + 1 }
  # sum is 0 for a group with no rules, where inject(&:+) would give nil
  SecGroup.new counts.sum, group.group_name, group.group_id
end

wide_opens = open_counts.select {|oc| oc.open_count > 1000 }

if wide_opens.empty?
  puts "No wide-open security groups! Yay!"
  exit 0
end

puts "Found some wide open security groups:"
puts JSON.pretty_generate(wide_opens)

Boxen = Struct.new(:instance_id, :group, :tags) do
  def to_json(*a)
    to_h.to_json(*a)
  end
end

instances_coll = wide_opens.map do |group|

  # find every instance that has this security group attached
  resp = client.describe_instances(
    dry_run: false,
    filters: [
      {
        name: "instance.group-id",
        values: [group.group_id],
      }
    ]
  )

  resp.reservations.map do |r|
    r.instances.map do |i|
      Boxen.new(i.instance_id, group, i.tags)
    end
  end
end

instances = instances_coll.flatten

puts "Being used by the following instances:"
puts JSON.pretty_generate(instances)

Something to throw in the 'ole snippets folder. Maybe it'll help you, too!

Every now and then I have to explain why I like working remote, why I don't like offices, and why I hate open offices in particular.

I'm an introvert at heart. I can be social, I do like hanging out with people, and I get restless and depressed if I'm alone at home for a week or two. But I have to manage my extroversion -- make sure I allocate sufficient time to quiet, introverted activities. Reading books, single-player games, hacking on side projects, etc.

To do great work, I need laser-like focus. I need multi-hour uninterrupted blocks of time. Many engineers feel the same way - see Paul Graham's oft-cited "Maker's Schedule" essay.

Open offices are the worst possible fucking environment for me.

Loud noises at random intervals make me jump out of my skin - and I don't even have PTSD or anything. I need loud music delivered via headphones to get around the noise inherent to open offices.

Constant movement in my peripheral vision is a major distraction. I often have to double-check to see if someone is trying to talk to me because of the aforementioned headphones. I message people on Slack to see if they have time to chat, but plenty of people think random shoulder-taps are great.

Privacy is important to me. People looking over my shoulder at what I'm doing makes me itch. I feel like people are judging me in real-time based on my ugly, unfinished work. Even if they're talking to someone else, I get paranoid and want to know if they're looking.

If you follow Reddit, Hacker News, or any tech or programming related forums, you'll see hate-ons for open offices pop up every month or two. Here's a summary.

Link Roundup:

PeopleWare: Productive Projects and Teams (3rd Edition) (originally published: 1987)

Washington Post: Google got it wrong. The open-office trend is destroying the workplace.

Fast Company: Offices For All! Why Open-Office Layouts Are Bad For Employees, Bosses, And Productivity

BBC: Why open offices are bad for us | [Hacker News thread]

CNBC: 58% of high-performance employees say they need more quiet work spaces

Mental Floss: Working Remotely Makes You Happier and More Productive

Nathan Marz: The inexplicable rise of open floor plans in tech companies (creator of Apache Storm) [Hacker News thread]

Various. Reddit. Threads. Complaining.

Slashdot. Hates. Them. Too.

The hacking group that leaked NSA secrets claims it has data on foreign nuclear programs - The Washington Post - We are officially in the cyberpunk era of information warfare

URL Validation - A guide and example regex for one of those surprisingly difficult problems

Cookies are Not Accepted - New York Times - “Protecting Your Digital Life in 8 Easy Steps”, none of which is “keep your software updated” ~ @pinboard

adriancooney/console.image: The one thing Chrome Dev Tools didn't need. - console.image("http://i.imgur.com/hv6pwkb.png"); (yes, I added a stupid easter egg)

PG MatViews for better performance in Rails 🚀 - Postgresql Materialized Views for fast analytics queries

An Abridged Cartoon Introduction To WebAssembly – Smashing Magazine

Fixing Unicode for Ruby Developers – DaftCode Blog - Another surprisingly difficult problem: storing, reading, and interpreting text in files

Strength.js - Password strength indicator w/ jQuery

Obnoxious.css - Animations for the strong of heart, and weak of mind

That's a wrap! I miss it already. RailsConf is a wonderful conference, and I'd encourage any Ruby and/or Rails engineer to go.

I don't think I went to any conferences before joining Hired. We've been a sponsor for all three RailsConfs since I joined in 2014, and I've gone every year. The company values career development, and going to conferences is part of that. We are a Rails shop, our founders are hardcore Ruby junkies, and we believe in giving back to the community for all the things it's done for us. Of course, as a tech hiring marketplace, it makes business sense as well.

I gave Hired's sponsor talk this year - my first time speaking at a conference, and a big 'ole check off the bucket list. I'd love to do it again. I'd love to give the same talk again for meetup groups or something - I learned a lot from having a "real" audience who are neither coworkers nor the mirror at home. It'd probably be a much better talk with a few iterations.

I went through two of our open source libraries from a teamwork and technical perspective. This post will get a link to it once ConFreaks puts it up.

Developer Anxiety

This seemed to be a major theme of the conf overall. DHH's keynote talked about the FUD and the shiny-new-thing treadmill that prevents us from putting roots down in the community of a language & ecosystem. Searls' keynote talked about how many of his coding decisions are driven by fear of familiar problems. There was a panel on spoon theory - which applies more generally than Christine Miserandino's personal example.

Studies of anxiety and stress in development seem to indicate that anxiety is bad for productivity. Anxiety and stress impair focus, shrink working memory, and hurt creativity - which are all necessary for doing good work. These studies are marred by small sample sizes, poor methodology, and the fact that we generally don't know what the hell "productivity" even means for developers. But the outcomes seem obvious intuitively.

It would behoove us to figure out how to reduce the overall anxiety in our industry. May is Mental Health Awareness Month in 2017. I've seen a lot of folks talking about Open Source Mental Illness, which seems like a great organization. There's not going to be a silver bullet, it'll take a lot of effort to educate, de-stigmatize, and work toward solutions. At least talking about it is a good start.

Working Together

Lots of talks dealt with empathy, teamwork, withholding judgement, and team dynamics. Searls had a quotable line - I'll paraphrase it as: "When I see people code in what I think is a bad way, I try to have empathy - I would hate for someone else to tell me I couldn't code my favorite way, so I can put myself in their shoes."

The Beyond SOLID talk discussed the continued divide between "business" and development. Haseeb Qureshi countered DHH's pro-tribalism, saying it's a barrier to communication that prevents developers from converging on "optimal" development. Joe Mastey's talk on legacy code discussed ways to build team momentum and reduce fear of codebase dragons. Several talks covered diversity, where implicit bias can shut down communication and empathy right quick.

Working together to build things is a huge and complex process, and there's no overall takeaway to be had here. Training ourselves in empathy and improving our communications are key developments that seemed to be a common thread.

I didn't see a lot of talk about the organizations or structures affecting how we work together. Definitely something I'd like to hear more about - particularly with examples of structural change in organizations, what worked, and what didn't. How do you balance PMs vs EMs? Are those even the right roles? How does it affect an org to have a C-level exec who can code?

Some Technical Stuff

There were way fewer "do X in Y minutes" talks this year, for which I am grateful. That sort of thing can be summed up better in a blog post, and frankly hypes up new tech without actually teaching much. There were more "deep dive" talks, a few "tips for beginners" talks, and some valuable-looking workshops. I didn't go to many of these, but it seemed like a good mix.


It was a great conference, and I'd love to go back next year. I'd like to qualify for a non-sponsor talk some time, but I should probably act more locally first - perhaps having a few live iterations beforehand would improve the big-audience presentation.

If you're a Rails developer, or a Ruby-ist of any sort, I'd say it's worth the trip. There may be scholarships available if you can't go on a company's dime - worth a shot.

I was listening to the Liftoff podcast episode about the Voyager missions. They pointed out that it launched in 1977 - well, NASA put it best:

This layout of Jupiter, Saturn, Uranus and Neptune, which occurs about every 175 years, allows a spacecraft on a particular flight path to swing from one planet to the next without the need for large onboard propulsion systems.

That's a hard deadline. "If your code isn't bug-free by August 20, we'll have to wait 175 years for the next chance." No pressure.

In startups, I often hear of "hard" deadlines like "we have to ship by Friday so we can keep on schedule." Or worse, "we promised the customer we'd have this by next Tuesday." To meet these deadlines, managers and teams will push hard. Engineers will work longer hours in crunch mode. Code reviews will be lenient. Testing may suffer. New code might be crammed into existing architecture because it's faster that way, in the short term. Coders will burn out.

These are not deadlines, they're bullshit. Companies are generally not staring down a literal once-a-century event which, if missed, will wind down the entire venture.

If you're consistently rushed and not writing your best code, you're not learning and improving. The company is getting a crappier codebase that will slow down and demoralize the team, and engineers are stagnating. Demand a reason for rush deadlines, and don't accept "...well, because the sprint's almost over..."

Stumbled on something interesting - checking a class to see if it "is a" or is "kind of" a parent class didn't work. Checking using the inheritance operator did work, as well as looking at the list of ancestors.

irb(main)> class MyMailer < ActionMailer::Base ; end
=> nil
irb(main)> MyMailer.is_a? Class
=> true
irb(main)> MyMailer.kind_of? Class
=> true
irb(main)> MyMailer.is_a? ActionMailer::Base
=> false
irb(main)> MyMailer.kind_of? ActionMailer::Base
=> false

irb(main)> a = MyMailer.new
=> #<MyMailer:0x007fa6d4ce9938>

irb(main)> a.is_a? ActionMailer::Base
=> true
irb(main)> a.kind_of? ActionMailer::Base
=> true

irb(main)> !!(MyMailer < ActionMailer::Base)
=> true
irb(main)> !!(MyMailer < ActiveRecord::Base)
=> false
irb(main)> MyMailer.ancestors.include? ActionMailer::Base
=> true

I suppose .is_a? and .kind_of? are designed as instance-level methods on Object. Classes inherit from Module, which inherits from Object, so a class is technically an instance. These methods will look at what the class is an instance of - pretty much always Class - and then check the ancestors of that.

tl;dr when trying to find out what kind of thing a class is, use the inheritance operator or look at the array of ancestors. Don't use the methods designed for checking the type of an instance of something.
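The same behavior shows up with plain Ruby classes, no Rails needed. Base, Child, Unrelated, and the subclass_of? helper are hypothetical names for this sketch. Note that < returns nil (not false) when two classes are unrelated, which is why the irb session above wraps it in !!. The helper uses <=, so a class also counts as a subclass of itself.

```ruby
class Base; end
class Child < Base; end
class Unrelated; end

# true when klass is parent or inherits from it, false otherwise.
def subclass_of?(klass, parent)
  !!(klass <= parent)
end

Child.is_a?(Base)              # => false (Child is an instance of Class)
Child.is_a?(Class)             # => true
Child.new.is_a?(Base)          # => true

Child < Base                   # => true
Child < Child                  # => false
Child <= Child                 # => true
Base < Child                   # => false
Unrelated < Base               # => nil

Child.ancestors.include?(Base) # => true

subclass_of?(Child, Base)      # => true
subclass_of?(Unrelated, Base)  # => false
```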

I love Mike Gunderloy's "Double Shot" posts on his blog A Fresh Cup. Inspired by that, here's a link roundup of some stuff I've read lately, mainly from my Pinboard.

How to monitor Redis performance metrics - Guess what I've been up to for the last week or two?

redis-rdb-tools - Python tool to parse Redis dump.rdb files, analyze memory, and export data to JSON. For some advanced analysis, load the exported JSON into Postgres and run some queries.

redis-memory-analyzer - Redis memory profiler to find the RAM bottlenecks by scanning the key space in real time and aggregating RAM usage statistics by pattern.

LastPass password manager suffers ‘major’ security problem - They've had quite a few of these, recently.

0.30000000000000004.com - Cheat sheet for floating point stuff. Also, a hilarious domain name.

Subgraph OS - An entire OS built on TOR, advanced firewalls, and containerizing everything. Meant to be secure.

When I get paged, my first step is to calmly(-ish) assess the situation. What is the problem? Our app metrics have, in many cases, disappeared. Identify and confirm it: yep, bunch of dashboards are gone.

Usually I start debugging at this point. What are the possible reasons for that? Did someone deploy a change? Maybe an update to the metrics libraries? Nope, too early: today's deploy logs are empty. Did app servers get scaled up, which might cause rate-limiting? Nah, all looks normal. Did our credentials get changed? Doesn't look like it, none of our tokens have been revoked.

All of that would have been a waste of time. Our stats aggregation & dashboard service, Librato, was affected by a wide-scale DNS outage. Somebody DDoS'd Dyn, one of the largest DNS providers in the US. Librato had all kinds of problems, because their DNS servers were unavailable.

We figured that out almost immediately, without having to look for any potential problems with our system. It's easy for me to forget to check status pages before diving into an incident, but I've found a way to make it easier. I made a channel in our Slack called #statuspages. Slack has a nifty slash command for subscribing to RSS feeds within a channel. Just type

/feed subscribe http://status.whatever.com/feed-url.rss

and boom! Any incident updates will appear as public posts in the channel.

Lots of services we rely on use StatusPage.io, and they provide RSS and Atom feeds for incidents and updates. The status pages for Heroku and AWS also offer RSS feeds - one for each service and region in AWS' case. I subscribed to everything that might affect site and app functionality, as well as development & business operations - Github, npm, rubygems, Atlassian (Jira / Confluence / etc), Customer.io etc.

Every time one of these services reports an issue, it appears almost immediately in the channel. When something's up with our app, a quick check in #statuspages can abort the whole debugging process. It can also be an early warning system: when a hosted service says they're experiencing "delayed connections" or "intermittent issues," you can be on guard in case that service goes down entirely.

Unfortunately not all status pages have an RSS feed. Salesforce doesn't provide one. Any status page powered by Pingdom doesn't either: it's not a feature they provide. I can't add Optimize.ly because they use Pingdom. C'mon y'all - get on it!

I've "pinned" links to these dashboards in #statuspages so they're at least easy to find. Theoretically, I could use a service like IFTTT to get notified whenever the page changes - I haven't tried, but I'm betting that would be too noisy to be worth it. Some quick glue code in our chat bot to scrape the page would work, but then the code has to be maintained, and who has time?
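For the curious, that glue code could be as small as the sketch below. PageWatcher and its fetcher parameter are hypothetical names, and the chat bot wiring is left out. It just hashes the page body and reports when the hash changes, so any page with a timestamp or other dynamic content will trip it on every check, which is exactly the noise problem mentioned above.

```ruby
require "digest"
require "net/http"
require "uri"

# Remembers a digest of the last page body and reports when it changes.
class PageWatcher
  # fetcher is injectable so the class can be exercised without a network.
  def initialize(url, fetcher: ->(u) { Net::HTTP.get(URI(u)) })
    @url = url
    @fetcher = fetcher
    @last_digest = nil
  end

  # Returns true when the page body differs from the previous check.
  # The first check only records a baseline and returns false.
  def changed?
    digest = Digest::SHA256.hexdigest(@fetcher.call(@url))
    changed = !@last_digest.nil? && digest != @last_digest
    @last_digest = digest
    changed
  end
end

# Usage sketch (placeholder URL):
#   watcher = PageWatcher.new("http://status.example.com/")
#   loop { puts "status page changed!" if watcher.changed?; sleep 300 }
```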

We currently have 45 feeds in #statuspages. It's kind of a disaster today with all the DNS issues, but it certainly keeps us up-to-date. Thankfully Slack isn't down for us - that's a whole different dumpster fire. But I could certainly use an RSS service as an alternative, such as my favorite Feedbin. That's the great thing about RSS: the old-school style of blogging really represented the open, decentralized web.

I'm not the first person to think of this, I'm sure, but hopefully it will help & inspire some of you fine folks out there.