Because machines still need humans
October 24, 2012
If you’ve built an application that calls any kind of remote Web API – be it based on SOAP, REST/HTTP, Thrift, JSON-RPC or otherwise – your code must absolutely expect it to fail. Expecting your remote API calls to always work is a recipe for a late-night call to your phone during a future system disaster. As Steve Vinoski correctly states, the “illusion of RPC” runs deep in the programming world, and engineers who treat remote Web API calls the same way as local function calls do so at their peril.
Here are some of the many ways that a call to a distributed system can fail. It’s not an exhaustive list by any means, but it’s a start.
Failures of Infrastructure
These are the most basic of failures, usually experienced at the network level.
DNS Lookup Failures
The GoDaddy failure of August 2012 caused millions of DNS names to be unreachable. It’s likely that your application code expects healthy, speedy conversion of domain names to IP addresses, but if DNS doesn’t respond then it won’t get past even the first hurdle.
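A minimal sketch of clearing that first hurdle gracefully: treat name resolution as an operation that can fail, rather than assuming it always succeeds. The hostname below is made up for illustration; `.invalid` names are reserved and can never resolve.

```python
import socket

def resolve_or_none(hostname):
    """Return an IPv4 address for hostname, or None if DNS lookup fails."""
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        # DNS failure: the name doesn't exist, or no resolver is reachable.
        return None

# A reserved .invalid name (RFC 2606) can never resolve.
address = resolve_or_none("api.nonexistent-service.invalid")
print(address)
```

An application that checks for `None` here can alert its operators or fall back to a cached address, instead of crashing on the very first step of the call.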
Lack of Network Access
Skilled mobile developers assume that their code will communicate over a cellular network that may not be always available, and handle its intermittent nature gracefully. How many developers assume the same unreliability when creating an application intended for deployment within a data center? Routers, cables, switches and power supplies fail all the time, and a failure of any of those devices could leave your system without the ability to reach the outside world right when you need it most.
Loss of Data Center
Amazon’s popular and affordable AWS services are used by a significant number of SaaS providers. Their EC2 uptime guarantee of 99.95% entices many developers to build Web APIs to be hosted within it. While that SLA sounds good, the 2011 failure of Amazon’s US-East availability zone showed the technology world what happens during the 0.05% period. The vast majority of the systems hosted there were completely unreachable during that time, causing a major disruption for many. Importantly, those that followed Amazon’s advice to build redundancy across availability zones (such as Netflix) were able to handle the outage without interruption.
SSL Certificate Issues
SSL certificates are the backbone of secure web communications, but they can be the cause of a lot of frustration. If a server’s domain name changes and the certificate presented by it doesn’t match, the SSL handshake will fail (with good reason).
Should a certificate on either side expire, the SSL handshake will fail too. Web server administrators usually do a good job of preventing server-side certificate expiration, but client-side certificates – while required much less frequently – are sometimes deployed by users who are not very tech-savvy and who don’t keep track of when they expire.
Failures in Communication
Even if your application has basic network connectivity, a variety of errors can still be experienced as it tries to communicate with the target Web API.
Timeouts
A web service that is experiencing greater than peak load might take many seconds (or worse, minutes) to respond to your call. A query that triggers a huge amount of data retrieval and analysis might take just as long to process even on a lightly loaded server. Both of these scenarios could lead to your client giving up and throwing a timeout error when it fails to receive a response. Even if your client call is configured to never time out, a firewall between your application and the server might decide to kill your connection if it looks idle for too long.
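As a sketch, here is one way to make a timeout an explicit, handled outcome rather than an unhandled exception. The URL is hypothetical; 203.0.113.0/24 is an address block reserved for documentation, so the call below can never succeed.

```python
import socket
import urllib.request
from urllib.error import URLError

def fetch_with_timeout(url, timeout_seconds=5):
    """Fetch a URL, returning the response body or None on timeout/error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as resp:
            return resp.read()
    except (URLError, socket.timeout):
        # Treat a slow or unreachable server the same as a failed call.
        return None

# 203.0.113.1 is in a documentation-only range, so this cannot succeed.
result = fetch_with_timeout("http://203.0.113.1/report", timeout_seconds=1)
print(result)
```

The key point is that the caller chooses the deadline, instead of inheriting whatever default (possibly infinite) the library or an intermediate firewall imposes.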
Same-Origin Policy Restrictions
If you’re building a browser-based application and expect all your Ajax calls to reach your server, you’re in for a surprise the first time you break one of the same-origin rules: an attempt to call another endpoint will be blocked by almost all browsers. Even changing something as innocuous as the TCP port number in your URL could prevent your call from working, so make sure you only make Ajax calls to your origin web server.
Media Type Mismatches
Products that implement older middleware technologies such as SOAP, CORBA and DCOM provide stubs that take care of encoding the entire payload for you, as well as transmitting it to the server and decoding any response. However, using newer architectural approaches like REST can mean that the management of the media types and payload formats will be left to the application programmer. While offering great flexibility, that approach does place a bigger burden on the client developer to ensure that it transmits and understands data in the form the server expects.
Unexpected Redirection
A number of RESTful Web APIs use a redirection technique to instruct the client to resend their original request to another URI, and the HTTP standard offers response codes like 301 and 302 for that purpose. But if your client code hasn’t been built using a library that handles them automatically then you’ll have to add redirection logic for all API calls to prevent surprises when 300-range response codes are returned. And when you’re working with well-designed RESTful APIs, you should expect them to be returned.
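The redirection logic itself is simple enough to sketch. Below, `fetch` is a stand-in for whatever HTTP call your client makes (any callable returning status, headers and body), and the URIs are invented for the example:

```python
def follow_redirects(fetch, url, max_hops=5):
    """Resolve 3xx redirects manually, with a hop limit to avoid loops.

    `fetch` is any callable returning (status, headers, body) — a stand-in
    for a real HTTP request function.
    """
    for _ in range(max_hops):
        status, headers, body = fetch(url)
        if status in (301, 302, 303, 307, 308):
            url = headers["Location"]   # resend the request elsewhere
            continue
        return status, body
    raise RuntimeError("too many redirects")

# Simulated server: the old URI permanently redirects to the new one.
responses = {
    "/v1/widgets": (301, {"Location": "/v2/widgets"}, b""),
    "/v2/widgets": (200, {}, b'[{"id": 1}]'),
}
status, body = follow_redirects(lambda u: responses[u], "/v1/widgets")
print(status, body)
```

Note the hop limit: a client that blindly chases `Location` headers can be sent around a redirect loop forever.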
Credential Expiration
Almost all OAuth tokens and server-side account passwords expire after a limited lifetime. It can be easy to forget this fact when you are in the heat of rolling out your application to production, so make sure that you mark your calendar ahead of their expiration dates, and that you build behaviors into your application to handle a credential failure if it occurs (perhaps by raising a system alert or emailing your IT team).
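One simple defensive behavior, sketched with stub functions (the call and refresh logic are placeholders, not any particular OAuth library): when a call comes back 401, refresh the credential once and retry before giving up.

```python
def call_with_refresh(api_call, refresh_credentials):
    """Run an API call; on an auth failure (401), refresh once and retry."""
    status, body = api_call()
    if status == 401:
        refresh_credentials()
        status, body = api_call()
    return status, body

# Stub API whose initial token has expired.
state = {"token": "expired"}

def refresh():
    state["token"] = "fresh"

def api_call():
    return (200, "ok") if state["token"] == "fresh" else (401, "token expired")

outcome = call_with_refresh(api_call, refresh)
print(outcome)
```

If the retry also fails, that is the point to raise the system alert the post describes, since a refresh that doesn't help usually means a revoked account rather than a routine expiry.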
Unexpected International Content
If your application makes any assumption that the server will always send back ASCII or ISO-Latin-1 characters, you’ll be in for a rough day when the server sends back Unicode content and your code has no idea how to decode it. Using data formats that natively support encodings like UTF-8 (for example, XML and JSON) should help somewhat, but it doesn’t mean that all your work is done. You’ll still have to handle multi-byte characters and store or render them appropriately.
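A small sketch of decoding explicitly rather than assuming ASCII — the raw bytes below stand in for a response body whose declared charset is UTF-8:

```python
import json

# Simulated response body: JSON containing multi-byte UTF-8 characters.
raw_body = '{"city": "Zürich", "name": "山田"}'.encode("utf-8")

# bytes -> str using the charset the server declared; a mismatched or
# missing encoding shows up here as a UnicodeDecodeError.
decoded = raw_body.decode("utf-8")
payload = json.loads(decoded)
print(payload["city"])
```

Decoding at the boundary like this means the rest of the application only ever handles proper strings, and an encoding mismatch fails loudly in one place instead of corrupting data downstream.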
Proxy Servers Getting in the Way
Using a hotel or airplane’s Wi-Fi connection can cause some really interesting behavior for your application. Most pay-to-surf hotspots use an HTTP proxy server to intercept web traffic and ensure that payment is made, but sometimes those proxies do more than just ask for your money. I’ve seen some proxy servers force all HTTP 1.1 traffic down to version 1.0, causing difficulties for applications that relied on features in the 1.1 protocol. Thankfully, most invasive HTTP proxy server behavior can be bypassed by moving to HTTPS (because proxies can’t decrypt that kind of traffic between client browsers and the server as long as the server uses a properly-signed certificate).
Oversized Responses
Most web services do a good job of truncating huge query result sets before transmitting them to the caller, but some do not. If you accidentally run an enormous query on the server side, it might cause a failure for your client application. Receiving a huge amount of data requires at least as much memory to store it, and (depending on the quality of the parser) might even need to be temporarily duplicated in memory in order to parse it. Make sure that you use any pagination or truncation features available in your API to prevent your application from being slammed with a gigantic result set. If those kinds of features are not available, try to craft your queries to prevent enormous responses.
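The pagination loop is worth sketching, since the stop condition is where bugs hide. Here `fetch_page` is a stand-in for a real offset/limit API call (the parameter names are assumptions, not any particular vendor's API):

```python
def fetch_all(fetch_page, page_size=100):
    """Pull a large result set page by page instead of in one giant response."""
    results, offset = [], 0
    while True:
        page = fetch_page(offset, page_size)
        results.extend(page)
        if len(page) < page_size:   # a short page means we've reached the end
            return results
        offset += page_size

# Stub API holding 250 records, capped at 100 per request.
data = list(range(250))

def fetch_page(offset, limit):
    return data[offset:offset + limit]

total = len(fetch_all(fetch_page))
print(total)
```

Each request is now bounded in size, so one runaway query can no longer exhaust the client's memory in a single response.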
Failures in Conversation
Even if your application successfully communicates with a web service API at first, failures can still occur after that point.
Traffic Ceilings
This is one of the most common reasons for web service API failure. If you hit an API hard enough – even one that you’re paying for – you might discover that the vendor offering it has just cut you off. Of course, this could very well happen to your system at the worst possible moment. Some APIs have no traffic limits when they’re first released but apply them later (something Twitter API developers realized recently, much to their surprise and disappointment).
Keep in mind too that an orchestrated DDoS attack on your system (or even an innocuous load test of your own) could lead to you quickly reaching an unexpected limit with the Web APIs your application depends on.
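A common way to cope when a vendor starts returning 429 (Too Many Requests) is to retry with exponential backoff rather than hammering the service harder. This is a generic sketch, not any particular vendor's recommended policy; the stub API below simulates two rejections before success.

```python
import time

def call_with_backoff(api_call, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Retry an API call on 429 (rate limited), backing off exponentially."""
    for attempt in range(max_attempts):
        status, body = api_call()
        if status != 429:
            return status, body
        sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    raise RuntimeError("still rate-limited after %d attempts" % max_attempts)

# Stub API that rejects the first two calls with 429, then succeeds.
calls = {"n": 0}

def api_call():
    calls["n"] += 1
    return (429, "slow down") if calls["n"] <= 2 else (200, "ok")

outcome = call_with_backoff(api_call, sleep=lambda s: None)
print(outcome)
```

The injectable `sleep` parameter is there so the backoff behavior can be tested without actually waiting.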
Date and Time Oddities
February 29th. Daylight Saving Time. The International Date Line. The Leap Second. Time zones with partial-hour offsets. Even if you believe your application isn’t time-sensitive, any of these temporal oddities could affect the results you get from the Web APIs you are calling. Testing for them ahead of time might be difficult, but it could be time well spent.
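Two of those oddities are easy to demonstrate with nothing but the standard library: February 29th only exists in leap years, and some time zones really do sit at partial-hour offsets (India is UTC+05:30).

```python
from datetime import date, datetime, timedelta, timezone

# February 29th exists in 2012 but not 2011; naive date math can break here.
leap_day = date(2012, 2, 28) + timedelta(days=1)
try:
    date(2011, 2, 29)
    bad_date_rejected = False
except ValueError:
    bad_date_rejected = True

# Partial-hour offsets are real: India Standard Time is UTC+05:30.
ist = timezone(timedelta(hours=5, minutes=30))
noon_utc = datetime(2012, 10, 24, 12, 0, tzinfo=timezone.utc)
local = noon_utc.astimezone(ist).strftime("%H:%M")
print(leap_day, bad_date_rejected, local)
```

Code that assumes all offsets are whole hours, or that every year has a February 28th as its last February day, fails exactly these cases.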
Subscription Expiration
When you sign up for an API on the Web, make sure you set your calendar to remind you ahead of the renewal date. Forgetting to renew an API account subscription will leave egg on your face – and nasty errors on the screens of your users.
The API Disappears
It’s rare, but it happens: vendors pull their API out of your market area, or they go out of business completely and shut down their services without notice. Either way the net effect is the same, and you’ll have to quickly scramble to find an alternative service.
Unexpected Payload Format
Yahoo! recently made an unannounced incompatible upgrade to their popular Placefinder API, moving it from version 1.0 to 2.0 overnight. Normally a move like that would be orchestrated in such a way as to provide both old and new API versions side-by-side, but the company offered users no way to keep using the older 1.0 data format while they made the switch.
Instead, their API users woke up one day to find that Yahoo! had completely broken their applications, and the only options on the table were to either quickly move to the 2.0 format or to switch to a different service. Worse still, some users quickly made the switch to 2.0, only for Yahoo! to realize their mistake two days later and switch the API back to 1.0 – two incompatible changes in as many days.
If vendors as large as Yahoo! can make accidental incompatible changes to their API services, you have to assume that all vendors could do the same.
What You Can Do
While there are innumerable ways for Web API calls to fail, you can protect your application against most of them by following a few simple guidelines:
Assume that every Web API call you make will fail. Always check that you get a response (and do so within a reasonable timeframe) and that you parse the returned payload carefully. Code very defensively when calling what are essentially unreliable functions. Build monitoring and instrumentation into your system so that your IT team gets called when remote APIs stop responding. And if you’re really confident, inject failure into your production system.
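One defensive pattern that follows naturally from this advice (though the post doesn't name it) is a circuit breaker: after a run of consecutive failures, stop calling the remote API and fail fast, which protects your system and gives the remote one room to recover. A minimal sketch:

```python
class CircuitBreaker:
    """After `threshold` consecutive failures, fail fast instead of
    hammering a remote API that is presumed down."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    def call(self, api_call):
        if self.failures >= self.threshold:
            raise RuntimeError("circuit open: remote API presumed down")
        try:
            result = api_call()
            self.failures = 0   # any success closes the circuit again
            return result
        except Exception:
            self.failures += 1
            raise

breaker = CircuitBreaker(threshold=2)

def flaky():
    # Stand-in for a remote call that keeps failing.
    raise IOError("connection refused")

for _ in range(2):
    try:
        breaker.call(flaky)
    except IOError:
        pass
# A third call now raises immediately, without touching the network at all.
```

A production version would also re-try the circuit after a cooldown period and hook the open/close transitions into your monitoring, so your IT team gets that call before your users do.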
Know what your users will experience when a Web API fails. For core services that you are completely dependent on (for example, PayPal for payment processing) make sure that you fail gracefully and tell the user something useful instead of just throwing a stack trace on the screen. For secondary services (for example, Google Maps for showing store locations) consider having an alternative service available that you can fall back to, especially if you hit a traffic ceiling with your main API provider.
Simulate the failure of each Web API you depend upon. Testing for failure is by far the best way to defend against future surprises. If you can do it in an automatic fashion then all the better, but simply changing your hosts file to make the hostname of your remote Web APIs fail to resolve to a usable IP address will probably uncover a ton of issues that you might not have expected.
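The hosts-file trick can also be done programmatically in a test suite by patching the resolver, so the simulation is automatic and repeatable. The hostname below is a made-up dependency for illustration:

```python
import socket

# Programmatic equivalent of editing your hosts file: make one hostname
# unresolvable so we can watch how our code copes with a dead dependency.
BLOCKED = {"api.example-partner.com"}   # hypothetical remote API
_real_getaddrinfo = socket.getaddrinfo

def failing_getaddrinfo(host, *args, **kwargs):
    if host in BLOCKED:
        raise socket.gaierror("simulated DNS failure for %s" % host)
    return _real_getaddrinfo(host, *args, **kwargs)

socket.getaddrinfo = failing_getaddrinfo
try:
    socket.getaddrinfo("api.example-partner.com", 443)
    simulated = False
except socket.gaierror:
    simulated = True   # exactly the failure mode we wanted to exercise
finally:
    socket.getaddrinfo = _real_getaddrinfo  # always undo the patch

print(simulated)
```

Run your application's test suite with the patch in place, and every unhandled DNS failure surfaces immediately instead of during a real outage.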
Web APIs are great, and developers love mashing them up into something exciting. But if you don’t plan for the failure of those APIs, you’ll just end up frustrating your users and driving them away from your product.
August 17, 2012
- Know nothing about it
- Know enough to successfully pretend to know it during lunch
- Know enough to be dangerous with it
- Know enough to be employed to be dangerous with it
- Know enough to ask intelligent questions about it on stackoverflow.com
- Know enough to answer n00b homework questions about it on stackoverflow.com
- Know enough to ridicule n00b homework questions about it on stackoverflow.com
- Know enough to contribute to open-source projects written in it
- Know enough to be disgusted with the state of open-source projects written in it
- Know enough to start writing a compiler for it
- Know enough to abandon writing a compiler for it
- Know enough to invent an improved version of it to create The One True Language
- Know enough to get bored and find another language to obsess over for a decade or so
July 12, 2012
If you spend any time writing software, you’ll eventually hit an interoperability wall: two components that should talk to each other just refuse to do so.
Maybe it’s a web API that has just been modified without proper concern for backward compatibility, breaking your existing application in the process. Or perhaps you’re trying to use two middleware products together, only to find that the communication standard they implement is horrendously complex, causing each vendor to interpret it ever-so-slightly differently and making them completely useless when brought together.
For all of the many painful incompatibilities of the past, there are plenty of wonderful success stories of software and specifications that enabled collaboration and integration in myriad ways. You just don’t hear them praised that often.
Here are some examples of technologies that do their job, and then get out of the way.
HTTP
“It’s way too simple.” “It should be a binary protocol.” “It’s far too slow.”
HTTP has had all kinds of criticisms leveled at it over the years, and yet it prevails. It’s the basis for all web traffic and the vast amounts of secure e-commerce conducted over it, and a whole lot more besides.
A lot of its power comes from incredible simplicity: both computers and humans can understand HTTP commands and headers. Whether you’re a proxy server or an overworked systems admin, HTTP is easy to deal with. A proliferation of tools have been built to support it, further strengthening its foothold.
Dig deeper into its details, and you’ll find a lot of very sophisticated and smartly-designed features: encoding negotiation, authentication, encryption, caching, resumable downloads, redirection, compression, locale matching, and hooks for easily making custom extensions.
And heck, it must be good given that after 20 years it’s still only at version 1.1.
TCP/IP
Without the adoption of TCP/IP as the network protocol of choice for the Internet, the world would be a very different place. Now approaching four decades of existence, the ubiquitous packet-switching protocol that is used in almost every server, desktop, laptop and mobile device has become synonymous with the word “networking.”
When was the last time you had a network issue that could be blamed on a deficiency of the TCP/IP specification? Or the last time you had one vendor’s TCP/IP stack fail to correctly implement the protocol?
The world loves IP addresses (well, maybe it loves DNS names more), and they’re here to stay. Why? Because a group of very smart people engineered a beautifully flexible and truly usable technology. And while there are some very compelling improvements being made (or proposed) right now, this outstanding work from the mid-1970’s still benefits us all today.
JSON
I’ll admit it: I can’t read XML. My eyes simply glaze over the never-ending stream of < and > characters and the duplicated start and end tags. At its basic level, it’s a reasonable language for handling unstructured data, but the awful complexity sometimes laid upon it (XML Schema and SOAP come to mind) can make it very tiresome to deal with. For many years, it seemed like every new software standard that appeared just had to be specified in XML, regardless of its suitability for the problem at hand. It looked like we were going to be stuck with it forever. And then along came JSON.
Now you’ll find JSON as the underlying syntax in most popular web APIs. It’s even being used as the storage format for new databases. Every modern web browser has native and efficient support for it. And when was the last time you found that your browser or parsing library failed to process valid JSON? Certainly I’ve never encountered that problem. Thank its wonderfully concise grammar for making it so easy for parsers to be created in so many different languages.
There are more…and we need even more
There are plenty of success stories to be found of well-designed interoperability standards that have spurred innovation. Specifications like the Java Virtual Machine and the CLI are prime examples of hugely successful standards. But why do many other standards find it hard to gain an adoption foothold?
The answer is simple. Or, to put it another way, the answer is: simple.
Interoperability standards succeed when engineers are able to adopt them en masse, and that is most likely to happen if the standard is as simple as possible. Specifications that try to boil the ocean will solve little of value because their complexity will lead to less acceptance in the developer community.
Good engineers will usually take the path of least resistance in their work, so if a standard is too complex it can quickly get ignored in favor of an alternative grassroots-driven solution or even incompatible vendor-driven extensions. Conversely, if useful tools exist that lower a developer’s barrier to entry (for example IDE support, SDKs, etc), a standard can be adopted more rapidly.
The software market cares about getting stuff done easily and quickly. If an interoperability standard helps in that regard, it will likely get adopted – regardless of whether it was designed by a committee or a community.
April 30, 2012
Bringing a programmer in for an interview and a coding test can lead to some interesting experiences, both for the interviewer and the interviewee. Most end up with the hiring manager telling them that they’ll “be in touch,” but sometimes a candidate just nails it. That’s when you consider extending a job offer before they get a chance to leave the building.
At TimeTrade we run a coding test during interviews that, for the majority of programmers, should take about 2 hours in total to complete. The whole test comprises a number of small problems to solve, each harder than the one before. That gives us a good initial gauge of performance based purely on completion time: if everything has been solved in under an hour, we’ll be smiling. But if two hours pass and even the first problem still hasn’t been solved, the candidate will most likely just be shown the door.
Above and beyond just solving test problems quickly, here are some signs that a programmer is truly awesome and should be handed a job offer before they leave your building:
1. They present multiple solutions
I recently interviewed a programmer who solved an entire set of tests twice: once with iterative solutions, and again recursively. I quickly made him an offer. Finding multiple solutions to a problem is a skill that engineers will need to use every day.
2. They write full documentation
Last year I interviewed someone who was so diligent, so detailed and so professional about his work that he created full Javadoc and comments for his code before he considered the solution complete. He even wrote fully automated unit tests and checked their coverage percentage. When I came back into the room at the 2-hour mark and found him typing furiously I initially thought he was having trouble with the test, but he was actually in the process of adding HTML formatting to his Javadoc. Engineers who do this intuitively are the kind you’ll want on your team.
3. They improve the test
We deliberately create tests that have some minor issues lurking within them, purely to see if the candidate (a) spots them and (b) is willing to fix them. It might be an inconsistent usage of quotation marks for strings, misleading variable names or anything along those lines. Candidates that look at all of the provided code as the test — not just the pieces we’ve asked them to write — are the ones who will do the same in our real product once they join our team.
An engineer who is willing to tell a potential employer that the supplied test contains problems shows that they consider the quality of their work to be more important than just agreeing to do what they’re told. Hire them and they’ll likely work wonders for your product, going above and beyond their assigned areas to make improvements where they are needed.
4. They refactor smartly
Most candidates like to get a solution working, then sit back and breathe a sigh of relief that they finished it successfully. That’s good, but rarely good enough to justify an on-the-spot job offer. The candidates that solve the problem but then jump right back in to refactor it are in a different category entirely. Their choice of algorithm doesn’t feel right, and they can’t ignore the feeling that it could be more efficient. Their code has some duplication in it, and that burns them up inside. These are the candidates who refactor, rewrite and improve their solution until it’s been crafted.
This can be a double-edged sword, though. If the candidate just keeps rewriting because they’re not happy until they reach a mythical point of “perfection”, there’s a chance they are one of those programmers who doesn’t know when to stop (and similarly, ship). However if they watch the clock carefully and are able to both solve the problem and refactor their solution before their time runs out, that’s a really good sign that you should consider an offer.
5. All other signs point to “hire”
Sometimes there are plenty of non-technical signs that you’ve found the right candidate. Your other team members take you aside and tell you, “We have to hire this lady.” Their personality feels like a great fit for the team. They have relevant and recent experience in what they’ll need to do. You know some people who have worked with them before and they tell you they are wonderful to have on a team (and that they’d hire them again in a second). The candidate is excited about the company and the opportunity and is hungry to start contributing.
If the candidate passes technical muster and all other signs point to “hire,” why wait? If you do, you may lose the candidate to another employer who knows how to read the same signs faster than you can. Instead, be decisive and make the offer fast, thereby telling the candidate how much the company wants them on board. It will help start the whole relationship off on the right foot, for both parties.
So the next time you’ve got a wonderful candidate in your building, don’t assume someone even better will arrive the next day. Make them an offer and get yourself – and the candidate – back to work.
March 2, 2012
We’re currently hiring web engineers to help build the next generation of TimeTrade’s online appointment scheduling system. Lots of resumes come my way, but 99% of them look exactly the same, following this format:
“I’m a web engineer looking for web engineering work”.
[No link to an online portfolio. No effort to craft the objective to match the position for which they're applying. Typically describes only what the engineer wants to get out of a new position, rather than what she or he will bring to the company that hires them.]
HTML, DHTML, XML, CSS, JSON, REST, SOAP, AJAX, PHP, CGI, VI, EMACS, …
[A boat-load of technologies, old and new, sprayed onto the resume as one enormous list of acronyms. No effort made to describe which technologies they are expert in versus what they've spent 5 minutes playing with on a boring Saturday afternoon. Alphabet soup.]
[A lengthy dissertation about every place the candidate has ever worked. Yawn-inducing descriptions of how they worked there. No URLs for me to see the web applications they built.]
…and that’s usually all I get.
Is there a factory somewhere that churns these out on a conveyor belt? Should I blame Microsoft Word’s built-in resume templates? Or perhaps it’s the fault of tech recruiters who encourage this kind of lazy resume format in the name of “consistency”?
There are plenty of great engineers who could use their experience creating awesome web applications to build incredible resumes for themselves but ironically, never do. These programmers work in a world of aesthetics, creativity and technical artistry and yet advertise themselves with the passion of a 40-year accountancy veteran who enjoys working in the windowless basement of a bank and whose favorite color is gray.
So let’s fix that. Here are some tips that will help you rise far above the crowd.
Completely rethink your resume format
Why submit a typical resume at all? Check out these really creative online resumes that were found on Pinterest. This kind of out-of-the-box thinking might not get you anywhere with old-fashioned employers, but I’ll be blunt: a submission along those formats will get you noticed here, and will very likely put you far ahead of the pack with many other employers.
Build an online portfolio
One of the fastest ways for an employer to figure out if they want to interview you is to show them what you’ve already built. Web engineers have a massive advantage over server-side engineers because their work is visible by its very nature, and very often publicly accessible online. If you’re writing an old-fashioned resume, at least list the URLs for your proudest work at the top.
If your work isn’t public (because it’s only available on pay-per-use sites or hidden behind corporate firewalls) then see if you can get screenshots of your web applications in action and submit them along with your job application.
If possible, build a personal website to host your work samples and advertise yourself using the technology you work in every day. I’d be more than happy to receive a set of URLs to personal sites on a daily basis rather than a bunch of 7-page resumes.
Focus more on the “what,” not the “how”
Technology skills are important, but they’re really a means to an end. Employers want to get things done, and the technologies used to build new features and applications simply aren’t as important as the effort itself. So tell us what you’ve built in the past, the impact it had on your customers and business, and why it should matter to us. Then – and only then – tell us what whizz-bang technologies you used to do it.
Prove that you’re a human
I’m happy to review any wonderful out-of-the-box resume sent my way and give constructive feedback. Those who follow the suggestions above are more likely to hear the words, “You need to come and work here!”.