Improving Your Web Applications with Selenium WebDriver

Leverage Selenium WebDriver to automate part of your web application testing approach.

Testing Login or Search seems straightforward. Click, type, click, type, wait, look. Something happens, though, between that initial test of your first functionality and the final polished application, with many features, across many browsers and devices. Invariably things get really complicated. Luckily, Selenium WebDriver can handle a lot of that complexity for you, if you leverage it correctly.

Historically, multi-platform deployment and robust testing have existed as natural enemies. Throughout the mid-2000s, software providers simply insisted on supporting a single, most-popular browser. Internal web applications were guaranteed only to work with Internet Explorer 6 on Windows XP. If you wanted to use Internet Explorer 7 when it came out, they made no guarantees and the helpdesk would not support the call. As for something exotic like upstart Firefox, forget about it.

Organizations didn't do this to annoy their users; they were simply trying to minimize the cost of testing and support. That speaks to just how difficult and time-consuming testing web applications had become.

Eventually, the world adapted and Internet Explorer's dominance receded. Users demanded functionality in their browser of choice and refused to accept seriously broken software. At the same time, they bought phones and tablets and demanded the privilege of working (or shopping) while waiting in the doctor's office. The sponsoring executives, eager to get more revenue and work hours, insisted the software meet these demands.

The History of Selenium WebDriver

Forward-thinking developers did not just sit idle during this period of Internet Explorer dominance. Even as the web application testing conundrum slowed the pace of delivery, clever people sought ways to address it.

Developers at Thoughtworks, specifically, had some ideas. While working on a web version of a time and expenses application, Thoughtworker (and later, Sauce Labs co-founder) Jason Huggins built a tool that could obey encoded scripts. Selenium was born. Another Thoughtworker, Paul Hammant, built on this automation idea. He introduced a second mode of operation for Selenium, that allowed remote "steering" of the functionality over TCP/IP. This meant that users could drive the functionality using the programming language of their choice. Selenium thus had two operating modes: core and remote-controlled (RC). This had powerful implications for testing. Selenium users could now script interactions with browsers.

Selenium continued to evolve. By 2007, a competing design emerged that would become known as WebDriver. Whereas Selenium 1.0's RC mode worked via JavaScript that ran in all browsers, the new design operated via plugins that were “close to the metal” for each individual browser. The project eventually merged Selenium with WebDriver and released the merger as Selenium 2.0. WebDriver had replaced RC mode.

(You can read a more detailed history of Selenium WebDriver here).

Selenium WebDriver as a De Facto Standard

As all of this development took place, Selenium WebDriver made its way beyond the purview of Thoughtworks. Some of the original developers had moved on and Thoughtworks open sourced the technology, so the world owned it by way of a public committee.

As websites moved from brochures to full applications with social elements, the world continued to struggle with the problem of testing them. This problem only grew worse with the rise of competitive browsers, mobile technologies, and competing operating systems. Web application authors saw their conceptual binders of test cases grow exponentially.

Against this backdrop, Selenium WebDriver emerged as a de facto standard. As browsers multiplied, contributors could write plugins to support them. This allowed test automation to expand to new browsers with a fraction of the effort. And, on top of that, Selenium users could automate with their language of choice. This lowered barriers to entry even further.

As many-browser support became the new reality, serious browser automation meant Selenium WebDriver.

Selenium WebDriver Architecture

Let's look in a bit more detail at just how this works for such a broad user base. Selenium WebDriver boasts an architecture that makes it incredibly extensible and flexible.

If you've done any work in an object-oriented language, you've probably heard of the so-called "gang of four" design patterns. One of these is called the bridge pattern. Its description often sounds intimidating: decouple an abstraction from its implementation. But this is actually a fairly simple, if powerful, concept. Think of light switches in your house. The switch is your abstraction (or interface) and the light turning on or off is the implementation.

You want to be able to change light bulbs without caring whether you have a push-button or rocker switch. And you also want to be able to change from a push button switch to a rocker without caring whether you have a yellow or white light bulb. You want to do this to turn the light bulb/light switch combination problem into an additive one instead of a multiplicative one. And Selenium WebDriver's architecture works exactly the same way.

First, you have a collection of bindings representing your language choices for scripting (e.g. C#, Java, etc.). Then you have a collection of drivers representing the different browsers. In the middle sits the WebDriver API. You can now add bindings and browser drivers independently, making for incredible breadth of support. If some new browser comes out and someone writes a driver for it, then people using any binding can make use of it. Likewise, if someone adds a binding for a new programming language, they'll immediately have use of all available drivers.
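
To make the light switch analogy concrete, here is a minimal Java sketch of the bridge pattern. It is only an illustration of the idea, not Selenium's actual source; the Bulb, LightSwitch, and related class names are hypothetical and chosen to mirror the analogy. The switches play the role of language bindings and the bulbs play the role of browser drivers.

```java
import java.util.List;

// Implementation side of the bridge: what actually lights up.
interface Bulb {
    String power(boolean on);
}

class WhiteBulb implements Bulb {
    @Override
    public String power(boolean on) {
        return "white bulb " + (on ? "on" : "off");
    }
}

class YellowBulb implements Bulb {
    @Override
    public String power(boolean on) {
        return "yellow bulb " + (on ? "on" : "off");
    }
}

// Abstraction side: a switch talks to any Bulb through the interface only.
abstract class LightSwitch {
    private final String style;
    private final Bulb bulb;
    private boolean on = false;

    LightSwitch(String style, Bulb bulb) {
        this.style = style;
        this.bulb = bulb;
    }

    // Flip the state and delegate the real work to the bulb.
    String operate() {
        on = !on;
        return style + " switch -> " + bulb.power(on);
    }
}

class RockerSwitch extends LightSwitch {
    RockerSwitch(Bulb bulb) {
        super("rocker", bulb);
    }
}

class PushButtonSwitch extends LightSwitch {
    PushButtonSwitch(Bulb bulb) {
        super("push-button", bulb);
    }
}

class BridgeDemo {
    public static void main(String[] args) {
        // Two switch classes plus two bulb classes cover all four pairings.
        List<Bulb> bulbs = List.of(new WhiteBulb(), new YellowBulb());
        for (Bulb bulb : bulbs) {
            System.out.println(new RockerSwitch(bulb).operate());
            System.out.println(new PushButtonSwitch(bulb).operate());
        }
    }
}
```

Adding a third bulb type means writing one new class, and every existing switch can use it immediately. That is the additive growth described above: new pieces on either side, with no need to write a class for each pairing.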

Consider an Example

Let's see what this actually looks like using a bit of code. For example purposes, I'll use Java, but you can easily extrapolate the idea here to your preferred language. The idea is just to showcase how one particular binding makes use of the API to allow easy use across different drivers.

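 import org.openqa.selenium.By;
 import org.openqa.selenium.WebDriver;
 import org.openqa.selenium.WebElement;
 import org.openqa.selenium.chrome.ChromeDriver;
 import org.openqa.selenium.firefox.FirefoxDriver;
 import org.openqa.selenium.ie.InternetExplorerDriver;

 public class GoogleYourselfExample {

     // Trivial entry point that kicks things off.
     public void SeeWhatHappens() {
         GoogleYourselfAcrossBrowsers("Joe Smith");
     }

     // Runs the same search once per browser driver.
     public void GoogleYourselfAcrossBrowsers(String name) {
         GoogleYourself(name, new FirefoxDriver());
         GoogleYourself(name, new InternetExplorerDriver());
         GoogleYourself(name, new ChromeDriver());
     }

     // Searches Google for the name using whichever driver is passed in.
     public void GoogleYourself(String name, WebDriver driver) {
         driver.get("http://google.com");

         // "q" is the name attribute of Google's search box.
         WebElement searchText = driver.findElement(By.name("q"));
         searchText.sendKeys(name);
         searchText.submit();

         driver.quit();
     }
 }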

It consists of three different methods, with the first a trivial one to kick things off. In that method, I call GoogleYourselfAcrossBrowsers and supply a pretty generic name. GoogleYourselfAcrossBrowsers then invokes GoogleYourself using three different drivers that it instantiates.

For its part, GoogleYourself does exactly that, and it allows you to see a bit of the API in action. It retrieves the Google homepage and locates the "q" element (for query). Then it sends the keystrokes for the name to that element and performs a submit, before quitting.

Taken as a whole, this code will Google the name in question using Firefox, Internet Explorer, and Chrome in sequence. Now, imagine a future in which some new browser comes out. You could add it to your test strategy by adding the appropriate driver dependency and then adding a single line of code.

The code above is slightly abstracted; it uses shorthand for the new() methods to be readable. In practice, the idea of passing in a browser and executing on what is passed in, or giving the name of the browser from the command-line, or any other method enables substantial reuse.
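
One way to sketch the "name of the browser from the command-line" idea is a small lookup from browser name to a factory. The version below is an assumption-laden illustration: BrowserSession, FakeSession, and BrowserRegistry are hypothetical stand-ins invented here so that the sketch runs without Selenium installed. In a real suite, the registered factories would create actual driver instances instead.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for WebDriver so the sketch runs without Selenium.
interface BrowserSession {
    String name();
}

class FakeSession implements BrowserSession {
    private final String name;

    FakeSession(String name) {
        this.name = name;
    }

    @Override
    public String name() {
        return name;
    }
}

// Maps a browser name to a factory that creates a session for it.
class BrowserRegistry {
    private final Map<String, Supplier<BrowserSession>> factories = new LinkedHashMap<>();

    // Supporting a new browser is one more register(...) call.
    BrowserRegistry register(String key, Supplier<BrowserSession> factory) {
        factories.put(key.toLowerCase(Locale.ROOT), factory);
        return this;
    }

    // Looks up the factory case-insensitively and fails loudly on unknown names.
    BrowserSession create(String key) {
        Supplier<BrowserSession> factory = factories.get(key.toLowerCase(Locale.ROOT));
        if (factory == null) {
            throw new IllegalArgumentException(
                "Unsupported browser '" + key + "'; known: " + factories.keySet());
        }
        return factory.get();
    }

    static BrowserRegistry withDefaults() {
        // Assumption: with Selenium on the classpath, these lambdas would
        // construct real drivers (for example, new FirefoxDriver()).
        return new BrowserRegistry()
            .register("firefox", () -> new FakeSession("firefox"))
            .register("ie", () -> new FakeSession("ie"))
            .register("chrome", () -> new FakeSession("chrome"));
    }
}

class BrowserSelectionDemo {
    public static void main(String[] args) {
        // Take the browser name from the first argument, defaulting to Firefox.
        String requested = args.length > 0 ? args[0] : "firefox";
        BrowserSession session = BrowserRegistry.withDefaults().create(requested);
        System.out.println("Would run GoogleYourself against: " + session.name());
    }
}
```

With a design along these lines, the test logic itself never mentions a specific browser; only the registry does, which is what keeps the cost of adding one down to a single line.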

Code in Your Native Tongue

Traditional test automation has testers writing code in some ‘scripting’ language, perhaps created by the vendor. Programmers write their production code in C#, Ruby, Python, Java, or some other language and never see or touch the test code. Often the test code is tracked in some other system.

Writing the test code in the same language as the production code (or at least a language the production programmers are fluent in) makes it possible for the production programmers to take some ownership of the test code. Placing it in the same version control system and putting the tests under continuous integration can shrink the time it takes to find a regression defect from weeks to hours. Some teams even make the production programmers responsible for maintaining the tests that fail because of their changes. Thus “the story isn’t done until all the tests run.”

Rethinking Your Test Strategy

If you've lived without this sort of test automation, hopefully, you're starting to understand its power. You might have tests at the unit level, and perhaps you've got automated system and integration tests as well. But, without Selenium WebDriver, there's a good chance you're not automating at the top of the testing pyramid.

If you aren’t automating at that level, then you’re probably testing it manually. And, in the world of web applications and multiple versions of multiple browsers across multiple devices, you're probably doing a lot of highly repetitive testing. Or, if you're not, then you're probably taking an awful lot on faith.

With Selenium WebDriver, you can address a serious blind spot in your testing strategy. Handing QA a gigantic binder full of repetitive test cases and asking them to verify it with every release creates mind-numbing work and consumes time better spent on other things. It's a bad investment that finds shockingly few defects for the money.

Consider leveraging Selenium WebDriver to automate this part of your testing approach, particularly since it lends itself quite well to automation. Free QA up to focus more on exploratory testing and other approaches that require more human judgment.

Selenium WebDriver Use Cases

Let's now get a bit more concrete about actual use cases for Selenium WebDriver. Hopefully, you understand now that it can give your testing strategy a makeover, while making it much more comprehensive. But let's look, in general, at what you can do with a web automation framework.

  • Reuse GUI testing across a multitude of browsers.
  • Perform core regression testing several times a day.
  • Generate a very visible display of functional or acceptance testing for your application. Stakeholders can actually watch the tests run.
  • Improve coordination between developers and testers.
  • Create system demos as a no-cost byproduct of the test process, that can also serve as documentation for “expected behavior.”

As you can see, beefing up your testing strategy features most prominently. This form of automation can help with regression, functional, and acceptance testing. But you can also realize a couple of additional perks. In order to get good at automating manipulation of the GUI, you need to develop an in-depth understanding of the GUI's elements. Testers and those performing testing as an activity will come to understand what is happening “under the hood”, for example how APIs and JavaScript interact. And you can use the automation to demo functionality both to a product owner and to end-users or other stakeholders.

You can probably come up with additional value in your unique situation that I haven't listed here. This is a powerful tool.

Improving Your Web Applications

Overall, this adds up to improving your web applications and making your shop better at what it does. We've come a long way since the days of Internet Explorer 6 dominating the market. Browsers have proliferated and people have more choices than ever, while having higher expectations than ever before. You can differentiate yourself by keeping up with that proliferation while delivering high-quality web applications. But you can't do that without help. You need tools in your tool belt, and Selenium WebDriver is one of the most powerful ones out there.

Written by

Erik Dietrich

Topics

Selenium, Cross browser, Programming languages, Performance Testing, Automated testing