The Evolution of The WebDriver Protocol

Christian Bromann summarizes the decisions made at the recent TPAC meeting, where the W3C working groups meet face to face to discuss web technologies, how to standardize them, and how to improve the interoperability of the web across all vendors in the industry.

Last month, September 16-20 2019, many people from various tech companies traveled to Fukuoka, Japan, to attend TPAC (the Technical Plenary and Advisory Committee meetings). It is an annual event of the World Wide Web Consortium (W3C). During the week, participants from all the W3C working groups meet face to face to discuss web technologies, how to standardize them, and how to improve the interoperability of the web across all vendors in the industry. Sauce Labs, as a member of the W3C organisation, sent four delegates, Marcus Merrell (@mmerrell), Titus Fortner (@titusfortner), Diego Molina (@diegofmolina) and myself (@bromann), to participate in the discussions around the WebDriver specification as part of the Browser Testing and Tools Working Group:

The WebDriver specification is a remote control interface that enables introspection and control of user agents. It provides a platform- and language-neutral wire protocol as a way for out-of-process programs to remotely instruct the behavior of web browsers.

Sauce Labs has a keen interest in contributing to the design and development of the protocol, as it drives all functional testing on its cloud-based platform. The interface defined by the specification is implemented in various popular automation tools and frameworks such as Selenium and WebdriverIO. Improving WebDriver’s functionality and usability directly impacts not only Sauce customers but also every user of these open source tools. Other delegates participating in this year’s discussion include browser vendors such as Google, Mozilla, Apple, and Microsoft, as well as companies like Bocoup and Salesforce.

In this post, I will recap the discussions that happened over the course of the event and report on the outcomes and consensus we found in our working group. It is important to note that every proposed change to the protocol is just that: a proposal. The spec needs to be updated and the implementations written, and so details may change over the course of the next few months.

Pin Scripts to WebDriver Sessions

As a maintainer of WebdriverIO, I have seen many situations where people struggle to automate web applications built with frameworks such as React, Angular, or Vue.js. While web developers see the application as a set of individual, self-contained components, automation engineers typically end up with generated HTML trees that can make querying elements difficult. For example, some React apps generate random CSS classes for elements to encapsulate CSS declarations at the component level. An automation engineer cannot rely on these classes because they change every time the app is built. The only options are to use complicated XPath locators or to add custom properties or IDs that exist purely to simplify automation.

To solve this problem, I implemented functionality in WebdriverIO that runs a script in the browsing context to query components from the Virtual DOM tree. TestCafe has a similar feature. Given that the protocol is already able to serialize DOM nodes into element references, this provides a custom way to fetch components based on their name and properties. The downside, though, is that it is an expensive operation, especially when running tests in the cloud, because the script that fetches the element has to be sent over the wire every time the command is called.

Additionally, some common building blocks are no longer included in the specification, so drivers are no longer required to implement them. For example, both getAttribute and isVisible must now be sent by the client as compressed JS at considerable cost (e.g. isVisible is a 44 KB file). Being able to send these scripts once and “pin” them on the remote end can drastically reduce the overhead and latency of tests that Sauce Labs users run.

The solution is to introduce new endpoints to pin and retrieve scripts for a specific session:

POST /session/{session id}/execute/pin

Once a script is pinned under a specific name, it can be called at any time without sending the large script over the wire again. To execute a pinned script, you will be able to send a name property to the existing execute script command. If the name matches one of the pinned scripts, the remote end runs that script. The overall idea is to collaborate on various scripts that help us automate arbitrary actions on the page using JavaScript, e.g. fetching React components. These can be open sourced and might find adoption in tools like Selenium. You can find more information on this proposal here.
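
The pinning flow can be illustrated with a minimal Python sketch that only builds the requests and compares their size on the wire. The pin endpoint and the idea of passing a name to the execute script command come from the proposal as described here; the body fields (`name`, `script`, `args`), the helper names, and the padded stand-in script are assumptions made for illustration and may differ from whatever the working group finally specifies.

```python
import json

# Hypothetical stand-in for a large helper such as the isVisible script.
# The padding only simulates the roughly 44 KB size mentioned in the text.
IS_VISIBLE_SCRIPT = "/*" + "x" * 44_000 + "*/ return true;"


def pin_request(session_id, name, script):
    """Build the one-time request that pins a script under a name."""
    return {
        "method": "POST",
        "path": f"/session/{session_id}/execute/pin",
        # Assumed body shape; the proposal may define it differently.
        "body": {"name": name, "script": script},
    }


def execute_pinned_request(session_id, name, args):
    """Build an execute script request that refers to a pinned script by name."""
    return {
        "method": "POST",
        "path": f"/session/{session_id}/execute/sync",
        # Assumed: the name replaces the usual script property.
        "body": {"name": name, "args": args},
    }


def wire_size(request):
    """Return the size in bytes of the JSON-encoded request body."""
    return len(json.dumps(request["body"]).encode("utf-8"))


if __name__ == "__main__":
    pin = pin_request("abc", "isVisible", IS_VISIBLE_SCRIPT)
    call = execute_pinned_request("abc", "isVisible", [])
    print("pin once:", wire_size(pin), "bytes")
    print("each call afterwards:", wire_size(call), "bytes")
```

Under these assumptions the large payload crosses the network once per session, and every later call carries only the script name and its arguments.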

Asynchronous Session Creation

Currently, if a Sauce Labs user attempts to start more sessions than their contract allows, Sauce Labs uses a network hack to keep the extra requests alive for up to ten minutes in the hope that a session becomes available to satisfy the initial request. Customer Success Managers track the number of these extra requests with the throttling metric. Common practice is to tell users that this is a “queuing” feature, when at best it is a buffer. A common feature request from our clients is that Sauce Labs implement a true queuing system for their test suites.

Every time we start a session with WebDriver, we expect that a server (for example a browser driver, a Selenium Grid, or a cloud vendor like Sauce Labs) can immediately provide a session with the requested capabilities. Sometimes it cannot, and the session request takes so long that the WebDriver bindings throw an error. This is often related to bad network conditions, missing capabilities, or limited concurrency. A proposal facilitated by Simon Stewart, lead of the Selenium project, would create a session asynchronously, allowing the client to ask for the “readiness” of the session by using a new endpoint:

POST /session/async

Client implementations like Selenium will most likely switch to always requesting sessions asynchronously if the server supports it. The endpoint returns a token ID that can then be used to check the status of the session creation process using another new endpoint:

GET /session/async/{session creation job id}/{token ID}

As a cloud provider, we can then return detailed information about the status of the session, including, for example, who is blocking the session creation by allocating too many resources. This solves a common problem experienced by users who have a limited number of concurrent sessions.
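
A client could drive these two endpoints with a simple polling loop. The following Python sketch assumes a few things the proposal does not yet settle: that the async endpoint accepts the same body as the regular New Session command, and that responses carry fields named `jobId`, `token`, `ready`, and `sessionId`. The `send` callable is a hypothetical transport supplied by the caller, so the sketch runs without a real server.

```python
import time


def create_session_async(send, capabilities, poll_interval=1.0,
                         timeout=600.0, sleep=time.sleep):
    """Request a session asynchronously and poll until it is ready.

    `send(method, path, body)` is a caller-supplied transport that returns
    the decoded JSON response as a dict. All response field names used
    below are assumptions, not part of any published specification.
    """
    job = send("POST", "/session/async",
               {"capabilities": {"alwaysMatch": capabilities}})
    status_path = f"/session/async/{job['jobId']}/{job['token']}"

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = send("GET", status_path, None)
        if status.get("ready"):
            return status["sessionId"]
        # A cloud vendor could explain the wait here, e.g. status.get("reason").
        sleep(poll_interval)
    raise TimeoutError("session was not ready within the timeout")


if __name__ == "__main__":
    # Fake transport: one creation response, one "not ready", one "ready".
    replies = iter([
        {"jobId": "j1", "token": "t1"},
        {"ready": False, "reason": "concurrency limit reached"},
        {"ready": True, "sessionId": "s42"},
    ])
    session_id = create_session_async(
        lambda method, path, body: next(replies),
        {"browserName": "chrome"},
        sleep=lambda seconds: None,
    )
    print(session_id)
```

Because the client is the one waiting, it can surface the server's status information to the user instead of failing with an opaque connection timeout.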

ARIA & WebDriver

Our friends at Bocoup, a technical consultancy specializing in web platform interoperability, joined us for an afternoon to discuss ways to promote accessibility by incentivizing web developers with tools that simplify testing accessible applications. They proposed extending WebDriver with new commands that interact with the page through the accessibility tree. This would result in web developers writing less test code, improved test stability and resiliency, and automated verification of the accessibility of the application. For example, a new command to select a radio input element by its accessible name could include checks for proper accessibility properties of the radio element that would fail if the application does not provide them. Given that WebDriver operates at quite a low level, these commands would abstract element interaction by leveraging the accessibility tree. You can read more about this proposal here. There is already a Selenium extension called Ariadriver, put together by the team, that implements this behavior. The idea of connecting accessibility capabilities into WebDriver found interest in the working group and we are now waiting for concrete proposals.
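
To make the radio example more concrete, here is a small Python sketch of how such a command might behave. It is purely illustrative: the flat list of nodes stands in for a real accessibility tree, and `select_radio_by_name` and `AccessibilityError` are hypothetical names, not part of Ariadriver or of any WebDriver proposal.

```python
class AccessibilityError(Exception):
    """Raised when the accessibility tree lacks what the command needs."""


def select_radio_by_name(nodes, name):
    """Select the radio whose accessible name equals `name`.

    `nodes` is a simplified, hypothetical accessibility tree: a flat list
    of dicts with "role", "name", and "checked" keys.
    """
    radios = [node for node in nodes if node.get("role") == "radio"]
    matches = [node for node in radios if node.get("name") == name]
    if not matches:
        unnamed = sum(1 for node in radios if not node.get("name"))
        raise AccessibilityError(
            f"no radio named {name!r}; "
            f"{unnamed} radio(s) expose no accessible name"
        )
    if len(matches) > 1:
        raise AccessibilityError(f"accessible name {name!r} is ambiguous")
    matches[0]["checked"] = True
    return matches[0]


if __name__ == "__main__":
    tree = [
        {"role": "radio", "name": "Monthly", "checked": False},
        {"role": "radio", "name": "", "checked": False},  # missing label
    ]
    print(select_radio_by_name(tree, "Monthly"))
```

The test passes only if the application exposes the radio with a usable accessible name, so the interaction doubles as an accessibility check.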

Bidirectional Communication

This was probably the biggest topic discussed in this year’s meetings: changing the WebDriver protocol to support bidirectional communication similar to native browser interfaces like the Chrome DevTools Protocol or the Firefox Remote Debugging Protocol. Such a change would bring a lot of advantages. Generally, it would help frameworks improve the stability of the automation process by listening to events coming from the browser. This has become very popular in tools like Puppeteer, where users can wait for specific page, DOM, or network events before calling a specific command. One of the reasons why the log command was not specified in the WebDriver protocol is that it would have been very difficult to standardize, given the request/response communication model the protocol supports right now. Bidirectional communication can simplify this, as users can just listen to various actors to receive arbitrary information. There are also various requirements, e.g., mocking and stubbing of browser requests, that WebDriver currently does not meet and that are very difficult to implement in a command-response protocol.

The discussions during the meetings were sometimes heated but very rational, as browser vendors have different ideas and opinions about this new communication model. In the end, the general consensus was that this model could be based on JSON-RPC, similar to how it is currently implemented internally in Safari. Furthermore, the currently defined WebDriver commands can then be mapped onto such a model so that users can still leverage the simplicity of common automation actions like finding elements by a specific strategy or clicking on them. Eventually, frameworks will be able to upgrade to such a bidirectional connection, depending on whether the server supports it.
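
As a rough illustration of what a JSON-RPC based model could look like, the Python sketch below builds JSON-RPC 2.0 requests and routes incoming messages: responses carry the `id` of the request they answer, while notifications have a `method` but no `id` and are delivered to event listeners. The method and event names (`findElement`, `console.message`) and the `JsonRpcPeer` class are hypothetical; the working group has not defined any of them.

```python
import itertools
import json


class JsonRpcPeer:
    """Minimal client side of a bidirectional JSON-RPC 2.0 connection."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._listeners = {}  # event name -> list of callbacks
        self.results = {}     # request id -> result or error

    def request(self, method, params):
        """Serialize a command; the id lets us match the response later."""
        return json.dumps({
            "jsonrpc": "2.0",
            "id": next(self._ids),
            "method": method,
            "params": params,
        })

    def on(self, event, callback):
        """Register a callback for a browser-initiated event."""
        self._listeners.setdefault(event, []).append(callback)

    def receive(self, raw):
        """Route an incoming message to a pending request or to listeners."""
        message = json.loads(raw)
        if "id" in message and ("result" in message or "error" in message):
            # Response to a command we sent earlier.
            self.results[message["id"]] = message.get(
                "result", message.get("error"))
        elif "method" in message:
            # Notification: the browser pushes an event without being asked.
            for callback in self._listeners.get(message["method"], []):
                callback(message.get("params"))


if __name__ == "__main__":
    peer = JsonRpcPeer()
    print(peer.request("findElement",
                       {"using": "css selector", "value": "#login"}))
    peer.on("console.message", print)
    peer.receive(json.dumps({"jsonrpc": "2.0", "method": "console.message",
                             "params": {"text": "page loaded"}}))
```

The existing commands map naturally onto the request half of this model, while logs and network events, which are awkward in a pure request/response protocol, fit the notification half.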

Conclusion

This was the second time that I was able to attend TPAC and the sessions of the Browser Testing and Tools Working Group. I am very happy with the progress we made and that we found some general consensus on the very big topic of bidirectional communication. This change to the protocol is the next major step toward a powerful cross-browser automation standard that drives and ensures interoperability on the web platform. From the Sauce Labs perspective in particular, the proposals we discussed, especially those on pinning scripts and asynchronous session creation, are going to impact all our customers in a very positive way. Now the actual work needs to be done to get these ideas into your hands. The next steps are:

  1. Finalise the proposals made in the face-to-face meetings

  2. Find overall consensus across all stakeholders of the protocol (this also includes you!)

  3. Add tests to the web-platform-tests suite to ensure interoperability across all browsers

  4. Support browser vendors with the implementation of the proposals

  5. Enable capabilities in automation frameworks such as Selenium or WebdriverIO

  6. Help users to leverage the new features and use them efficiently

As you can see, getting changes into a protocol like WebDriver requires a lot of work and takes time. But it is important to respect this process, as it will lead to a solution that everyone can benefit from. If you are interested in the details of the discussion, you can take a look at the meeting minutes, which are a log of all conversations. You can also follow the official WebDriver spec repository to get notifications for new proposals and suggestions. Everyone can participate in this process and help to improve the technology that drives test automation on the web.

Written by

Christian Bromann