

Posted June 15, 2015

Appium + Sauce Labs Bootcamp: Chapter 2, Touch Actions


This is the second in a series of posts that discuss using Appium with Sauce Labs. In the first chapter, we covered Language Bindings. This installment discusses Touch Actions; Chapter 3 covers Testing Hybrid Apps & Mobile Web; and Chapter 4 is about Advanced Desired Capabilities.

One aspect of mobile devices that needs to be automated in order to fully test applications, whether native, hybrid, or web, is the use of gestures to interact with elements. In Appium this is done through the Touch Action and Multi Touch APIs. These two APIs come from an early draft of the WebDriver W3C Specification and are an attempt to atomize the individual actions that make up complex gestures. That is to say, they provide the building blocks for any particular gesture that might be of interest.

The specification has changed recently, and the current implementation will be deprecated in favor of one based on the latest specification. That said, the following API will remain in Appium for some time, even as the new API is rapidly adopted in the server.

Touch Actions

The Touch Action API provides the basis of all gestures that can be automated in Appium. At its core is the ability to chain together _ad hoc_ individual actions, which will then be applied to an element in the application on the device. The basic actions that can be used are:

  • press

  • longPress

  • tap

  • moveTo

  • wait

  • release

  • cancel

  • perform

Of these, the last deserves special mention. The perform action is what actually sends the chain of actions to the server. Before perform is called, the client simply records the actions in a local data structure, and nothing is done to the application under test. Once perform is called, the actions are wrapped up in JSON and sent to the server, where they are actually performed!
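To make the record-then-send behavior concrete, here is a minimal, self-contained sketch of the pattern. It is not the Appium client: `RecordingTouchAction` and the `send` callable are hypothetical stand-ins for the client object and its HTTP transport, and the `{"action": ..., "options": ...}` shape is only an assumed approximation of what a client might send.

```python
import json
from typing import Any, Callable, Dict, List, Optional


class RecordingTouchAction:
    """Hypothetical stand-in for a TouchAction client object.

    Each chained call only appends to a local list; nothing is sent
    until perform() hands the whole chain to the transport.
    """

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send  # hypothetical transport (e.g. an HTTP POST)
        self._actions: List[Dict[str, Any]] = []

    def _add(self, action: str, options: Optional[Dict[str, Any]] = None) -> "RecordingTouchAction":
        self._actions.append({"action": action, "options": options or {}})
        return self  # returning self is what makes chaining possible

    def long_press(self, element_id: str) -> "RecordingTouchAction":
        return self._add("longPress", {"element": element_id})

    def move_to(self, element_id: str) -> "RecordingTouchAction":
        return self._add("moveTo", {"element": element_id})

    def wait(self, ms: int) -> "RecordingTouchAction":
        return self._add("wait", {"ms": ms})

    def release(self) -> "RecordingTouchAction":
        return self._add("release")

    def perform(self) -> None:
        # Only here is the recorded chain serialized and sent.
        self._send(json.dumps({"actions": self._actions}))
        self._actions = []  # start fresh for the next gesture


sent: List[str] = []
action = RecordingTouchAction(sent.append)
action.long_press("source").move_to("destination").wait(500).release()
print(len(sent))  # 0: nothing has been sent yet
action.perform()
print(json.loads(sent[0])["actions"][2])  # {'action': 'wait', 'options': {'ms': 500}}
```

Returning `self` from every recording method is what allows the fluent `a.b().c()` style used throughout this post.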

The simplest action is tap. It is the only one that cannot be chained with other actions, since it is a press and a release combined. The rest of the actions are straightforward and cover the sorts of touch screen interactions one would expect. Most interactions begin with either press or longPress, which can be performed on a point on the screen, on an element, or on an element with an offset from its top-left corner. The only difference between the two methods is, as their names suggest, how long the gesture stays down.

After pressing, the gesture can include waiting and moving, to automate complex interactions. For instance, to simulate dragging an element onto another element, you might automate a longPress, moveTo, wait, and release. In Python, assuming you have a driver instance, this would look like this:

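[code language="python"]
from appium.webdriver.common.touch_action import TouchAction

# assumes `driver` is an existing Appium session
source = driver.find_element_by_accessibility_id('Source')
destination = driver.find_element_by_accessibility_id('Destination')

# record the gesture locally: press and hold, drag, pause, lift
action = TouchAction(driver)
action.long_press(source).move_to(destination).wait(500).release()

# send the whole chain to the server
action.perform()
[/code]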

The wait function takes a time in milliseconds, which sets the minimum amount of time between the previous action and the next one. It is therefore useful for synchronization, as well as for actions like the one above, which generally need some pause for the application itself to register the position.
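Because wait sets a lower bound rather than an exact delay, a small local model can illustrate the semantics. The sketch below is hypothetical and does not talk to Appium: `replay` walks a recorded chain of `{"action": ..., "options": ...}` dictionaries (an assumed shape), calls a caller-supplied handler for every non-wait action, and sleeps for each wait. Since `time.sleep` blocks for at least the requested time, the following action can start late but not early.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


def replay(
    gestures: List[Dict[str, Any]],
    handle: Callable[[Dict[str, Any]], None],
) -> List[Tuple[str, float]]:
    """Hypothetical replay loop: returns (action name, start time) pairs."""
    timeline: List[Tuple[str, float]] = []
    for gesture in gestures:
        if gesture["action"] == "wait":
            # Block for at least `ms` milliseconds before the next action.
            time.sleep(gesture["options"]["ms"] / 1000.0)
            continue
        timeline.append((gesture["action"], time.perf_counter()))
        handle(gesture)
    return timeline


chain = [
    {"action": "longPress", "options": {"element": "source"}},
    {"action": "moveTo", "options": {"element": "destination"}},
    {"action": "wait", "options": {"ms": 500}},
    {"action": "release", "options": {}},
]
timeline = replay(chain, handle=lambda gesture: None)
gap_ms = (timeline[2][1] - timeline[1][1]) * 1000
print(f"release started {gap_ms:.0f} ms after moveTo")
```

On a real device the server does the timing, so this only shows why a wait is best read as "no sooner than", not "exactly".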

For general documentation, see the Appium docs; for the API in specific languages, see the Java, Ruby, Python, PHP, Perl, C#, and JavaScript clients.


A note on what the position arguments mean is in order. The most basic way to specify a position is to use an element. All the methods that deal with position (i.e., tap, press, longPress, and moveTo) can take an element as their point of action. On its own, an element is interpreted as its center. Along with the element, a point can be passed in as x and y coordinates. If both an element and a point are given to the method, the point is interpreted as an _offset_ from the top-left corner of the element.
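These rules can be summarized as a small function. This is a hypothetical model for the "static" methods (tap, press, longPress), not Appium code: `Rect` and `resolve_point` are illustrative names, and a bare point passed to moveTo follows a different rule.

```python
from typing import NamedTuple, Optional, Tuple


class Rect(NamedTuple):
    """Element bounds: top-left corner plus size, in screen pixels."""
    x: float
    y: float
    width: float
    height: float


def resolve_point(
    element: Optional[Rect] = None,
    x: Optional[float] = None,
    y: Optional[float] = None,
) -> Tuple[float, float]:
    """Return the absolute screen point targeted by tap/press/longPress."""
    has_point = x is not None and y is not None
    if element is not None and not has_point:
        # Element alone: its center.
        return (element.x + element.width / 2, element.y + element.height / 2)
    if element is not None and has_point:
        # Element plus point: offset from the element's top-left corner.
        return (element.x + x, element.y + y)
    if has_point:
        # Point alone: taken literally as a screen position.
        return (x, y)
    raise ValueError("need an element, a point, or both")


button = Rect(x=40, y=100, width=200, height=60)
print(resolve_point(button))             # (140.0, 130.0)
print(resolve_point(button, x=10, y=5))  # (50, 105)
print(resolve_point(x=300, y=400))       # (300, 400)
```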

The final possibility is a point alone. In the absence of an element, a point is taken literally, as a position on the screen, for all the “static” methods: tap, press, and longPress. In the moveTo method, however, the point is interpreted as an _offset_ from the position where the move starts. This leads to many conceptual errors, usually revealed by wildly erroneous moves or by out-of-bounds errors (the errors "The coordinates provided to an interactions operation are invalid" and "Coordinate [x=500.0, y=820.0] is outside of element rect: [0,0][480,800]" are common in this case).
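The usual fix is to convert the absolute destination into an offset from where the pointer currently is. The sketch below uses hypothetical helpers (`move_offset`, `landing_point`, `on_screen`) and a made-up 480x800 screen to show both the correct conversion and the common mistake of passing absolute coordinates to a relative moveTo.

```python
from typing import Tuple

Point = Tuple[float, float]


def move_offset(start: Point, target: Point) -> Point:
    """Offset to pass to a relative moveTo so the pointer lands on `target`."""
    return (target[0] - start[0], target[1] - start[1])


def landing_point(start: Point, offset: Point) -> Point:
    """Where a relative moveTo actually ends up."""
    return (start[0] + offset[0], start[1] + offset[1])


def on_screen(point: Point, width: float, height: float) -> bool:
    return 0 <= point[0] < width and 0 <= point[1] < height


SCREEN_W, SCREEN_H = 480, 800  # hypothetical device size
press_at = (200, 320)          # where the press happened
target = (300, 500)            # absolute point we want to reach

good = move_offset(press_at, target)
print(good, landing_point(press_at, good))  # (100, 180) (300, 500)

# The common mistake: passing the absolute target straight to moveTo.
bad_landing = landing_point(press_at, target)
print(bad_landing, on_screen(bad_landing, SCREEN_W, SCREEN_H))  # (500, 820) False
```

In this made-up scenario the mistaken move lands at (500, 820), off a 480x800 screen, which is the kind of out-of-bounds coordinate the error messages above report.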

Multi Touch Actions

Interaction with mobile applications, however, is not limited to single gestures. Simple actions such as pinching and zooming require two fingers, and more complex interactions may need even more. To automate such actions, Appium supports the Multi Touch API, which allows you to specify multiple Touch Action chains that will be run near-simultaneously.

If, for instance, you wanted to drag one element to the position of a second while at the same time dragging the second to the position of the first, you would first build the individual actions, then add them to a multi action object:

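[code language="python"]
from appium.webdriver.common.touch_action import TouchAction
from appium.webdriver.common.multi_action import MultiAction

el1 = driver.find_element_by_accessibility_id('Element 1')
el2 = driver.find_element_by_accessibility_id('Element 2')

# first finger: drag el1 onto el2
action1 = TouchAction(driver)
action1.long_press(el1).move_to(el2).wait(500).release()

# second finger: drag el2 onto el1
action2 = TouchAction(driver)
action2.long_press(el2).move_to(el1).wait(500).release()

# group both chains and send them to the server together
multi = MultiAction(driver)
multi.add(action1, action2)
multi.perform()
[/code]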

You will notice that it is on the Multi Action object that we call perform here, rather than on the individual actions. As above, before this perform is called, nothing is sent to the server. The client just keeps track of the actions added, and when perform is called it packages up the information and sends it to the server for processing.
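The grouping step can be sketched the same way. In this hypothetical model (not the Appium client), `Chain` records one pointer's actions, and `RecordingMultiAction` collects several chains and sends them in a single payload containing one gesture list per finger. The payload shape and the `send` callable are assumptions for illustration.

```python
import json
from typing import Any, Callable, Dict, List


class Chain:
    """Hypothetical single-pointer chain; records actions locally."""

    def __init__(self) -> None:
        self.gestures: List[Dict[str, Any]] = []

    def add(self, action: str, **options: Any) -> "Chain":
        self.gestures.append({"action": action, "options": options})
        return self


class RecordingMultiAction:
    """Hypothetical stand-in for MultiAction: groups chains, sends once."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send  # hypothetical transport
        self._chains: List[Chain] = []

    def add(self, *chains: Chain) -> None:
        self._chains.extend(chains)  # still nothing sent

    def perform(self) -> None:
        # One payload, with one list of gestures per pointer ("finger").
        payload = {"actions": [chain.gestures for chain in self._chains]}
        self._send(json.dumps(payload))


finger1 = (Chain().add("longPress", element="el1").add("moveTo", element="el2")
           .add("wait", ms=500).add("release"))
finger2 = (Chain().add("longPress", element="el2").add("moveTo", element="el1")
           .add("wait", ms=500).add("release"))

sent: List[str] = []
multi = RecordingMultiAction(sent.append)
multi.add(finger1, finger2)
print(len(sent))  # 0
multi.perform()
payload = json.loads(sent[0])
print(len(payload["actions"]))  # 2
```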

So, once you have the individual gestures working, getting complex multi-pointer gestures working is as simple as adding them to the MultiAction object, and sending them to the server with perform! Appium does the rest!

For language-specific documentation on the Multi Action API, see: Java, Python, PHP, C#, and JavaScript.

Full examples

Full examples can be found on Appium’s sample-code repository on GitHub.

© 2023 Sauce Labs Inc., all rights reserved. SAUCE and SAUCE LABS are registered trademarks owned by Sauce Labs Inc. in the United States, EU, and may be registered in other jurisdictions.