Easy XPath Expressions

XPath is a language for navigating XML (eXtensible Markup Language) documents. Web pages are written in HTML, which is an XML-like language, so you can also use XPath to locate HTML elements for automation purposes. Many colleagues prefer to locate elements by id, and this is fine as long as the wanted element has an id and that id is not dynamic (it does not change). You can also locate an element by several other options, including class, tag name, href or another suitable attribute. However, there are many elements that can be uniquely identified only by an XPath expression with multiple conditions. This article starts with basic expressions and finishes with more advanced ones. The page that I used for testing the expressions below is located at https://qamag.net/wp-content/uploads/2017/12/XPath-Test-Page.html.

XPath Expressions

Find an element by exact id:

The following expression matches the div whose id is exactly primary:

//div[@id='primary']

You might expect an id to be unique, but unfortunately this is not always true, so verify that only one element is matched.

Find an element by partial id:

The following expression matches any div whose id contains primary:

//div[contains(@id,'primary')]
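The difference between the exact and the partial id match can be tried outside the browser as well. The sketch below assumes the third-party lxml library is installed and uses a small made-up fragment (not the article's test page), so the matches differ from the ones quoted in the text.

```python
from lxml import etree  # third-party library, assumed to be installed

# Hypothetical fragment for illustration only, not the article's test page.
PAGE = """<body>
  <div id="primary">main column</div>
  <div id="primary-sidebar">sidebar</div>
  <div id="footer">footer</div>
</body>"""

root = etree.fromstring(PAGE)

# Exact match: the whole id value must equal 'primary'.
exact = root.xpath("//div[@id='primary']")
# Partial match: any id that contains the substring 'primary'.
partial = root.xpath("//div[contains(@id,'primary')]")

print([d.get("id") for d in exact])    # ['primary']
print([d.get("id") for d in partial])  # ['primary', 'primary-sidebar']
```

Checking the length of the returned list is a simple way to verify that a locator matches only one element.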

Find an element by exact class:

The same applies to classes. The following expression matches all elements whose class attribute is exactly site-main:

//*[@class='site-main']

The asterisk matches any element, regardless of its tag name (div, p, a, main, etc.).

Find an element by partial class:

//div[contains(@class,'content-area')]

//div[@class='content-area'] returns 0 matches, because an exact match compares the whole attribute value, and the class of the element is col-md-9 content-area.
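A minimal sketch of the class behaviour described above, again assuming lxml and a made-up fragment: the exact comparison fails as soon as the element carries a second class, while contains still matches.

```python
from lxml import etree  # third-party library, assumed to be installed

# Hypothetical fragment for illustration only.
PAGE = """<body>
  <div class="col-md-9 content-area">posts</div>
  <main class="site-main">main</main>
</body>"""

root = etree.fromstring(PAGE)

# The whole attribute value is 'col-md-9 content-area', so this finds nothing.
exact = root.xpath("//div[@class='content-area']")
# A substring test matches the same div.
partial = root.xpath("//div[contains(@class,'content-area')]")
# The asterisk accepts any tag name; here it finds the main element.
any_tag = root.xpath("//*[@class='site-main']")

print(len(exact), len(partial))  # 0 1
print(any_tag[0].tag)            # main
```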

Find an element by exact inner text:

//a[text()='Awesome Online Resources for Software Testing'] returns exactly one element.

Find an element by partial inner text:

//a[contains(text(),'Awesome Online Resources for Software Testing')] returns exactly one element.

Find an element by attribute:

This expression finds 5 elements:

//a[contains(@href,'awesome-online-software-testing-resources')]

We can get the first of them by wrapping the expression in parentheses and adding the element position in square brackets:

(//a[contains(@href,'awesome-online-software-testing-resources')])[1]

Of course, we can also use an exact match:

//a[@href='https://qamag.net/awesome-online-software-testing-resources/'] finds 4 matches.
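The text and href matches above can be sketched in the same way. The example assumes lxml; the fragment and the example.com URLs are invented for illustration, so the match counts are not those of the test page.

```python
from lxml import etree  # third-party library, assumed to be installed

# Hypothetical fragment; the URLs are placeholders.
PAGE = """<body>
  <h2><a href="https://example.com/awesome-resources/">Awesome Resources</a></h2>
  <p><a href="https://example.com/awesome-resources/#comments">2 comments</a></p>
  <p><a href="https://example.com/about/">About</a></p>
</body>"""

root = etree.fromstring(PAGE)

by_text = root.xpath("//a[text()='Awesome Resources']")
by_partial_text = root.xpath("//a[contains(text(),'Awesome')]")
by_href = root.xpath("//a[contains(@href,'awesome-resources')]")
# Parentheses first collect all matches, then [1] picks the first one.
first_by_href = root.xpath("(//a[contains(@href,'awesome-resources')])[1]")
exact_href = root.xpath("//a[@href='https://example.com/awesome-resources/']")

print(len(by_text), len(by_partial_text))  # 1 1
print(len(by_href), len(exact_href))       # 2 1
print(first_by_href[0].text)               # Awesome Resources
```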

Find an element by tag name:

//h1 matches a single element on the test page, which is a good thing: more than one h1 could mean an SEO issue.

Let’s move on to more advanced expressions.

Translate

The translate function is handy when the text you are looking for may appear with different capitalization. It replaces each character listed in its second argument with the character at the same position in its third argument. The following expression finds all elements that contain the text online, regardless of letter case.

//*[contains(translate(text(), 'ONLINE','online'),'online')] finds 3 matches on our test page.
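A short sketch of the case-insensitive match, assuming lxml and a made-up fragment. It compares the plain contains test with the translate version.

```python
from lxml import etree  # third-party library, assumed to be installed

# Hypothetical fragment for illustration only.
PAGE = """<body>
  <p>Online courses</p>
  <p>Read it ONLINE</p>
  <p>Offline only</p>
</body>"""

root = etree.fromstring(PAGE)

# Case-sensitive: neither 'Online' nor 'ONLINE' contains 'online'.
plain = root.xpath("//*[contains(text(),'online')]")
# translate lowercases the letters O, N, L, I, E before the comparison.
ignore_case = root.xpath(
    "//*[contains(translate(text(),'ONLINE','online'),'online')]"
)

print(len(plain))                     # 0
print([p.text for p in ignore_case])  # ['Online courses', 'Read it ONLINE']
```

Only the letters listed in the second argument are converted, so a different search word needs its own pair of arguments.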

Normalize space

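//*[contains(concat(' ',normalize-space(@class),' '),' btn-default ')] finds 7 matches. normalize-space trims the leading and trailing white-space and collapses repeated white-space, and concat pads the result with one space on each side, so btn-default is matched as a whole class name wherever it appears in the attribute.

//*[contains(@class,' btn-default ')] returns 0 matches, because the raw attribute value is not padded: no class attribute on the test page has a space on both sides of btn-default.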
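The whole-class-name match can be compared with the two simpler variants in a small sketch. It assumes lxml; the fragment and the class btn-default-outline are invented to show where a plain substring test matches too much.

```python
from lxml import etree  # third-party library, assumed to be installed

# Hypothetical fragment; note the double space in the second class attribute.
PAGE = """<body>
  <a class="btn btn-default">first</a>
  <a class="btn-default  btn-lg">second</a>
  <a class="btn btn-default-outline">third</a>
</body>"""

root = etree.fromstring(PAGE)

# Whole-token match: pad the normalized class list with spaces on both sides.
token = root.xpath(
    "//*[contains(concat(' ',normalize-space(@class),' '),' btn-default ')]"
)
# Without padding, a class at the start or end of the list is missed.
padded_only = root.xpath("//*[contains(@class,' btn-default ')]")
# A plain substring test also matches btn-default-outline.
loose = root.xpath("//*[contains(@class,'btn-default')]")

print([a.text for a in token])  # ['first', 'second']
print(len(padded_only))         # 0
print([a.text for a in loose])  # ['first', 'second', 'third']
```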

Starts With

//a[starts-with(@href,'https://qamag.net/awesome')] returns all anchors (5 matches) whose href attribute starts with https://qamag.net/awesome.

ends-with did not work in the expressions I tested, because it was introduced in XPath 2.0 and browsers implement XPath 1.0. For more details, please check this stackoverflow answer.
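One common XPath 1.0 workaround for the missing ends-with is to compare the tail of the string using substring and string-length. The sketch below assumes lxml (which evaluates XPath 1.0) and a made-up fragment with placeholder URLs.

```python
from lxml import etree  # third-party library, assumed to be installed

# Hypothetical fragment; the URLs are placeholders.
PAGE = """<body>
  <a href="https://example.com/awesome-tools/">tools</a>
  <a href="https://example.com/awesome-tools/feed.xml">feed</a>
  <a href="https://example.com/contact/">contact</a>
</body>"""

root = etree.fromstring(PAGE)

starts = root.xpath("//a[starts-with(@href,'https://example.com/awesome')]")

# Emulated ends-with: take the last string-length('.xml') characters of
# @href and compare them with '.xml'.
ends = root.xpath(
    "//a[substring(@href, string-length(@href) - string-length('.xml') + 1)"
    " = '.xml']"
)

print([a.text for a in starts])  # ['tools', 'feed']
print([a.text for a in ends])    # ['feed']
```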

Or

or combines conditions inside a predicate: an element matches if at least one of the conditions is true. To combine the results of two whole expressions, use the union operator |.

//*[contains(@rel, 'bookmark') or contains(@id, 'main')] returns 22 matches. The expression returns all elements that either have a rel attribute that contains bookmark or have an id that contains main.

//a/ancestor-or-self::article | //footer returns 14 matches. They are the article elements that are ancestors of anchor elements, plus the footer elements.
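The two ways of combining results can be sketched side by side. The example assumes lxml and a made-up fragment, so the counts are much smaller than on the test page.

```python
from lxml import etree  # third-party library, assumed to be installed

# Hypothetical fragment for illustration only.
PAGE = """<body>
  <main id="main">
    <article><h2><a rel="bookmark" href="/one">One</a></h2></article>
    <article><p>No link here</p></article>
  </main>
  <footer>footer</footer>
</body>"""

root = etree.fromstring(PAGE)

# 'or' inside a predicate: each element is tested against both conditions.
either = root.xpath("//*[contains(@rel,'bookmark') or contains(@id,'main')]")
# '|' joins the results of two independent expressions.
union = root.xpath("//a/ancestor-or-self::article | //footer")

print([e.tag for e in either])  # ['main', 'a']
print([e.tag for e in union])   # ['article', 'footer']
```

Only the first article appears in the union, because the second one contains no anchor.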

And

The following expression returns all elements (1 match) whose rel attribute contains bookmark and whose text contains Awesome.

//*[contains(@rel, 'bookmark') and contains(text(), 'Awesome')]

The same is achieved by chaining two predicates:

//*[contains(@rel, 'bookmark')][contains(text(), 'Awesome')]

Not

The following expression finds the h1 element if its class contains neither logo nor hidden.

//h1[not(contains(@class, 'logo')) and not(contains(@class, 'hidden'))]

The same is achieved with:

//h1[not(contains(@class, 'logo'))][not(contains(@class, 'hidden'))]
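The and, chained-predicate and not forms can be checked against each other with a small sketch, assuming lxml and a made-up fragment.

```python
from lxml import etree  # third-party library, assumed to be installed

# Hypothetical fragment for illustration only.
PAGE = """<body>
  <h1 class="logo">Site</h1>
  <h1 class="page-title">Articles</h1>
  <a rel="bookmark">Awesome list</a>
  <a rel="bookmark">Other list</a>
  <a rel="nofollow">Awesome ad</a>
</body>"""

root = etree.fromstring(PAGE)

with_and = root.xpath(
    "//*[contains(@rel,'bookmark') and contains(text(),'Awesome')]"
)
chained = root.xpath(
    "//*[contains(@rel,'bookmark')][contains(text(),'Awesome')]"
)
without = root.xpath(
    "//h1[not(contains(@class,'logo')) and not(contains(@class,'hidden'))]"
)

print([e.text for e in with_and])  # ['Awesome list']
print([e.text for e in chained])   # ['Awesome list']
print([e.text for e in without])   # ['Articles']
```

Chained predicates are interchangeable with and only when none of them is positional, because a position such as [2] is counted after the preceding predicate has filtered the elements.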

Every nth element

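(//a[contains(@rel, 'bookmark')])[2] returns one element, whereas //a[contains(@rel, 'bookmark')][2] returns no elements. This is because a predicate in [] binds more tightly than the // operator: without the parentheses, [2] selects an anchor that is the second match within its own parent, and no parent on the test page has two such anchors. For more details, please check this stackoverflow answer.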
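The effect of the parentheses is easy to reproduce. In the sketch below (assuming lxml and a made-up fragment) every anchor is the only anchor of its parent, which is the situation where the two expressions differ.

```python
from lxml import etree  # third-party library, assumed to be installed

# Hypothetical fragment: each h2 holds exactly one bookmark anchor.
PAGE = """<body>
  <h2><a rel="bookmark">first</a></h2>
  <h2><a rel="bookmark">second</a></h2>
  <h2><a rel="bookmark">third</a></h2>
</body>"""

root = etree.fromstring(PAGE)

# Collect all matches in document order, then take the second one.
grouped = root.xpath("(//a[contains(@rel,'bookmark')])[2]")
# Position is counted per parent: no h2 has a second matching anchor.
per_parent = root.xpath("//a[contains(@rel,'bookmark')][2]")

print([a.text for a in grouped])     # ['second']
print([a.text for a in per_parent])  # []
```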

Find the parent element that has child element that meets certain conditions

Descendant

//h2/a returns an anchor element that is a direct child of an h2, whereas //h2//a returns an anchor element at any depth inside an h2, direct or indirect. It selects the same elements as //h2/descendant-or-self::a (and as //h2/descendant::a, since an h2 is never itself an anchor).

The following expression finds the anchor element that is a descendant of an h2 element and that contains the text Awesome Online Resources for Software Testing.

//h2//a[contains(text(), 'Awesome Online Resources for Software Testing')]

The following expression returns the h2 element that has a descendant anchor element containing the text between the single quotes:

//h2[descendant::a[contains(text(), 'Awesome Online Resources for Software Testing')]]

On our test page it returns the same h2 element as:

//h2[@class='entry-title'][.//descendant::a[contains(text(),'Awesome Online Resources for Software Testing')]]
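A sketch of the child and descendant steps, assuming lxml and a made-up fragment in which one anchor is a direct child of its h2 and another is nested inside a span.

```python
from lxml import etree  # third-party library, assumed to be installed

# Hypothetical fragment for illustration only.
PAGE = """<body>
  <h2 class="entry-title"><a href="/direct">Direct link</a></h2>
  <h2 class="entry-title"><span><a href="/nested">Nested link</a></span></h2>
  <h2 class="widget-title">No link</h2>
</body>"""

root = etree.fromstring(PAGE)

children = root.xpath("//h2/a")      # direct children only
descendants = root.xpath("//h2//a")  # any depth below the h2
# Select the h2 itself, filtered by what it contains.
holders = root.xpath("//h2[descendant::a[contains(text(),'Nested')]]")

print([a.text for a in children])     # ['Direct link']
print([a.text for a in descendants])  # ['Direct link', 'Nested link']
print(len(holders))                   # 1
```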

Parent

//a/parent::article returns the article element that is the direct parent of an anchor element.

Both single and double quotes can be used in XPath expressions. I personally prefer single quotes, because C# strings are delimited by double quotes, so any double quotes in the expression have to be escaped in the C# string.

Ancestor

//a/ancestor-or-self::article returns the article element that is an ancestor of an anchor element.

Sibling

The following expression returns the div element whose class contains entry-summary and that has a preceding sibling header element with the exact class entry-header. The predicate uses the preceding-sibling axis directly; prefixing it with .// would also accept a div whose descendants have such a sibling.

//div[contains(@class,'entry-summary')][preceding-sibling::header[@class='entry-header']]

//h2/following-sibling::div returns the div element that is a following sibling of an h2 element.
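The parent, ancestor and sibling axes can be compared in one sketch, assuming lxml and a made-up fragment. The sibling predicate here uses the plain preceding-sibling axis, and the last expression starts from header because this fragment has no div next to an h2.

```python
from lxml import etree  # third-party library, assumed to be installed

# Hypothetical fragment for illustration only.
PAGE = """<body>
  <article>
    <header class="entry-header"><h2><a href="/post">Post</a></h2></header>
    <div class="entry-summary">Summary</div>
  </article>
  <article><a href="/bare">Bare link</a></article>
</body>"""

root = etree.fromstring(PAGE)

# Only the second article is the direct parent of an anchor.
parents = root.xpath("//a/parent::article")
# Both articles have an anchor somewhere below them.
ancestors = root.xpath("//a/ancestor::article")
# The div that comes after a header with the exact class entry-header.
summary = root.xpath(
    "//div[contains(@class,'entry-summary')]"
    "[preceding-sibling::header[@class='entry-header']]"
)
following = root.xpath("//header/following-sibling::div")

print(len(parents), len(ancestors))  # 1 2
print([d.text for d in summary])     # ['Summary']
print([d.text for d in following])   # ['Summary']
```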

Combining several conditions

Try to predict what the following expressions return before looking at the answers below:

  1. //a[@rel='bookmark'][contains(text(),'Online')]
  2. //div[contains(@class,'content-area')]//a[contains(@href,'online') and contains(text(),'Awesome')]
  3. //header[.//descendant::a[contains(text(),'Highlights') or contains(text(),'Awesome')]]
  4. //article[(contains(@class,'type-post') and contains(@class,'tag-conferences')) or not(contains(@class,'category-resources'))]

Answers:

  1. Returns an anchor element whose rel attribute equals bookmark and whose text contains Online.
  2. Returns an anchor element that is a descendant of a div with partial class content-area, whose href contains online and whose text contains Awesome.
  3. Returns a header element that has a descendant anchor element whose text contains either Highlights or Awesome.
  4. Returns 7 article elements: those whose class contains both type-post and tag-conferences, plus those whose class does not contain category-resources.

Test XPath Expressions

You can use the browser developer console, a browser plugin or online resources to test your XPath expressions.

Most of the time I test XPath expressions in the Chrome developer console. To do this, follow these steps:

  1. In Chrome, open the page used in this article for testing expressions, https://qamag.net/wp-content/uploads/2017/12/XPath-Test-Page.html, or any other page you want to test with.
  2. Press F12.
  3. Make sure you are on the Elements tab (it is shown by default).
  4. Press Ctrl+F.
  5. Enter or paste your expression in the search box that appears. If there is a match, the first matched element is highlighted in yellow.

[Figure: testing an XPath expression in the Chrome developer console]

Relative XPath Helper and XPath Helper are useful, easy-to-use Chrome extensions for testing XPath expressions and for generating them with simple mouse clicks.

You can also test expressions online at https://scrapinghub.github.io/xpath-playground/ or at https://www.freeformatter.com/xpath-tester.html.

Summary

XPath expressions add flexibility in locating elements. Usually there is more than one expression that is readable, flexible and uniquely identifies an element. Choosing such an expression keeps automation tests stable and easy to maintain.