How to Create Your First Web Automation with Playwright in JavaScript

A high-level introduction to creating automations in the web browser with a JavaScript library called Playwright.

Austin Hess
JavaScript in Plain English

Intro

Each day, the world becomes more automated. From mundane daily tasks to complex manufacturing processes to the street lights that coordinate traffic in our largest cities — automation is all around us. While I may be slightly biased (working for a company focused on automation), I truly believe that professionals from all industries should seek to gain the basic technical skills required to automate time-saving tasks. Not only does it make you a more valuable asset to your employer, but new business opportunities abound in the world of automation.

This article serves as a high-level introduction to creating automation in the web browser with a JavaScript library called Playwright.

What Are We Going to Make?

We’re going to start simple here. The final product of this tutorial will be a web automation that navigates to Yahoo! Finance and downloads a CSV file containing the last year of historical data for a given stock ticker.

Set Up Your Project

First off, create a new directory:

mkdir first-web-automation
cd first-web-automation

Second, we need to initialize our project and install Playwright:

yarn init
(answer prompts)
yarn add playwright

If you’re not familiar with npm scripts, they are commands that you define in package.json (the file at the root of your project) to make it easier to run, test, and debug your code. Let’s add a script there now for running our project. Add this snippet to your package.json between "license": ... and "dependencies": ...

"scripts": {
"start": "node main.js"
},

Any script defined in this scripts block can be run with:

yarn <script-name>

Let’s Write Some Code

Let’s start very basic and have Playwright open up a web browser. Create a file main.js and open it up in your editor. Let’s create this basic structure:

const { chromium } = require('playwright');

void main().then(() => process.exit(0));

async function main() {
  let browser = await chromium.launch({ headless: false });
  let page = await browser.newPage();
  await page.waitForTimeout(5000);
}

Before I explain what this code does, save the file and then run our yarn start command. Try to understand what is happening. A browser window should appear for 5 seconds, close, and then your Node process should exit.

chromium.launch() launches an instance of a Chromium browser. Passing the option headless: false makes the library open a visible browser window. Without it, the browser would run headless, in the background with no window. browser.newPage() tells the browser instance to open a new page. page.waitForTimeout(5000) simply tells the page to wait for 5 seconds (the argument is in milliseconds) before the function ends and the process exits.

void main().then(() => process.exit(0)) is a way to run the asynchronous function defined below. When the Promise from main resolves, the process will exit with a successful exit code of 0.
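The snippet above relies on process.exit(0) to tear the browser down. As a rough sketch of an alternative, the helper below closes the browser explicitly with browser.close(), even when the task throws. The name withBrowser and its parameters are my own invention for illustration, not part of Playwright. It takes the browser type as an argument, so you would pass in chromium.

```javascript
// Hypothetical helper: launch a browser, run a task, and always close the browser.
// `browserType` is expected to be something like Playwright's `chromium`.
async function withBrowser(browserType, task, launchOptions = { headless: false }) {
  const browser = await browserType.launch(launchOptions);
  try {
    // Hand the browser to the caller's task and return whatever it returns.
    return await task(browser);
  } finally {
    // Runs on success and on error, so no browser window is left behind.
    await browser.close();
  }
}
```

With Playwright installed, a call might look like withBrowser(chromium, async (browser) => { let page = await browser.newPage(); }) inside an async function.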

Navigating in the Browser

Now let’s do something more interesting. So far, we can start a browser and open a page with just a couple of lines. Playwright also provides easy methods for navigating the web. Within the function main, remove the call to waitForTimeout and add the following code:

async function main() {
  ...
  await page.goto('https://finance.yahoo.com');
}

Save your changes, and again run yarn start. You’ll now see that the browser navigates to the homepage of Yahoo! Finance before closing. Ultimately, we want to be able to pass in a stock ticker as an argument and to search for that ticker, but for now, let’s hardcode GOOG as the desired ticker to search. But how do we search for a ticker?

Finding the Best Path

You may be thinking: “Obviously we just need the automation to input text into the search box at the top of the page”. That isn’t necessarily incorrect but it is not optimal. Let’s observe the browser’s behavior when we search GOOG in the search bar.

When you press ‘Enter’ with GOOG as your search criteria, notice how the URL in the browser changes. The URL changes from https://finance.yahoo.com to https://finance.yahoo.com/quote/GOOG?p=GOOG&.tscrc=fin-srch. This tells us exactly how to directly navigate to a given ticker with just the URL and some query string parameters. Now I’ve verified with several other tickers that this is indeed the pattern, but it is a good practice to ensure you have a full grasp of a given website’s searching mechanism before you account for the pattern in your automation code. We can account for this situation by updating main() to the following:

async function main() {
  let ticker = 'GOOG';
  let browser = await chromium.launch({ headless: false });
  let page = await browser.newPage();
  let url = `https://finance.yahoo.com/quote/${ticker}?p=${ticker}&.tscrc=fin-srch`;
  await page.goto(url);
  await page.waitForTimeout(5000);
}

We add a waitForTimeout so we have 5 seconds to take in the final state of the automation before it closes. Run your code again and see what happens. This time, it should navigate straight to the GOOG quote page.

Pretty cool, huh? FWIW, Playwright is a library that was actually built for testing, but it tends to make for a great general automation tool as well.

Finding Historical Data

Next on the list is for us to navigate to the area of this page where we can download the historical data for GOOG from the past year. As you can see, there is a tab on this page called “Historical Data” — that is probably where we want to look.

Playwright has a click method that allows us to specify a DOM element (a specific part of the webpage, in this case a link/button) for the browser instance to click on. If we open this page in our own browser, right-click on “Historical Data”, and then click “Inspect”, the browser developer tools will open with that element highlighted.

Let’s use a selector that first targets the li in which the anchor element resides. Try adding the below code to main():

async function main() {
  ...
  let selector = 'li[data-test="HISTORICAL_DATA"] > a';
  await page.click(selector);
  await page.waitForTimeout(5000);
}

Again, I moved the waitForTimeout to the end of the function so we can see what happened. Assume we’ll keep doing that every time we make an addition. As you can see, the click() method brings us to the historical data tab. But wait… can you think of an even easier way to get there? Think back to the way we navigated directly to the GOOG ticker page. Can we go straight there with the URL again? Sure enough, the tab has its own path, /history. Let’s get our main() function looking even better:

async function main() {
  let ticker = 'GOOG';
  let browser = await chromium.launch({ headless: false });
  let page = await browser.newPage();
  let url = `https://finance.yahoo.com/quote/${ticker}/history?p=${ticker}`;
  await page.goto(url);
  await page.waitForTimeout(5000);
}

We could have typed our search text into the search bar, clicked the “Submit” button, and then clicked the “Historical Data” tab. But because the site encodes both the ticker and the tab in the URL, we can navigate directly to the historical data we were looking for.
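The two URL patterns used so far can be captured in one small function. This is only a sketch: buildYahooUrl and its section parameter are hypothetical names of mine, and it assumes the .tscrc parameter from the search URL is optional, as the history URL in this article suggests. It also upper-cases and URL-encodes the ticker, which is my own precaution rather than anything the site documents.

```javascript
// Hypothetical helper: build a Yahoo! Finance URL for a ticker.
// `section` is '' for the quote page or 'history' for the Historical Data tab.
function buildYahooUrl(ticker, section = '') {
  // Encode the ticker so characters such as '^' or '=' cannot break the URL.
  const symbol = encodeURIComponent(ticker.trim().toUpperCase());
  const path = section ? `/quote/${symbol}/${section}` : `/quote/${symbol}`;
  return `https://finance.yahoo.com${path}?p=${symbol}`;
}

console.log(buildYahooUrl('goog'));            // https://finance.yahoo.com/quote/GOOG?p=GOOG
console.log(buildYahooUrl('GOOG', 'history')); // https://finance.yahoo.com/quote/GOOG/history?p=GOOG
```

Inside main(), the result could be passed straight to page.goto().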

Downloading Files with Playwright

Thankfully (and intentionally), Yahoo! defaults the historical data range to the past year, although changing those inputs is entirely possible with Playwright. There is a “Download” button in the upper right corner of the main content area on the “Historical Data” page.

In order to download files using Playwright, we’ll need to listen for a “download” event before clicking the button. Let’s add the following code:

async function main() {
  ...
  // Start listening for the download before clicking, so the event is not missed.
  let [download] = await Promise.all([
    page.waitForEvent('download'),
    page.locator(`a[download="${ticker}.csv"]`).click(),
  ]);
  await download.saveAs('./data.csv');
}

After running this code, you should now see a file that has appeared in your project directory with all the historical data from GOOG for the past year. Voila!
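If you would rather keep the download step separate from main(), it can be wrapped in a function that receives the page. The sketch below is one way to do that. downloadHistory and its directory parameter are hypothetical names of mine, and it saves the file under the name the site suggests (via download.suggestedFilename()) instead of the fixed data.csv used above.

```javascript
// Hypothetical helper: click the CSV link for `ticker` and save the download.
// `page` is expected to be a Playwright Page already on the Historical Data tab.
async function downloadHistory(page, ticker, directory = '.') {
  const [download] = await Promise.all([
    // Start waiting first so the event cannot fire before we are listening.
    page.waitForEvent('download'),
    page.locator(`a[download="${ticker}.csv"]`).click(),
  ]);
  // suggestedFilename() is the file name proposed by the site for this download.
  const target = `${directory}/${download.suggestedFilename()}`;
  await download.saveAs(target);
  return target;
}
```

Because the function only depends on the page object it is given, it can also be exercised with a stand-in page object when you want to check the logic without opening a browser.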

Parameterizing the Automation

The final thing that I promised we’d do is to allow us to run this automation for any given ticker. A great way to allow for parameter passing for a simple script like this is the argparse library. It allows us to easily specify command line arguments and use them in our code. Let’s install that now by running:

yarn add argparse

Let’s add a single argument called ticker. The final code should look like the following:

const { chromium } = require('playwright');
const { ArgumentParser } = require('argparse');

// Read the required --ticker argument from the command line.
let parser = new ArgumentParser();
parser.add_argument('--ticker', { type: String, required: true });
const { ticker } = parser.parse_args();

void main()
  .then(() => process.exit(0))
  .catch((err) => {
    console.error(err);
    process.exit(1);
  });

async function main() {
  let browser = await chromium.launch({ headless: false });
  let page = await browser.newPage();
  let url = `https://finance.yahoo.com/quote/${ticker}/history?p=${ticker}`;
  await page.goto(url);
  // Start listening for the download before clicking, so the event is not missed.
  let [download] = await Promise.all([
    page.waitForEvent('download'),
    page.locator(`a[download="${ticker}.csv"]`).click(),
  ]);
  await download.saveAs('./data.csv');
  await page.waitForTimeout(5000);
}

We can run our final product like this:

yarn start --ticker GOOG
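If you would prefer not to add a dependency for a single flag, recent versions of Node (18.3 or later, as far as I know) ship a parseArgs function in node:util. The sketch below is an alternative to the argparse approach above, not a replacement for it. readTicker is a hypothetical name, and upper-casing the ticker is my own assumption, made so that goog and GOOG match the same download link.

```javascript
const { parseArgs } = require('node:util');

// Hypothetical helper: read --ticker from an argv-style array.
function readTicker(argv) {
  const { values } = parseArgs({
    args: argv,
    options: { ticker: { type: 'string' } },
  });
  if (!values.ticker) {
    throw new Error('Missing required option: --ticker');
  }
  // Normalise case so "goog" and "GOOG" behave the same (an assumption).
  return values.ticker.toUpperCase();
}

// In main.js you would call: readTicker(process.argv.slice(2))
console.log(readTicker(['--ticker', 'goog'])); // GOOG
```

By default parseArgs rejects flags it does not know about, so a typo such as --tiker fails loudly instead of being silently ignored.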

Outro

I truly hope you enjoyed my first article and learned a thing or two. If anything was unclear or you got stuck, please reach out in the comments.

The following are some fun ways you could extend this example for more practice:

  • What is an even simpler method for downloading this file programmatically? Hint: HTTP
  • Allow for searching by a given date range
  • Sort the data you get and provide some meaningful output

Here is a link to the Playwright documentation. Shout out to the Playwright community for some top-notch docs.

Software Engineer. Working in web automation with TypeScript. Interested in exploring Golang/microservices, machine learning, and engineering management.