
29-Nov-2024 ~ 4 min read

How to Set Up Playwright with Flask on Google Cloud Run


When working with server-side rendering, developers often face the challenge of retrieving fully rendered HTML from modern websites. Many websites rely on JavaScript to load and update content dynamically, so a plain HTTP request returns only the initial HTML document, before any of that JavaScript has run. In this tutorial, we’ll create an endpoint deployed to Google Cloud Run that uses Playwright to render pages in a headless browser and return the resulting HTML, including all dynamically injected DOM nodes.

Overview of the Solution

We’ll build a Flask application that:

  1. Exposes an endpoint to accept a URL.
  2. Uses Playwright to fetch and render the URL.
  3. Waits for all network activity to settle before extracting the fully rendered HTML.
  4. Deploys the API on Google Cloud Run for scalable, serverless hosting.

Prerequisites

  1. Python 3.13 or later installed locally.
  2. Docker installed to containerize the application.
  3. A Google Cloud account with gcloud CLI configured.

Step 1: Set up the skeleton

Create a new directory for your project and create the following empty files:

mkdir playwright-flask-cloud-run
cd playwright-flask-cloud-run
touch Dockerfile
touch requirements.txt
touch app.py
touch Makefile

Step 2: Writing the Dockerfile

To deploy on Google Cloud Run, we need to containerize the application. The Dockerfile installs all necessary dependencies, including Playwright.

Dockerfile:

# Use an official Python image
FROM python:3.13-slim

# Set the working directory
WORKDIR /app

# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV PLAYWRIGHT_BROWSERS_PATH=/app/ms-playwright

# Install Playwright, Flask, and Gunicorn from requirements.txt
# (copied on its own first so this layer stays cached when only app code changes)
COPY requirements.txt .
RUN pip install --no-cache-dir --upgrade pip \
    && pip install --no-cache-dir -r requirements.txt

# Install Chromium together with the OS packages it needs;
# --with-deps runs apt-get itself, so no hand-maintained package list is required
RUN python -m playwright install --with-deps chromium

# Copy the application files
COPY . .

# Cloud Run sends requests to $PORT (8080 by default)
EXPOSE 8080

# Shell form so that $PORT is expanded; exec replaces the shell so gunicorn receives signals.
# The module is app.py, so the WSGI target is app:app
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 300 app:app
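The exec form of CMD (the JSON-array syntax) does not run a shell, so a literal $PORT inside it is never expanded. Besides using the shell form, another option is to let gunicorn read its settings from a Python config file. The sketch below assumes a file named gunicorn.conf.py placed next to app.py in /app; gunicorn loads ./gunicorn.conf.py from its working directory by default, so the CMD could then shrink to ["gunicorn", "app:app"]. The values simply mirror the flags used in this tutorial.

```python
# gunicorn.conf.py: hypothetical config file, assumed to sit in /app next to app.py.
import os

# Cloud Run injects PORT at runtime; fall back to 8080 for local runs.
port = int(os.environ.get("PORT", "8080"))

# Listen on all interfaces so Cloud Run's front end can reach the container.
bind = f"0.0.0.0:{port}"

# One worker process with several threads, matching --workers 1 --threads 8.
workers = 1
threads = 8

# Workers silent for longer than this many seconds are killed and restarted
# (same as --timeout 300); keep it in step with the Cloud Run --timeout.
timeout = 300
```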

Step 3: Writing the requirements.txt

We need to specify the dependencies for the Flask application and Playwright in requirements.txt:

playwright==1.49.0
flask==3.1.0
gunicorn==23.0.0
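To confirm that the built image really contains these pinned versions, a small stdlib-only check can compare requirements.txt with what is installed. This is a hypothetical helper (parse_pins and mismatches are our own names, not part of pip or Playwright), and it only understands simple name==version lines like the three above.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(text):
    """Return {name: version} for every 'name==version' line, ignoring comments and blanks."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "==" in line:
            name, _, pinned = line.partition("==")
            pins[name.strip().lower()] = pinned.strip()
    return pins


def mismatches(pins, lookup=version):
    """Return (name, pinned, installed) for each package whose installed version differs.

    `installed` is None when the package is not installed at all.
    """
    problems = []
    for name, pinned in pins.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            problems.append((name, pinned, installed))
    return problems
```

Inside the container, mismatches(parse_pins(open("requirements.txt").read())) should return an empty list if pip installed exactly what was pinned.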

Step 4: Setting Up the Flask Application

Let’s start with the Flask API. The application will accept a URL in a POST request, use Playwright to render the URL, and return the fully rendered HTML.

app.py:

from flask import Flask, request, jsonify
from playwright.sync_api import sync_playwright
import os

app = Flask(__name__)


@app.route('/render', methods=['POST'])
def render_url():
    # silent=True returns None instead of raising on a missing or malformed JSON body
    data = request.get_json(silent=True) or {}
    url = data.get("url") if isinstance(data, dict) else None

    if not url:
        return jsonify({"error": "URL is required"}), 400

    try:
        # Each request starts its own Playwright instance and browser,
        # so concurrent gunicorn threads never share Playwright objects
        with sync_playwright() as p:
            browser = p.chromium.launch(headless=True)
            try:
                context = browser.new_context()
                page = context.new_page()

                # Navigate to the URL and wait for network activity to stop
                page.goto(url, wait_until="load")
                page.wait_for_load_state("networkidle")

                # Get the rendered HTML
                html = page.content()
            finally:
                # Close the browser even if navigation fails or times out
                browser.close()

        return jsonify({"html": html}), 200
    except Exception as e:
        return jsonify({"error": str(e)}), 500


if __name__ == "__main__":
    # Local development only; inside the container gunicorn serves the app
    app.run(debug=True, host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
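The handler above only checks that a URL is present, so input such as file:///etc/passwd or a bare example.com still reaches Chromium. A stricter check, run before any browser is launched, could look like the sketch below. validate_url and ALLOWED_SCHEMES are hypothetical names of our own; the helper only checks the shape of the URL and does not stop requests to internal or private addresses.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list: the only schemes the /render endpoint should open.
ALLOWED_SCHEMES = {"http", "https"}


def validate_url(url):
    """Return an error message for unusable input, or None if the URL looks fetchable."""
    if not isinstance(url, str) or not url.strip():
        return "URL is required"
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        # e.g. an unterminated IPv6 literal such as "http://[::1"
        return "URL could not be parsed"
    if parts.scheme.lower() not in ALLOWED_SCHEMES:
        return "URL must start with http:// or https://"
    if not parts.hostname:
        return "URL must include a host name"
    return None
```

In render_url, the `if not url:` check could then become `error = validate_url(url)` followed by returning `jsonify({"error": error}), 400` whenever error is not None.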

Step 5: Building and Deploying the Application

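Now that the Dockerfile and Flask application are ready, we can build the Docker image and deploy it to Google Cloud Run. The Makefile wraps each step in a target: replace YOUR_PROJECT_ID with your Google Cloud project ID, run gcloud auth configure-docker once so Docker can push to gcr.io, then run make build, make push, and make deploy. The deploy step prints the service URL; substitute it for the placeholder in the test target before running make test.

Makefile: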

PROJECT_ID ?= YOUR_PROJECT_ID
IMAGE := gcr.io/$(PROJECT_ID)/playwright-flask

.PHONY: build push deploy test

# Cloud Run runs linux/amd64 images, so build for that platform even on ARM machines
build:
	docker build --platform linux/amd64 -t $(IMAGE) .

push:
	docker push $(IMAGE)

deploy:
	gcloud run deploy playwright-flask \
	    --image $(IMAGE) \
	    --platform managed \
	    --region us-central1 \
	    --allow-unauthenticated \
	    --memory 2Gi \
	    --timeout 300s

test:
	curl -X POST \
	    -H "Content-Type: application/json" \
	    -d '{"url": "https://example.com"}' \
	    https://playwright-flask-XXXXXXXXXX-uc.a.run.app/render
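If you would rather call the service from Python than from curl, the standard library is enough. The sketch below sends the same POST as the test target; SERVICE_URL keeps the tutorial's placeholder, and build_render_request and render are hypothetical helper names. Because urlopen raises HTTPError for 4xx and 5xx responses, the service's JSON error body surfaces as an exception rather than a return value.

```python
import json
import urllib.request

# Placeholder from the tutorial: replace with the URL printed by gcloud run deploy.
SERVICE_URL = "https://playwright-flask-XXXXXXXXXX-uc.a.run.app"


def build_render_request(service_url, target_url):
    """Build the same JSON POST to /render that the curl test sends."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        service_url.rstrip("/") + "/render",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def render(service_url, target_url, timeout=300):
    """Send the request and return the rendered HTML from the JSON response."""
    request = build_render_request(service_url, target_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["html"]
```

For example, render(SERVICE_URL, "https://example.com") returns the page's HTML as a string once the placeholder has been replaced. The 300-second client timeout matches the Cloud Run timeout used in the deploy target.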

Key Notes:

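  1. Network Idle: page.wait_for_load_state("networkidle") waits until there have been no network connections for at least 500 ms, which is usually a good sign that dynamically loaded content has arrived. Pages that keep polling or streaming may never go idle, in which case the call fails after Playwright's default 30-second timeout.
  2. JSON Input/Output: The server expects a POST request with a JSON body ({"url": "…"}) and returns the rendered HTML as JSON ({"html": "…"}).
  3. Resource Limits: Rendering pages with Playwright is resource-intensive, and the handler launches a separate Chromium instance for every request. With 8 gunicorn threads, up to 8 browsers can run at once in a single container, so size the memory limit (2Gi in the deploy step) for that peak or reduce the number of threads.
  4. Timeouts: Keep the Cloud Run timeout (300s) and gunicorn's --timeout 300 long enough for Playwright to render large or complex pages; Playwright's own navigation and wait timeouts default to 30 seconds.
  5. Error Handling: The handler currently reports every failure as a 500 with the exception message. Add more specific handling for issues like invalid URLs, timeouts, or JavaScript errors.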
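Playwright documents "networkidle" as waiting until there are no network connections for at least 500 ms. The toy model below is our own simplification, not Playwright's implementation: given the start and end times of every request (in ms from navigation start), it computes when that condition is first met. It shows the main limitation of the approach: a request that starts after a quiet half second is not waited for.

```python
# Toy model of the documented "networkidle" rule; NOT Playwright's actual code.
IDLE_WINDOW_MS = 500


def network_idle_at(requests, window_ms=IDLE_WINDOW_MS):
    """Return the time (ms) at which the page first counts as network-idle.

    `requests` is a list of (start_ms, end_ms) pairs measured from navigation
    start. Idle means `window_ms` have passed with no request in flight.
    """
    busy_until = 0
    for start, end in sorted(requests):
        if start - busy_until >= window_ms:
            # A long enough quiet gap occurred before this request began,
            # so the page was already idle and this request is not waited for.
            return busy_until + window_ms
        busy_until = max(busy_until, end)
    # After the last request finishes, the page is idle once the window elapses.
    return busy_until + window_ms
```

For example, requests at (0, 100) and (700, 800) give 600: the page counts as idle before the second request even begins, so any content it loads would be missing from page.content().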

Conclusion

And with that, you have successfully set up a Playwright server with Flask on Google Cloud Run. This solution provides a scalable and serverless way to render web pages with Playwright, making it ideal for server-side rendering and other automation tasks.