Scraping Google Trends with Selenium and Python

I first became interested in Google Trends while searching for data to bolster a machine learning model I was developing to predict cryptocurrency market movements. I happened upon this article, which suggests that there’s an impressively high correlation between the price of a cryptocurrency and the volume of related searches on Google.

Some of the normalized plots you can find are pretty compelling. These are taken from the article above:

Screenshot_1
Screenshot_2


With my interest piqued, I set out to add Google Trends data to my data ingestion pipeline. It turns out that scraping Google Trends these days is somewhat tricky. A few years ago you could simply query Google Trends with REST-style URL parameters to automatically download a CSV with the data you’re interested in, but unfortunately those days are over. No worries though, Selenium is here to help out!

Step 1: Install Selenium and Chrome Webdriver

Getting Selenium set up is a simple two-step process:

  1. Install Selenium with pip install selenium
  2. Download the Chrome web driver here, and place the webdriver in your working folder.

Step 2: Put your Coding Gloves on

Screenshot_3

Nice. You’re good to go.

Step 3: Copy and Paste my Code

For this example, we’re going to look at the search volume for Bitcoin for the last year. To get some context, let’s check out the Google Trends page:

Screenshot_5
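If you'd rather assemble the page URL than type it by hand, here's a small sketch. Note that `build_trends_url` is just an illustrative helper name, and the `date` parameter with its `today 12-m` value is my assumption about how the explore page encodes "past 12 months", so compare it against your address bar after picking a time range in the UI.

```python
from urllib.parse import urlencode

TRENDS_EXPLORE = 'https://trends.google.com/trends/explore'

def build_trends_url(term, date_range=None):
    # Build an explore-page URL for a single search term
    params = {'q': term}
    if date_range is not None:
        # Assumed parameter name and value format; verify in your browser
        params['date'] = date_range
    return TRENDS_EXPLORE + '?' + urlencode(params)

url = build_trends_url('bitcoin')
url_last_year = build_trends_url('bitcoin', 'today 12-m')
```

With no date range, this produces the same URL used in the script in this post.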

Basically, we want to instruct Selenium to click the ‘CSV’ button (boxed in red), and then save the resulting CSV to a download folder of our choosing. Here’s the corresponding HTML that we’ll be targeting:

Screenshot_6

And here’s the initial setup:


from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time

# The location of your webdriver:
webdriver_path = './chromedriver.exe'

# The full URL of the Google trends page you want to download from:
url = 'https://trends.google.com/trends/explore?q=bitcoin'

# The directory that you want to save the CSV to
download_path = './data'

# Tell Selenium where to save downloaded files
chrome_options = Options()
download_prefs = {'download.default_directory' : download_path,
                  'download.prompt_for_download' : False,
                  'profile.default_content_settings.popups' : 0}
chrome_options.add_experimental_option('prefs', download_prefs)

# Tell Selenium to operate without a browser window
chrome_options.add_argument('--headless')
chrome_options.add_argument('--window-size=1920x1080')

I wish it were this simple, but unfortunately there’s a bit of a hack that we need to implement in order to get downloads working with the headless browser. It turns out that Chrome has a default “feature” (not a bug!) that prevents a user from downloading files in headless mode. Selenium, unfortunately, doesn’t have sufficient access to reliably change this setting, so we need to hook into the Chrome driver itself. If this solution doesn’t work for you or if you’d like to know more about how this works, see this thread.


def enable_headless_download(browser, download_path):
    # Add missing support for chrome "send_command" to selenium webdriver
    browser.command_executor._commands["send_command"] = \
        ("POST", '/session/$sessionId/chromium/send_command')

    params = {'cmd': 'Page.setDownloadBehavior',
              'params': {'behavior': 'allow', 'downloadPath': download_path}}
    browser.execute("send_command", params)

# Start up the browser
browser = webdriver.Chrome(executable_path=webdriver_path,
                           chrome_options=chrome_options)

# Use the hack above to enable headless download
enable_headless_download(browser, download_path)

# Load webpage
browser.get(url)

# Wait for 5 seconds to ensure that the webpage loads completely
time.sleep(5)

# Tell selenium to click the 'CSV' button
button = browser.find_element_by_css_selector('.widget-actions-item.export')
button.click()

# Sleep another 5 seconds to let the file download
time.sleep(5)

# Safely close the browser
browser.quit()
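A fixed five-second sleep is a guess: too short on a slow connection, wasted time on a fast one. One alternative is to poll the download folder until a finished CSV shows up. This is only a sketch. `wait_for_csv` is a made-up helper, and it assumes that Chrome marks in-progress downloads with a `.crdownload` suffix and that the export ends in `.csv`.

```python
import os
import time

def wait_for_csv(folder, timeout=30.0, poll_interval=0.5):
    """Return the path of the first finished .csv in folder, or None on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(folder) if os.path.isdir(folder) else []
        # Assumption: Chrome writes partial downloads as *.crdownload
        in_progress = any(n.endswith('.crdownload') for n in names)
        finished = sorted(n for n in names if n.endswith('.csv'))
        if finished and not in_progress:
            return os.path.join(folder, finished[0])
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)
```

You'd call `wait_for_csv(download_path)` in place of the second `time.sleep(5)`, before `browser.quit()`. Start with an empty download folder, otherwise a CSV left over from an earlier run will be returned immediately.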

When the script finishes running you should have a nicely formatted CSV file in download_path. Nice! Feel free to take your gloves off now.
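To pull the numbers into Python, the standard `csv` module is enough. Here's a rough sketch. It assumes the export starts with a short preamble (a category line and a blank line) before the real header row, which is what I've seen but isn't guaranteed. `read_trends_csv` is an illustrative name, not part of any library.

```python
import csv

def read_trends_csv(path):
    """Return (header, rows) from a Trends export, skipping any preamble."""
    # utf-8-sig reads plain UTF-8 and also strips a byte-order mark if present
    with open(path, newline='', encoding='utf-8-sig') as f:
        records = [r for r in csv.reader(f) if r]  # drop blank lines
    # Assumption: the first record with two or more columns is the header row
    for i, record in enumerate(records):
        if len(record) >= 2:
            return record, records[i + 1:]
    return [], []
```

The rows come back as lists of strings, so convert the value column with `int()` (or hand the file to pandas) before doing any math on it.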

 

Here’s the complete script:


from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time

def enable_headless_download(browser, download_path):
    # Add missing support for chrome "send_command" to selenium webdriver
    browser.command_executor._commands["send_command"] = \
        ("POST", '/session/$sessionId/chromium/send_command')

    params = {'cmd': 'Page.setDownloadBehavior',
              'params': {'behavior': 'allow', 'downloadPath': download_path}}
    browser.execute("send_command", params)

# Set paths
webdriver_path = './chromedriver.exe'
url = 'https://trends.google.com/trends/explore?q=bitcoin'
download_path = './data'

# Set the download folder and tell Selenium to not actually open a window
chrome_options = Options()
download_prefs = {'download.default_directory' : download_path,
                  'download.prompt_for_download' : False,
                  'profile.default_content_settings.popups' : 0}

chrome_options.add_experimental_option('prefs', download_prefs)
chrome_options.add_argument('--headless')
chrome_options.add_argument('--window-size=1920x1080')

# Start up browser
browser = webdriver.Chrome(executable_path=webdriver_path,
                           chrome_options=chrome_options)

enable_headless_download(browser, download_path)

# Load webpage
browser.get(url)
time.sleep(5)

button = browser.find_element_by_css_selector('.widget-actions-item.export')
button.click()
time.sleep(5)
browser.quit()
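One caveat: the script above uses the Selenium 3 API. On recent Selenium 4 releases, as far as I know, `executable_path`, `chrome_options` and `find_element_by_css_selector` are gone (replaced by a `Service` object, the `options=` keyword and `find_element(By.CSS_SELECTOR, ...)`). Chrome drivers there also expose `execute_cdp_cmd`, which should make the `send_command` patch unnecessary. Here's a hedged sketch of the download hack in that style. The function name is my own, and I've made the path absolute on the assumption that Chrome is happier with absolute download paths.

```python
import os

def enable_headless_download_cdp(browser, download_path):
    # Send the same DevTools command as the hack above, but through
    # Selenium's own execute_cdp_cmd instead of patching in send_command
    params = {'behavior': 'allow',
              # Assumption: an absolute path is the safer choice here
              'downloadPath': os.path.abspath(download_path)}
    browser.execute_cdp_cmd('Page.setDownloadBehavior', params)
```

It takes the same arguments as `enable_headless_download`, so it can be swapped in with no other changes to that call.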
