In this tutorial, I'm going to walk through how to use a python script scheduled with pub/sub on Google Cloud to update Google Sheets daily.

Using python to get the data

I'm going to use a public API to get daily counts for confirmed COVID-19 cases: https://covidtracking.com.

We're going to use the endpoint /api/v1/us/daily.json to get the total number of cases for yesterday. Using requests in python, we can get the data using the following code:

import requests
url = "https://covidtracking.com/api/v1/us/daily.json"
response = requests.get(url)

Our response will contain a list of JSON elements, each one representing one day's worth of data:

[
    {
    "date":20200427,
    "states":56,
    "positive":981134,
    "negative":4612361,
    "pending":4077,
    "hospitalizedCurrently":54971,
    "hospitalizedCumulative":97782,
	...
    "totalTestResultsIncrease":135505
    },
    ...
]

We can either grab response[0] or, if we want a specific date, loop through the elements while checking the "date" value against our desired date.

import datetime
yesterday = datetime.datetime.today() - datetime.timedelta(days=1)
yesterday_api = yesterday.strftime('%Y%m%d')
date formatting using datetime

The data point we're looking for is "positive":

import json

cases = str(response.json()[0]["positive"]

Interacting with Google Sheets using python

In order to store our data with Google Sheets, we'll use pygsheets.

pip install pygsheets
install pygsheets with pip

At this point, we'll also need to create an application using Google Developer Console. Once our application is created, we can enable the Google Sheets API:

Now that we've enabled the API, we need to create a key so that our application can access Google Sheets.

choose 'Service Account'

After choosing 'Service Account', you can download your keyfile in json format. Take note of the "client_email" field - you should share the sheet you want to edit with this email address.

Now our application has access to Google Sheets! We can interact with the sheet as follows:

  SERVICE_FILE="address_of_service_file"
  gc = pygsheets.authorize(service_file=SERVICE_FILE)
  sh = gc.open("sheet_name")
  wks = sh[0]
  df = wks.get_as_df()
read in Sheets data as dataframe
wks.set_dataframe(df,(1,1))
write data back to Sheets starting at index (1,1)

Scheduling on Google Cloud

Now that we've created a script that can read and write to/from Google Sheets, we want to schedule our script to run daily. For this, we'll use Google Functions along with Google Cloud Scheduler.

First, we'll create a Function from our script. Make sure to add a requirements.txt file and an entry point in main.py. Note that Google Functions expects main() to be called with 2 parameters - data and context - even if you don't need them for your script.

pygsheets
requirements.txt
if __name__ == '__main__':
    main(data, context)
main.py (this file contains your python script)

Before we create our function and pub/sub topic, make sure you have Google Cloud SDK installed (documentation here).

From within the directory holding your main.py and requirements.txt files, do the following:

gcloud functions deploy covid_data_update --entry-point main --runtime python37 --trigger-resource cron_topic --trigger-event google.pubsub.topic.publish --timeout 540s

We've just deployed a function called covid_data_update that runs with Python 3.7, starting at main.py.  Our function will be triggered by a pub/sub topic called cron_topic.

The last step is to create and schedule our pub/sub topic. Without this step, our function will never run - it will wait to be triggered by a publish which never comes!

gcloud scheduler jobs create pubsub data_update --schedule 10*** --topic cron_topic --message-body “updating data!”

This command creates a pub/sub job called data_update with topic "cron_topic". The job is scheduled to run daily at 12:01 AM (first minute, 0th hour, every day of the month, every month of the year, every day of the week).

And that's it!