Google BigQuery JavaScript and Node.js Client Usage

3 minute read

Github APIs provide a wide variety of data, such as events, repos, and users. I implemented a few charts for one authenticated user, specifically, to see your popular repos by fork_count, stargazers, watchers etc. The thing is, in spite of these per-repo-based activity data, it is not easy to get aggregated data for repos, it is either beyond the request limit or there is no historical data(for example, you can retrieve activity data within 90 days). After looking around on the Internet, the only feasible way seems to be using Github Archive Data, which is stored and updated in Google BigQuery.

You can activate Google BigQuery as mentioned here, once you get access to this dataset, you can query data similar to the way you use SQL(but not exactly the same), the you can download the result CSV file.

However, in my case, I need to query relevant data for each user, and hopefully update query within latest month dataset. CSV files won’t work then.

Fortunately, google provides APIs including BigQuery api. I should be able to get data similar to the way I query Github API.(I also kept the Github API in Express Router as well).

As you can see here, both JavaScript and Node.js client libraries are still in early-stage development. I intended to manage all data API on the server side by using node.js client library, but it didn’t work well at the time being, the good thing is the browser javascript library worked.

So I’d like to take notes of both libraries, hopefully I can unify the APIs either use Python or a stable Node.js client library.

Browser JavaScript Library

The official documentation is a good start. And just a reminder, we need to use OAuth2 for google apis as well.

Basically, what you need to do is to use the client.js to initialize the process, and authorize the request with some essential information such as client_id, project_id, and ‘scope’.

window.onload = function(){
        var googleObj = googleObj || {};
        (function($) {
            googleObj.bigQuery = {
                projectNumber: [project id],
                clientId:[client id],

                initAuth: function() {
                    var config = {
                        'client_id': googleObj.bigQuery.clientId,
                        'scope': 'https://www.googleapis.com/auth/bigquery'
                    };

                    gapi.auth.authorize(config, function() {
                        gapi.client.load('bigquery', 'v2').then(function(){
                            var request = gapi.client.bigquery.jobs.query({
                                'projectId': [project id],
                                'timeoutMs': '30000',
                                'query': 'SELECT * FROM [publicdata:samples.github_timeline] LIMIT 2;'
                            });

                            request.execute(function(response) {
                              // pass query result to d3
                                dataD3(response)
                            });
                        })
                    });

                    function dataD3(data){
                        console.log(data)
                    }
                }
            }
        })(jQuery);


        jQuery(document).ready(function($) {
            googleObj.bigQuery.initAuth();
        });
	}

Node.js Client Library

The node.js client library OAuth2 is similar to what was mentioned in previous posts. Users click on auth button, they will be redirected to a authorization_url which is generated by the library, the authorization_url is a consent page where users confirm the authentication, after that, page will be redirected to the callback url you specified in your app, with a code parameter. The next step is to use this code to exchange an token, save that token and use this token for each request.

var express = require('express');
var router = express.Router();
var fs = require('fs');
var path = require('path');
var qs = require('query-string');

/*
 * Create a consent page URL
 */
var google = require('googleapis');
var OAuth2 = google.auth.OAuth2;
var bigquery = google.bigquery('v2');


var CLIENT_ID = [CLIENT_ID]
var CLIENT_SECRET = [CLIENT_SECRET];
var REDIRECT_URL = 'http://localhost:3030/oauthcallback';

var oauth2Client = new OAuth2(CLIENT_ID, CLIENT_SECRET, REDIRECT_URL);

// set auth as a global default
google.options({ auth: oauth2Client });

// generate a url that asks permissions for Google+ and Google Calendar scopes

var authorization_url = oauth2Client.generateAuthUrl({
  access_type: 'offline', // 'online' (default) or 'offline' (gets refresh_token)
  scope: 'https://www.googleapis.com/auth/bigquery'
  // If you only need one scope you can pass it as string or pass it as an array
});

router.get('/', function (req, res) {
	res.render('index')
});

// Initial page redirecting
router.get('/auth', function (req, res) {
    res.redirect(authorization_url);
});

/*
 * Retrive authorization code
 * Retrive access token
 */
router.get('/oauthcallback', function(req, res){
  var code = req.query.code;
  console.log("code is "+code)

  oauth2Client.getToken(code, function(error, tokens) {
    // NOTE: Timeout error in this step.
    if (error) {
        console.log('Error while trying to retrieve access token', err);
        return;
    }
    var accessToken = tokens.access_token
    oauth2Client.setCredentials({
      access_token: accessToken
    });
  });

});

module.exports = router;

The node.js library is still in alpha stage, I am not sure if the timeout in exchange code for token has sth to do with it, or it is a bug on my end, it didn’t work as I expected though. So I will be using both Github API and browser javascript client library for githubarchive data.

  1. Querying data from BigQuery using Javascript;
  2. BigQuery Big Data Visualization With D3.js;
  3. Google API Node JS Authentication and Usage;