About the Blog

It's about anything and everything. I, Steven Hancock, started this blog for a variety of reasons. I want to start documenting my life and sharing that with others, whether that's family, friends, strangers or my future self. I also want to share my experiences in the hope that others can learn from them. Perhaps I can help someone set up an Ubuntu server, write a Django web application, or set up a PhoneGap mobile app.

That's it. I'm here to share. Nothing more, nothing less. I will be covering a wide variety of topics, so feel free to browse for the blog entries that interest you most.

GitHub Vanity Metrics

April 4, 2015

My day job is developing and understanding performance indicators, from software to business metrics. We are trying to create a company culture that relies on measuring performance. One of our greatest difficulties is getting coworkers (specifically developers) to occasionally look at the different metrics that we have available to them. Most only care about the metrics that directly relate to them. Who cares what the company's churn rate is? Well, everyone should, but the reality is that many developers don't. Part of my job is to show developers (and everyone else) how the performance indicators relate to them, even if it is not a direct relation. I believe that if the developers get hooked on one metric, I'll be able to shift them over to business metrics that may not directly relate to individual developers but have a huge impact on the company as a whole.

Developer KPI

Introducing GitHub Vanity Analytics. Basically, this simple script lets the user search GitHub repositories for files that have been modified and track who made those modifications. So, for instance, maybe I want to find out all the devs who modified JS files, and the number of modifications they made. Simple. Or maybe we should measure a behaviour that we want to encourage, like creating unit tests. We could track all of the commits that made changes to files that have test in the name (or in a test directory). Use this simple command:

python github_vanity_analytics.py -u randomscribes -p mypwd -s "test\.py" -r www-stevenhancock --start_path "tests"
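As a rough illustration of what the -s pattern selects: the script applies it to each file name with re.search, so the pattern can match anywhere in the name. The sketch below is standalone and uses made-up file names; matching_names is a hypothetical helper, not part of the script.

```python
import re

# Hypothetical file names, made up for this illustration.
names = ["test.py", "models_test.py", "test_models.py", "views.py", "latest.pyc"]


def matching_names(pattern, candidates):
    """Return the candidates in which re.search() finds the pattern."""
    return [name for name in candidates if re.search(pattern, name)]


# The pattern from the command: the literal text "test.py" anywhere in the name.
strict = matching_names(r"test\.py", names)

# A looser pattern: the text "test" anywhere in the name.
loose = matching_names(r"test", names)

print(strict)
print(loose)
```

With these sample names, the pattern from the command skips test_models.py and picks up latest.pyc, so it is worth checking a pattern against real file names before trusting the numbers.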

Remember, developers are smart people; they will easily be able to game this system. If they know they are being measured on changes to test files, they are going to make as many changes as possible (adding lines, renaming variables, etc). If we implement this strategy, we are going to have to take precautions to ensure that developers realize these are vanity metrics. They aren't actually measuring performance.

Anyway, here's the code. Make sure you watch the GitHub repo, as I am sure I will be updating the code.

'''
Python Command Line Script
'''
import datetime
import logging
import pprint
import re
import requests

GIT_HUB_HOST = "https://api.github.com"


def list_github_repositories(username=None, organization=None, auth=None, **kwargs):
    """
    Get repositories owned by a user or an organization.
        - One of user or organization must be specified
    :param username: 
    :param organization: 
    :param auth:  (user, password)
    :param kwargs:  Arguments to be passed to the api call
        https://developer.github.com/v3/repos/#list-your-repositories
    :return:  results   
    """
    if username:
        url_root = "{host}/users/{username}".format(host=GIT_HUB_HOST, username=username)
    elif organization:
        url_root = "{host}/orgs/{organization}".format(host=GIT_HUB_HOST, organization=organization)
    else:
        raise ValueError("One (and only one) of username or organization parameters "
                         "must be specified")

    url = "{url_root}/repos".format(url_root=url_root)
    response = requests.get(url, params=kwargs, auth=auth)
    return response.json()


class GitHubVanityAnalytics(object):
    """
    Object used to interact with GitHub
    """
    def __init__(self, owner, repo, user, pwd):
        """
        Create the object and set the preferences
        :param owner: 
        :param repo: 
        :param user: 
        :param pwd: 
        """
        self.owner = owner
        self.repo = repo
        self.auth = (user, pwd)

    def get_contents(self, path, ref=None):
        """
        Get the contents of a path.
        :param path:  Path to list contents of
        :param ref:  The name of the commit/branch/tag. Default: repository's default branch
        :return:  results 
        """
        url = "{host}/repos/{owner}/{repo}/contents/{path}".format(
            host=GIT_HUB_HOST, owner=self.owner, repo=self.repo, path=path)
        response = requests.get(url, params={"ref": ref}, auth=self.auth)
        return response.json()

    def get_rate_limits(self):
        """
        Get the current API rate limit status for the authenticated user.
        :return:  results 
        """
        response = requests.get("{host}/rate_limit".format(host=GIT_HUB_HOST), auth=self.auth)
        return response.json()

    # pylint: disable=R0913
    # R0913: Too many arguments
    def search_for_file(self, regex, path, recursive=False, ref=None, file_type=None):
        """
        Search for file.
        :param regex:  Regex to search for
        :param path:  Path to list contents of
        :param recursive:  Recursively search through directories
        :param ref:  The name of the commit/branch/tag. Default: repository's default branch
        :param file_type:  file or dir. If None, searches for both
        :return:  List of files matching regex.
        """
        matching_files = []
        contents = self.get_contents(path, ref)
        for listing in contents:
            if re.search(regex, listing["name"]) and (
                    file_type is None or listing["type"] == file_type):
                matching_files.append(listing)
            if recursive and listing["type"] == "dir":
                matching_files += self.search_for_file(
                    regex, listing["path"], recursive, ref, file_type)
        return matching_files

    def get_commits(self, **kwargs):
        """
        Get commits made to a repository.
        :param kwargs:  Arguments to be passed to the api call
            See https://developer.github.com/v3/repos/commits/#list-commits-on-a-repository
        :return:  results   
        """
        url = "{host}/repos/{owner}/{repo}/commits".format(
            host=GIT_HUB_HOST, owner=self.owner, repo=self.repo)
        response = requests.get(url, params=kwargs, auth=self.auth)
        return response.json()

    def get_commit(self, sha, **kwargs):
        """
        Get a single commit from a repository.
        :param sha: 
        :param kwargs:  Arguments to be passed to the api call
            See https://developer.github.com/v3/repos/commits/
        :return:  results   
        """
        url = "{host}/repos/{owner}/{repo}/commits/{sha}".format(
            host=GIT_HUB_HOST, owner=self.owner, repo=self.repo, sha=sha)
        response = requests.get(url, params=kwargs, auth=self.auth)
        return response.json()

    def get_commit_stats(self, sha, stats_to_add=None, **kwargs):
        """
        Get commit stats (# of changes, additions, & deletions) grouped by owner
        :param sha: 
        :param stats_to_add:  Stats that should be added to
        :param kwargs:  Arguments to be passed to the api call
            See https://developer.github.com/v3/repos/commits/
        :return:  keys are: files, additions, deletions, changes, committer
        """
        response = self.get_commit(sha, **kwargs)
        committer = None
        if not stats_to_add:
            stats_to_add = {}
        
        if response:
            if "author" in response and response["author"]:
                if "login" in response["author"]:
                    committer = response["author"]["login"]

        if committer:
            if committer not in stats_to_add:
                stats_to_add[committer] = {
                    "files": 0,
                    "additions": 0,
                    "deletions": 0,
                    "changes": 0,
                    "commits": 0,
                }
            logging.debug("Stats so far: %s", stats_to_add)

            stats_to_add[committer]["commits"] += 1

            for gfile in response["files"]:
                stats_to_add[committer]["files"] += 1
                stats_to_add[committer]["additions"] += gfile["additions"]
                stats_to_add[committer]["changes"] += gfile["changes"]
                stats_to_add[committer]["deletions"] += gfile["deletions"]
        else:
            logging.debug("No committer or login found for sha: %s -- kwargs: %s",
                          sha, kwargs)

        return stats_to_add

def main():
    """
    Main program
    """
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument('-u', '--user', type=str, required=True)
    parser.add_argument('-p', '--password', type=str, required=True)
    parser.add_argument('-s', '--search_regex', type=str, required=True)

    parser.add_argument('-r', '--repositories', nargs='+', type=str)
    parser.add_argument('-o', '--organization', type=str)
    # directory to start the search from; defaults to the repository root
    parser.add_argument('--start_path', type=str, default="")

    parsed_args = parser.parse_args()
    user = parsed_args.user
    password = parsed_args.password
    regex = parsed_args.search_regex
    repositories_to_search = parsed_args.repositories
    organization = parsed_args.organization
    start_path = parsed_args.start_path

    stats = {}
    if organization:
        repos = list_github_repositories(organization=organization, auth=(user, password))
        repo_owner = organization
    else:
        repos = list_github_repositories(username=user, auth=(user, password))
        repo_owner = user

    git_repo = None

    for repo in repos:
        # skip the repos that we don't want to search
        if repositories_to_search and repo["name"] not in repositories_to_search:
            continue
        # go through each commit find what we are looking for
        git_repo = GitHubVanityAnalytics(repo_owner, repo["name"], user, password)

        # find each file that we care about, starting from start_path
        found_files = git_repo.search_for_file(
            regex=regex, path=start_path, recursive=True, file_type="file")
        for file_found in found_files:
            commits = git_repo.get_commits(
                path=file_found['path'],
                since=datetime.datetime(2015, 3, 20).isoformat(),
                until=datetime.datetime(2015, 5, 1).isoformat()
            )

            for commit in commits:
                stats = git_repo.get_commit_stats(
                    commit["sha"], stats, path=file_found['path'])
    
    pprint.pprint(stats)

    if git_repo:
        pprint.pprint(git_repo.get_rate_limits())
    


if __name__ == "__main__":
    main()
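
To see how the per-committer totals build up without calling the API, here is a small offline sketch of the same bookkeeping that get_commit_stats does. The sample payloads are made up and contain only the fields the script reads; add_commit_to_stats is a hypothetical helper written for this illustration, and the final ranking step is an assumption about how you might want to view the results, not something the script does.

```python
def add_commit_to_stats(commit, stats):
    """Fold one commit payload into the per-committer totals."""
    author = commit.get("author") or {}
    login = author.get("login")
    if not login:
        # Mirrors the script: commits without a GitHub login are skipped.
        return stats
    totals = stats.setdefault(login, {
        "files": 0, "additions": 0, "deletions": 0, "changes": 0, "commits": 0})
    totals["commits"] += 1
    for changed_file in commit.get("files", []):
        totals["files"] += 1
        for key in ("additions", "deletions", "changes"):
            totals[key] += changed_file[key]
    return stats


# Made-up payloads for illustration only.
sample_commits = [
    {"author": {"login": "alice"},
     "files": [{"additions": 10, "deletions": 2, "changes": 12}]},
    {"author": {"login": "bob"},
     "files": [{"additions": 1, "deletions": 1, "changes": 2},
               {"additions": 5, "deletions": 0, "changes": 5}]},
    {"author": {"login": "alice"},
     "files": [{"additions": 3, "deletions": 3, "changes": 6}]},
    {"author": None, "files": []},
]

stats = {}
for commit in sample_commits:
    stats = add_commit_to_stats(commit, stats)

# Rank committers by number of commits, most active first.
leaderboard = sorted(stats, key=lambda login: stats[login]["commits"], reverse=True)
print(leaderboard)
```

A ranking like this is exactly the kind of number that is easy to game, which is why it should be treated as a vanity metric and not as a measure of performance.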