Data, data and data about your favourite community: GrimoireLib

[This post accompanies the lightning talk presented at FOSDEM 2015, titled “Data, data and data about your favourite community”, whose slides are available on Bitergia’s Speaker Deck page. The IPython notebook used for the visualizations is accessible through nbviewer and can be downloaded from GitHub. This is a basic introduction to GrimoireLib.]

GrimoireLib aims to provide a transparency layer between the database and the user. This avoids direct access to the databases while offering a list of available metrics.

GrimoireLib is a Python library that expects a database already generated by one of the Metrics Grimoire tools. CVSAnalY, MailingListStats, Bicho and most of the other tools are already supported.

The following code imports the modules needed to start playing with GrimoireLib. Every metric or study is instantiated in the same way: with a database connection object and a set of predefined filters. In this example, SCMQuery gives access to the database, MetricFilters contains all the definitions needed to express conditions, and the source code metrics module is imported as scm.


[code language="python"]
# Database access
from vizgrimoire.metrics.query_builder import SCMQuery
# Filters to apply
from vizgrimoire.metrics.metrics_filter import MetricFilters
# Let's start playing with git activity metrics
import vizgrimoire.metrics.scm_metrics as scm
[/code]

The next snippet instantiates the database access using a predefined database, in this case the one behind the publicly available OpenStack Activity Board. As the options show, two databases must be defined: one for source code and one for identities. Although this example uses the same database for both, the two are expected to diverge at some point, with identities and affiliations stored separately from the rest of the schemas.

[code language="python"]
# Instantiate database access
# Playing with OpenStack source code database (MySQL) at
# Database named as openstack_source_code_fosdem2015

user = "root"
password = ""
# In this example both roles point to the same database
source_code_db = "openstack_source_code_fosdem2015"
identities_db = "openstack_source_code_fosdem2015"

dbcon = SCMQuery(user, password, source_code_db, identities_db)
[/code]

Filters can be specified in different ways. At a minimum, three parameters are needed: the period of analysis (monthly, weekly, daily, etc.), the start date and the end date. On top of that basic filter, two extra filters are defined: one restricts the data to a single organization, and the other restricts it to an organization and a repository.

[code language="python"]
# Instantiate some filters to play with
period = MetricFilters.PERIOD_MONTH
# Dates are strings that include their own single quotes
startdate = "'2014-01-01'"
enddate = "'2015-01-01'"

# basic filter
filters = MetricFilters(period, startdate, enddate)
# company filter
filters_company = MetricFilters(period, startdate, enddate)
filters_company.add_filter(MetricFilters.COMPANY, "Red Hat")
# company and repo filter
filters_repo_com = MetricFilters(period, startdate, enddate)
filters_repo_com.add_filter(MetricFilters.COMPANY, "Red Hat")
filters_repo_com.add_filter(MetricFilters.REPOSITORY, "nova.git")
[/code]
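To make the effect of these three filters concrete, the following sketch applies the same combinations to a few invented commit records. `SimpleFilter` is a hypothetical class written only for this illustration, not GrimoireLib's `MetricFilters`; it assumes that extra conditions are combined with AND and that the end date is exclusive.

```python
from dataclasses import dataclass, field
from datetime import date


# Hypothetical stand-in for MetricFilters: NOT the GrimoireLib class.
@dataclass
class SimpleFilter:
    period: str        # e.g. "month"; stored but unused in this sketch
    startdate: date    # inclusive
    enddate: date      # exclusive (an assumption of this sketch)
    conditions: dict = field(default_factory=dict)

    def add_filter(self, key, value):
        # Every extra condition must hold (AND semantics).
        self.conditions[key] = value

    def matches(self, commit):
        if not (self.startdate <= commit["date"] < self.enddate):
            return False
        return all(commit.get(k) == v for k, v in self.conditions.items())

    def apply(self, commits):
        return [c for c in commits if self.matches(c)]


# Invented records, for illustration only.
commits = [
    {"date": date(2014, 3, 2), "company": "Red Hat", "repository": "nova.git"},
    {"date": date(2014, 5, 9), "company": "Red Hat", "repository": "swift.git"},
    {"date": date(2014, 7, 1), "company": "Other Inc", "repository": "nova.git"},
    {"date": date(2015, 2, 1), "company": "Red Hat", "repository": "nova.git"},
]

start, end = date(2014, 1, 1), date(2015, 1, 1)
basic = SimpleFilter("month", start, end)
company = SimpleFilter("month", start, end)
company.add_filter("company", "Red Hat")
repo_com = SimpleFilter("month", start, end)
repo_com.add_filter("company", "Red Hat")
repo_com.add_filter("repository", "nova.git")

print(len(basic.apply(commits)), len(company.apply(commits)), len(repo_com.apply(commits)))
# 3 2 1
```

Each extra condition can only narrow the result, which is why the three counts shrink from the basic filter to the company-and-repository filter.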


So, let’s start! The metric API provides four methods:

  • get_agg: provides aggregated information, e.g., the number of commits between two dates.
  • get_ts: provides a time series with date information, e.g., the number of commits between two dates on a monthly basis.
  • get_trends: provides trend information, e.g., the difference in the number of authors between this year and the previous one.
  • get_list: provides a list of elements for the selected metric, e.g., the top contributors of the last year.
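The difference between these four kinds of answer can be illustrated without a database. The sketch below is plain Python over a handful of invented commit records; `agg`, `ts`, `trends` and `top_list` are hypothetical helpers that only mimic the kind of result each GrimoireLib method gives, not its real signatures or return formats. Date ranges are assumed to be half-open.

```python
from collections import Counter
from datetime import date

# Invented commit records, for illustration only: (commit date, author).
COMMITS = [
    (date(2013, 4, 3), "alice"),
    (date(2013, 9, 12), "bob"),
    (date(2014, 1, 15), "alice"),
    (date(2014, 1, 20), "carol"),
    (date(2014, 3, 5), "alice"),
    (date(2014, 11, 30), "dave"),
]


def in_range(day, start, end):
    # Half-open interval [start, end): an assumption of this sketch.
    return start <= day < end


def agg(commits, start, end):
    """Aggregated value: number of commits between two dates."""
    return sum(1 for day, _ in commits if in_range(day, start, end))


def ts(commits, start, end):
    """Monthly time series: commits per 'YYYY-MM', empty months as 0."""
    counts = Counter(day.strftime("%Y-%m")
                     for day, _ in commits if in_range(day, start, end))
    series = {}
    year, month = start.year, start.month
    while date(year, month, 1) < end:
        key = f"{year:04d}-{month:02d}"
        series[key] = counts.get(key, 0)
        # Advance to the next calendar month.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return series


def trends(commits, start, end, prev_start):
    """Trend: change in distinct authors versus the previous period."""
    current = {a for day, a in commits if in_range(day, start, end)}
    previous = {a for day, a in commits if in_range(day, prev_start, start)}
    return len(current) - len(previous)


def top_list(commits, start, end, n=3):
    """List: the n most active authors in the period."""
    counts = Counter(a for day, a in commits if in_range(day, start, end))
    return counts.most_common(n)


y2013, y2014, y2015 = date(2013, 1, 1), date(2014, 1, 1), date(2015, 1, 1)
print(agg(COMMITS, y2014, y2015))            # 4
print(ts(COMMITS, y2014, y2015)["2014-01"])  # 2
print(trends(COMMITS, y2014, y2015, y2013))  # 1 (3 authors in 2014, 2 in 2013)
print(top_list(COMMITS, y2014, y2015, n=1))  # [('alice', 2)]
```

The time series fills empty months with zeros so that a chart keeps one point per period even when nothing happened.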

[code language="python"]
# Retrieving data for each filter.
# Let's start with commits
commits = scm.Commits(dbcon, filters)
[/code]

Thus, the total activity in the OpenStack Foundation during 2014 can be visualized with the following piece of code:

[code language="python"]
[/code]

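As a hedged illustration of the visualization step, the sketch below draws a dependency-free text bar chart from a month-to-count mapping. It assumes the monthly series has already been turned into a plain dictionary; the actual structure returned by `get_ts` is not shown in this post, and the numbers below are invented, not OpenStack data. A plotting library would be the usual choice; text bars simply keep the example self-contained.

```python
def text_bars(series, width=40, char="#"):
    """Render a {label: count} mapping as one text bar per label.

    Bars are scaled so that the largest count spans `width` characters.
    """
    if not series:
        return []
    peak = max(series.values())
    lines = []
    for label, value in series.items():
        # Guard against an all-zero series to avoid dividing by zero.
        length = round(value * width / peak) if peak else 0
        lines.append(f"{label} {char * length} {value}")
    return lines


# Invented monthly commit counts, for illustration only.
monthly = {"2014-01": 120, "2014-02": 60, "2014-03": 0, "2014-04": 90}
for line in text_bars(monthly, width=20):
    print(line)
```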

It is also possible to filter this data to check the activity of a single organization. Let’s use Red Hat as the example organization.

[code language="python"]
# Let's use another filter
commits_redhat = scm.Commits(dbcon, filters_company)
[/code]


Or we can go a step further and check activity for a given organization in a specific repository.

[code language="python"]
# Let's focus on an organization and a repository
commits_redhat_nova = scm.Commits(dbcon, filters_repo_com)
[/code]


Although this post has focused only on commits, dozens of other metrics and studies can be used in the same way across several data sources: source code, issue tracking systems, mailing lists, IRC channels and others. More information is available in the GrimoireLib repository. If you are interested in specific training on these tools, just let us know 😉


