First commit or initial commit?

When I create a new git repository, my first commit message tends to be “1st commit”. I’ve been wondering what other people use as their initial commit message. Today I used the gh package to get the first commits of all repositories of the ropensci and ropenscilabs organizations.

The sample is small and limited to two organizations, so my answer won’t be very conclusive, but I just wanted to start exploring the question.

Getting all repos for an organization

I’ve come up with a quite inelegant solution to paging: I just keep querying the API, one page at a time, until it returns nothing.

library("gh")
library("dplyr")
library("purrr")

# Get the names of all repositories of an organization
get_repos <- function(org){
  repos_names <- NULL
  page <- 1
  geht <- TRUE
  while(geht){
    repos <- try(gh("/orgs/:org/repos",
                    org = org,
                    page = page))

    # Stop on an API error or on an empty page; an empty page may come
    # back as an empty list or as "", so both cases are checked
    geht <- !inherits(repos, "try-error") &&
      length(repos) > 0 &&
      !identical(repos[[1]], "")

    if(geht){
      repos_names <- c(repos_names,
                       vapply(repos, "[[", "", "name"))
      page <- page + 1
    }
  }
  return(repos_names)
}

head(get_repos(org = "ropenscilabs"))
## [1] "webmockr"      "vcr"           "seasl"         "plater"       
## [5] "rnaturalearth" "convertr"
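
The “query until empty” idea does not depend on GitHub itself, so it can be tried offline. Here is a minimal sketch of the same loop, where `fake_repos`, `fetch_page()` and `get_all()` are hypothetical stand-ins made up for illustration and not part of gh:

```r
# Made-up data standing in for an organization's repositories
fake_repos <- paste0("repo", 1:7)

# Hypothetical stand-in for one API call: returns up to `per_page` names,
# and an empty vector once `page` is past the end
fetch_page <- function(page, per_page = 3){
  start <- (page - 1) * per_page + 1
  if(start > length(fake_repos)){
    return(character(0))
  }
  fake_repos[start:min(start + per_page - 1, length(fake_repos))]
}

# Same idea as get_repos(): keep asking for the next page until it is empty
get_all <- function(per_page = 3){
  out <- character(0)
  page <- 1
  repeat{
    now <- fetch_page(page, per_page)
    if(length(now) == 0) break
    out <- c(out, now)
    page <- page + 1
  }
  out
}

all_names <- get_all()
all_names
## [1] "repo1" "repo2" "repo3" "repo4" "repo5" "repo6" "repo7"
```

The loop always makes one extra request, the one that comes back empty, which is the price of not knowing the number of pages in advance.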

Get first commit for a repository

Here I’m doing something quite inefficient. Since the API returns the most recent commits first, I fetch all commits, page by page, and keep the last one. I could instead have used the creation date of the repository to query only the commits made shortly after it.

# Get the message of the oldest commit of a repository
first_commit <- function(repo, org){
  messages <- NULL

  page <- 1
  geht <- TRUE
  while(geht){
    commits <- try(gh("/repos/:owner/:repo/commits",
                      owner = org,
                      repo = repo,
                      page = page))

    # Stop on an API error or on an empty page (empty list or "")
    geht <- !inherits(commits, "try-error") &&
      length(commits) > 0 &&
      !identical(commits[[1]], "")

    if(geht){
      now <- lapply(commits, "[[", "commit")
      now <- lapply(now, "[[", "message")
      messages <- c(messages, unlist(now))
      page <- page + 1
    }
  }

  # Commits come newest first, so the last message is the first commit
  messages[length(messages)]
}
first_commit("ropenaq", "ropensci")
## [1] "Everything"
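
To make the extraction step more concrete: each element returned by the commits endpoint is a nested list, with the message stored under `commit` and then `message`. Below is a small sketch on made-up data (the `sha` values and messages are invented for illustration), showing how the oldest message ends up last:

```r
# Made-up stand-in for one page of commits, newest commit first
commits <- list(
  list(sha = "c3", commit = list(message = "fix typo")),
  list(sha = "b2", commit = list(message = "add tests")),
  list(sha = "a1", commit = list(message = "1st commit"))
)

# Pull commit$message out of every element
messages <- vapply(commits, function(x) x[["commit"]][["message"]], "")

# The oldest commit is the last element
oldest <- messages[length(messages)]
oldest
## [1] "1st commit"
```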

Actually, I’m a bit surprised I chose “Everything” as the first commit message for my ropenaq package. Not because I expect my commit history to be particularly smart, but because it’s not “1st commit”.

Get all the first commits

first_commits <- get_repos("ropenscilabs") %>%
  map(first_commit, org = "ropenscilabs") 
save(first_commits, file = "data/2017-02-21_ropenscilabs_first_commits.RData")
first_commits <- get_repos("ropensci") %>%
  map(first_commit, org = "ropensci") 
save(first_commits, file = "data/2017-02-21_ropensci_first_commits.RData")

What are the most frequent first commits?

load("data/2017-02-21_ropenscilabs_first_commits.RData")
ropenscilabs <- first_commits
load("data/2017-02-21_ropensci_first_commits.RData")
ropensci <- first_commits

all <- c(unlist(ropenscilabs),
         unlist(ropensci))
firstc <- tibble::tibble(commit = all)
firstc <- mutate(firstc, commit = tolower(commit))
firstc %>%
  group_by(commit) %>%
  summarize(n = n()) %>%
  arrange(desc(n)) %>%
  head(n = 15) %>%
  knitr::kable()
| commit                              |   n |
|:------------------------------------|----:|
| first commit                        | 117 |
| initial commit                      |  76 |
| added readme                        |  19 |
| added files                         |   9 |
| 1st commit                          |   3 |
| create readme.md                    |   3 |
| init                                |   3 |
| added readme file                   |   2 |
| code extracted from mikabr/devtools |   2 |
| first comit                         |   2 |
| first commit, added files           |   2 |
| initial                             |   2 |
| initial import                      |   2 |
| package infrastructure              |   2 |
| rstudio new package project         |   2 |
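
The two most frequent messages can be compared with the total number of repositories. A quick check of the arithmetic, taking the counts from the table and the total of 362 repositories in the two organizations:

```r
n_first <- 117      # "first commit", from the table
n_initial <- 76     # "initial commit", from the table
n_repos <- 362      # total number of repositories

# Share of repositories using one of the two messages
share <- (n_first + n_initial) / n_repos
round(share, 2)
## [1] 0.53
```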

Out of the 362 repositories, 117 used “first commit” as their first commit message and 76 used “initial commit”. In total, 53% of all repositories used one of these two messages, which is less than I expected. But maybe rOpenSci repositories are unusually original when it comes to first commits? And you, what is your favourite initial commit message, if you have one?