How to Write a New Git Protocol
I used to have trouble keeping track of my files. I often couldn’t remember whether I saved a file on my desktop, laptop, or phone, or if it was floating around in the cloud somewhere. Plus, with certain information, like passwords and bitcoin keys, I didn’t feel comfortable just sending that in an email to myself in plain text.
What I wanted was to store my data in a git repository that was backed up to a single location. I could view old versions of files, and wouldn’t have to worry about my data being deleted. Plus, I was familiar with using git to push and fetch files to various computers.
But, like I said, I didn’t want to just upload my secret keys and passwords to GitHub or BitBucket, even in a private repository.
I had the cool idea of writing a tool to encrypt my repository before I pushed it into backup. Unfortunately, I wouldn’t be able to use git push like I normally would, and instead would have to use something like this:
$ encrypted-git push http://example.com/
At least, that’s what I thought until I discovered git-remote-helpers.
Git remote helpers
Online, I found the documentation for git remote helpers.
It turns out that if you were to run the commands
$ git remote add origin asdf://example.com/repo
$ git push --all origin
Git would first check if it had the asdf protocol built in, and when it saw it didn’t, it would check if git-remote-asdf was on the PATH, and if it was, it’d run git-remote-asdf origin asdf://example.com/repo to handle the communications.
Similarly, you can also run
$ git clone asdf::http://example.com/repo
Which will cause git to invoke git-remote-asdf origin http://example.com/repo.
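The two URL forms above can be sketched in Go. This is only an illustration of the naming rule, not git's actual lookup code: helperFor and isTransportName are hypothetical names, the character check is my own assumption, and the sketch ignores git's built-in protocols entirely.

```go
package main

import (
	"fmt"
	"strings"
)

// isTransportName reports whether s looks like a transport name.
// The allowed character set is an assumption made for this sketch.
func isTransportName(s string) bool {
	if s == "" {
		return false
	}
	for _, c := range s {
		switch {
		case c >= 'a' && c <= 'z', c >= 'A' && c <= 'Z', c >= '0' && c <= '9':
		case c == '+', c == '-', c == '.':
		default:
			return false
		}
	}
	return true
}

// helperFor returns the helper binary name and the URL argument it would
// receive for the two URL forms described in the text.
func helperFor(remoteURL string) (helper, arg string, ok bool) {
	if i := strings.Index(remoteURL, "::"); i > 0 && isTransportName(remoteURL[:i]) {
		// transport::address, the helper only sees the address.
		return "git-remote-" + remoteURL[:i], remoteURL[i+2:], true
	}
	if i := strings.Index(remoteURL, "://"); i > 0 && isTransportName(remoteURL[:i]) {
		// transport://address, the helper sees the whole URL.
		return "git-remote-" + remoteURL[:i], remoteURL, true
	}
	return "", "", false
}

func main() {
	for _, u := range []string{"asdf://example.com/repo", "asdf::http://example.com/repo"} {
		helper, arg, _ := helperFor(u)
		fmt.Println(helper, "origin", arg)
	}
}
```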
Unfortunately, I found the documentation to be severely lacking on the details I needed to actually implement a helper. But then, in the Git source code, I found a shell script called git-remote-testgit.sh that implements a testgit helper, which is used to test the git remote helper system. It basically implements pushing and fetching from local repositories on the same filesystem. So
git clone testgit::/existing-repository
is equivalent to
git clone /existing-repository
Similarly, you can push and fetch from local repositories over the testgit protocol.
In this article, we’ll walk through the code of git-remote-testgit and reimplement it in Go by creating a brand new helper, git-remote-go. Along the way, I’ll explain what the code means, and the various things I had to learn in order to implement my own remote helper, git-remote-grave.
Some basics
To make the following sections clearer, let’s establish some terminology and basic mechanisms.
When we run
$ git remote add myremote go::http://example.com/repo
$ git push myremote master
Git will instantiate a new process by running the command
git-remote-go myremote http://example.com/repo
Notice that the first argument is the remote name, and the second argument is the url.
When you run
$ git clone go::http://example.com/repo
the helper will be instantiated with
git-remote-go origin http://example.com/repo
This is because the remote origin is automatically created in cloned repositories.
When git instantiates the helper as a new process, it opens up pipes for stdin, stdout, and stderr for communicating with it. Commands are sent to the helper over stdin, and the helper responds over stdout. Any output the helper produces on stderr is redirected to wherever git’s stderr is going, which is probably the terminal.
The following diagram demonstrates this relationship.
The last point I want to make before we begin is to distinguish the local and remote repository. Generally, but not always, the local repository is the one we are running git from, and the remote is the one we are making a connection to.
So in a push, we are sending changes from the local to the remote. In a fetch, we are taking changes from the remote to the local. In a clone, we are cloning from the remote into the local.
When git runs the helper, it sets the environment variable GIT_DIR to the Git directory of the local repository (e.g. local/.git).
Starting the project
In this article, I’m assuming that Go is installed, with $GOPATH pointing to a directory named go.
Let’s start by creating the directory go/src/git-remote-go. This will make it possible to install our helper just by running go install (assuming go/bin is on the PATH).
With this in mind, we can write the first few lines of go/src/git-remote-go/main.go.
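package main
import (
	"fmt"
	"log"
	"os"
)
func Main() (er error) {
	if len(os.Args) < 3 {
		return fmt.Errorf("Usage: git-remote-go remote-name url")
	}
	remoteName := os.Args[1]
	url := os.Args[2]
	// The rest of the helper, built up in the sections below,
	// goes here and uses remoteName and url.
	return nil
}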
func main() {
if err := Main(); err != nil {
log.Fatal(err)
}
}
I’ve split Main() out into its own function because error handling is easier when we can return errors. It also allows us to use defer, since log.Fatal calls os.Exit, which doesn’t run deferred functions.
Now let’s look at the top of git-remote-testgit to see what to do next.
#!/bin/sh
# Copyright (c) 2012 Felipe Contreras
alias=$1
url=$2
dir="$GIT_DIR/testgit/$alias"
prefix="refs/testgit/$alias"
default_refspec="refs/heads/*:${prefix}/heads/*"
refspec="${GIT_REMOTE_TESTGIT_REFSPEC-$default_refspec}"
test -z "$refspec" && prefix="refs"
GIT_DIR="$url/.git"
export GIT_DIR
force=
mkdir -p "$dir"
if test -z "$GIT_REMOTE_TESTGIT_NO_MARKS"
then
gitmarks="$dir/git.marks"
testgitmarks="$dir/testgit.marks"
test -e "$gitmarks" || >"$gitmarks"
test -e "$testgitmarks" || >"$testgitmarks"
fi
The variable they call alias is what we are calling remoteName. url means the same thing.
The next declaration is
dir="$GIT_DIR/testgit/$alias"
This creates a namespace in the Git directory that is specific to the testgit protocol and to the remote we are using. This way the testgit files for the origin remote are kept separate from those for the backup remote.
Down below, we see the statement
mkdir -p "$dir"
This will make sure the local directory is created, if it doesn’t exist already.
Let’s add the creation of the local directory to our Go program.
// Add "path" to the import list
localdir := path.Join(os.Getenv("GIT_DIR"), "go", remoteName)
if err := os.MkdirAll(localdir, 0755); err != nil {
return err
}
Continuing through the script, we come across the following lines
prefix="refs/testgit/$alias"
default_refspec="refs/heads/*:${prefix}/heads/*"
refspec="${GIT_REMOTE_TESTGIT_REFSPEC-$default_refspec}"
test -z "$refspec" && prefix="refs"
Let’s talk about refs really quick.
In git, refs are stored in .git/refs:
.git
└── refs
    ├── heads
    │   └── master
    ├── remotes
    │   ├── gravy
    │   └── origin
    │       └── master
    └── tags
In the above tree, remotes/origin/master contains the SHA-1 hash of the most recent commit in the master branch of the origin remote. heads/master refers to the most recent commit of your local master branch. A ref is like a pointer to a commit.
A refspec allows us to map remote refs to local refs. In the above code, prefix is the directory where the remote refs will be held. If the remote name is origin, then the remote master branch would be determined by the ref .git/refs/testgit/origin/heads/master. It basically creates a protocol-specific namespace for remote branches.
The next line is the actual refspec. The line
default_refspec="refs/heads/*:${prefix}/heads/*"
expands to
default_refspec="refs/heads/*:refs/testgit/$alias/heads/*"
This maps remote branches that look like refs/heads/* (where * means any text) to refs/testgit/$alias/heads/* (where * is replaced with whatever * matched in the first pattern). So refs/heads/master becomes refs/testgit/origin/heads/master, for instance.
Essentially, the refspec allows testgit to add a branch to the tree for itself, like this
.git
└── refs
    ├── heads
    │   └── master
    ├── remotes
    │   └── origin
    │       └── master
    ├── testgit
    │   └── origin
    │       └── heads
    │           └── master
    └── tags
The next line
refspec="${GIT_REMOTE_TESTGIT_REFSPEC-$default_refspec}"
sets $refspec to $GIT_REMOTE_TESTGIT_REFSPEC if that variable is set, even to an empty string. Only if it is unset does $refspec become $default_refspec. This is so testgit can be tested with other refspecs. We’ll assume it gets set to $default_refspec.
Finally, the next line,
test -z "$refspec" && prefix="refs"
sets $prefix to refs if $refspec is empty, which only happens when $GIT_REMOTE_TESTGIT_REFSPEC exists but is empty. We’ll assume that is not the case.
We need our own refspec, so we’ll add the line
refspec := fmt.Sprintf("refs/heads/*:refs/go/%s/*", remoteName)
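To make the wildcard substitution concrete, here is a rough sketch of how a refspec like ours maps one ref name to another. mapRef is a hypothetical function written for this article. Git's real refspec handling supports more than this (a leading + for forced updates, for example), so treat it as an illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// mapRef applies a refspec of the form "src:dst" to ref, where src and dst
// may each contain a single "*" wildcard. It returns false if ref does not
// match the source side.
func mapRef(refspec, ref string) (string, bool) {
	src, dst, ok := strings.Cut(refspec, ":")
	if !ok {
		return "", false
	}
	srcPrefix, srcSuffix, wildcard := strings.Cut(src, "*")
	if !wildcard {
		// No wildcard: only an exact match maps.
		if ref == src {
			return dst, true
		}
		return "", false
	}
	if len(ref) < len(srcPrefix)+len(srcSuffix) ||
		!strings.HasPrefix(ref, srcPrefix) ||
		!strings.HasSuffix(ref, srcSuffix) {
		return "", false
	}
	// Whatever the "*" matched on the left replaces the "*" on the right.
	matched := ref[len(srcPrefix) : len(ref)-len(srcSuffix)]
	return strings.Replace(dst, "*", matched, 1), true
}

func main() {
	refspec := fmt.Sprintf("refs/heads/*:refs/go/%s/*", "origin")
	if mapped, ok := mapRef(refspec, "refs/heads/master"); ok {
		fmt.Println(mapped) // refs/go/origin/master
	}
}
```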
Following that code, we see
GIT_DIR="$url/.git"
export GIT_DIR
Another fact about $GIT_DIR is that if it is set in the environment, the git binary will use the directory in $GIT_DIR as its .git directory, instead of the local .git. This command makes it so that all future git commands run by the helper will run in the context of the remote repository.
We’ll translate this to
if err := os.Setenv("GIT_DIR", path.Join(url, ".git")); err != nil {
return err
}
Remember, of course, that $dir and our variable localdir still refer to a subdirectory of the repository we are fetching to or pushing from.
And the last bit of code before the main loop is
if test -z "$GIT_REMOTE_TESTGIT_NO_MARKS"
then
gitmarks="$dir/git.marks"
testgitmarks="$dir/testgit.marks"
test -e "$gitmarks" || >"$gitmarks"
test -e "$testgitmarks" || >"$testgitmarks"
fi
The contents of the if statement will be executed if $GIT_REMOTE_TESTGIT_NO_MARKS isn’t set, which we’ll assume is the case.
These marks files are used by git fast-export and git fast-import to record information about refs and blobs being transferred. It’s important that these marks are kept the same between multiple invocations of the helper, so they’re being stored in the localdir.
Here, $gitmarks refers to the marks for our local repository that git writes, while $testgitmarks stores the marks for the remote one that the helper writes.
The two following lines appear equivalent to touch invocations, where if the marks files don’t exist, they are created empty.
test -e "$gitmarks" || >"$gitmarks"
test -e "$testgitmarks" || >"$testgitmarks"
We’ll need these files in our own program, so let’s start by writing a Touch function.
// Create path as an empty file if it doesn't exist, otherwise do nothing.
// This works by opening a file in exclusive mode; if it already exists,
// an error will be returned rather than truncating it.
func Touch(path string) error {
file, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0666)
if os.IsExist(err) {
return nil
} else if err != nil {
return err
}
return file.Close()
}
Now we can create the marks files.
gitmarks := path.Join(localdir, "git.marks")
gomarks := path.Join(localdir, "go.marks")
if err := Touch(gitmarks); err != nil {
return err
}
if err := Touch(gomarks); err != nil {
return err
}
However, one thing I’ve come across is that if the helper fails for some reason, the marks files can be left in an invalid state. To guard against this, we can save the original contents of the files, and then rewrite them if the Main() function returns an error.
// add "io/ioutil" to imports
originalGitmarks, err := ioutil.ReadFile(gitmarks)
if err != nil {
return err
}
originalGomarks, err := ioutil.ReadFile(gomarks)
if err != nil {
return err
}
defer func() {
if er != nil {
ioutil.WriteFile(gitmarks, originalGitmarks, 0666)
ioutil.WriteFile(gomarks, originalGomarks, 0666)
}
}()
We can finally begin on the central command loop.
Commands are passed to the helper via stdin, where each command is a string terminated by a newline. The helper responds to the commands via stdout; stderr is piped to the end user.
Let’s make our own loop.
// Add "bufio" to import list.
stdinReader := bufio.NewReader(os.Stdin)
for {
// Note that command will include the trailing newline.
command, err := stdinReader.ReadString('\n')
if err != nil {
return err
}
switch {
case command == "capabilities\n":
// ...
case command == "\n":
return nil
default:
return fmt.Errorf("Received unknown command %q", command)
}
}
The capabilities command
The first command to implement is capabilities. The helper is expected to print what commands and other capabilities it supports on separate lines, terminated by an empty line.
echo 'import'
echo 'export'
test -n "$refspec" && echo "refspec $refspec"
if test -n "$gitmarks"
then
echo "*import-marks $gitmarks"
echo "*export-marks $gitmarks"
fi
test -n "$GIT_REMOTE_TESTGIT_SIGNED_TAGS" && echo "signed-tags"
test -n "$GIT_REMOTE_TESTGIT_NO_PRIVATE_UPDATE" && echo "no-private-update"
echo 'option'
echo
This list of capabilities states that this helper supports the import, export and option commands. The option command allows git to change the verbosity and such of our helper.
signed-tags means that when git creates a fast-export stream for the export command, it will pass --signed-tags=verbatim to git-fast-export.
no-private-update instructs git to not update a private ref when it’s been successfully pushed. I’ve never seemed to need this feature.
refspec $refspec tells git what refspec we want to use.
The *import-marks $gitmarks and *export-marks $gitmarks lines mean git should save the marks it generates to the gitmarks file. The * means that if git does not understand these lines, it must fail instead of ignoring them. This is because the helper depends on the marks files being saved, and won’t work with versions of git that don’t support this.
Let’s ignore signed-tags, no-private-update and option, as they are provided in git-remote-testgit for completeness of testing, and we don’t need them for this example. We can implement the above simply as
case command == "capabilities\n":
fmt.Printf("import\n")
fmt.Printf("export\n")
fmt.Printf("refspec %s\n", refspec)
fmt.Printf("*import-marks %s\n", gitmarks)
fmt.Printf("*export-marks %s\n", gitmarks)
fmt.Printf("\n")
The list command
The next command is list. This isn’t provided in the capabilities list because it must always be supported by the helper.
When the helper receives a list command, it should print out the refs of the remote repository as a series of lines of the format $objectname $refname, followed by an empty line. $refname is the name of the ref, while $objectname is what the ref points to. $objectname can be a commit hash, refer to another ref by name with @$refname, or be ?, which means the ref’s value could not be determined.
git-remote-testgit’s implementation is the following.
git for-each-ref --format='? %(refname)' 'refs/heads/'
head=$(git symbolic-ref HEAD)
echo "@$head HEAD"
echo
Remembering that $GIT_DIR causes git for-each-ref to run in the remote repository, this will print a line ? $refname for every branch in the remote repository, as well as @$head HEAD, where $head is the name of the ref that the HEAD of the repository refers to.
In an ordinary repository with two branches, master and development, the output of this might look like
? refs/heads/master
? refs/heads/development
@refs/heads/master HEAD
<blank>
Now let’s write it ourselves. Let’s write a function GitListRefs(), because we’ll need it again later.
// Add "os/exec" and "bytes" to the import list.
// Returns a map of refnames to objectnames.
func GitListRefs() (map[string]string, error) {
out, err := exec.Command(
"git", "for-each-ref", "--format=%(objectname) %(refname)",
"refs/heads/",
).Output()
if err != nil {
return nil, err
}
lines := bytes.Split(out, []byte{'\n'})
refs := make(map[string]string, len(lines))
for _, line := range lines {
fields := bytes.Split(line, []byte{' '})
if len(fields) < 2 {
break
}
refs[string(fields[1])] = string(fields[0])
}
return refs, nil
}
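The parsing half of GitListRefs() can be pulled out so it can be tried without a repository. The sketch below does the same split on newlines and spaces. parseRefs is a hypothetical name, and the object names in the sample are made up. Unlike the loop above, it skips lines it can’t parse instead of stopping at the first one.

```go
package main

import (
	"bytes"
	"fmt"
)

// parseRefs turns "objectname refname" lines, as printed by the
// for-each-ref format used above, into a map of refnames to objectnames.
func parseRefs(out []byte) map[string]string {
	refs := make(map[string]string)
	for _, line := range bytes.Split(out, []byte{'\n'}) {
		objectname, refname, ok := bytes.Cut(line, []byte{' '})
		if !ok {
			continue // blank or malformed line
		}
		refs[string(refname)] = string(objectname)
	}
	return refs
}

func main() {
	// Made-up object names, for illustration only.
	out := []byte("aaaa refs/heads/master\nbbbb refs/heads/development\n")
	refs := parseRefs(out)
	fmt.Println(len(refs), refs["refs/heads/master"])
}
```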
Now we’ll write GitSymbolicRef().
func GitSymbolicRef(name string) (string, error) {
	out, err := exec.Command("git", "symbolic-ref", name).Output()
	if err != nil {
		// Output only captures stdout, so we report just the error.
		return "", fmt.Errorf(
			"GitSymbolicRef: git symbolic-ref %s: %v", name, err)
	}
	return string(bytes.TrimSpace(out)), nil
}
We can implement the list command like so.
case command == "list\n":
refs, err := GitListRefs()
if err != nil {
return fmt.Errorf("command list: %v", err)
}
head, err := GitSymbolicRef("HEAD")
if err != nil {
return fmt.Errorf("command list: %v", err)
}
for refname := range refs {
fmt.Printf("? %s\n", refname)
}
fmt.Printf("@%s HEAD\n", head)
fmt.Printf("\n")
The import command
Next up is the import command, which git uses when trying to fetch or clone. This command actually comes in a batch; it is sent as a series of lines import $refname followed by a blank line. When git sends this command to the helper, it executes the git fast-import binary, and pipes the helper’s stdout into its stdin. In other words, the helper is expected to return a git fast-export stream on stdout.
Let’s look at git-remote-testgit’s implementation.
# read all import lines
while true
do
ref="${line#* }"
refs="$refs $ref"
read line
test "${line%% *}" != "import" && break
done
if test -n "$gitmarks"
then
echo "feature import-marks=$gitmarks"
echo "feature export-marks=$gitmarks"
fi
if test -n "$GIT_REMOTE_TESTGIT_FAILURE"
then
echo "feature done"
exit 1
fi
echo "feature done"
git fast-export \
${testgitmarks:+"--import-marks=$testgitmarks"} \
${testgitmarks:+"--export-marks=$testgitmarks"} \
$refs |
sed -e "s#refs/heads/#${prefix}/heads/#g"
echo "done"
The loop at the top, true to the comment, accumulates all the import $refname commands into a single variable $refs, which is a list of the refs separated by spaces.
Following that, if the script is using a gitmarks file (which we’re assuming it is), it prints out feature import-marks=$gitmarks and feature export-marks=$gitmarks. These lines are part of the stream itself, and make git fast-import behave much as if it had been passed --import-marks=$gitmarks and --export-marks=$gitmarks.
The next branch fails the helper if $GIT_REMOTE_TESTGIT_FAILURE is set, for testing purposes.
After that, feature done is printed. This tells git fast-import that the stream will end with an explicit done command, so a stream that is cut off early is treated as an error.
Finally, git fast-export is called in the remote repository, setting the marks files to the remote marks, $testgitmarks, and then passing the list of refs we want to export.
The output of git-fast-export is piped through a sed script that maps refs/heads/ to refs/testgit/$alias/heads/. In our Go version, we’ll pass the refspec to git fast-export with --refspec instead of using sed. The refspec that we passed to git will take care of this mapping when we export.
After the export stream, done is printed.
Let’s try this in Go.
// Add "strings" to the import list.
case strings.HasPrefix(command, "import "):
refs := make([]string, 0)
for {
// Have to make sure to trim the trailing newline.
ref := strings.TrimSpace(strings.TrimPrefix(command, "import "))
refs = append(refs, ref)
command, err = stdinReader.ReadString('\n')
if err != nil {
return err
}
if !strings.HasPrefix(command, "import ") {
break
}
}
fmt.Printf("feature import-marks=%s\n", gitmarks)
fmt.Printf("feature export-marks=%s\n", gitmarks)
fmt.Printf("feature done\n")
args := []string{
"fast-export",
"--import-marks", gomarks,
"--export-marks", gomarks,
"--refspec", refspec}
args = append(args, refs...)
cmd := exec.Command("git", args...)
cmd.Stderr = os.Stderr
cmd.Stdout = os.Stdout
if err := cmd.Run(); err != nil {
return fmt.Errorf("command import: git fast-export: %v", err)
}
fmt.Printf("done\n")
The export command
Next up is the export command. When we finish this one, our helper is done.
Git issues this command when we are pushing to the remote repository. After sending the command over stdin, git follows it with a stream produced by git fast-export, which we can git fast-import into the remote repository.
if test -n "$GIT_REMOTE_TESTGIT_FAILURE"
then
# consume input so fast-export doesn't get SIGPIPE;
# git would also notice that case, but we want
# to make sure we are exercising the later
# error checks
while read line; do
test "done" = "$line" && break
done
exit 1
fi
before=$(git for-each-ref --format=' %(refname) %(objectname) ')
git fast-import \
${force:+--force} \
${testgitmarks:+"--import-marks=$testgitmarks"} \
${testgitmarks:+"--export-marks=$testgitmarks"} \
--quiet
# figure out which refs were updated
git for-each-ref --format='%(refname) %(objectname)' |
while read ref a
do
case "$before" in
*" $ref $a "*)
continue ;; # unchanged
esac
if test -z "$GIT_REMOTE_TESTGIT_PUSH_ERROR"
then
echo "ok $ref"
else
echo "error $ref $GIT_REMOTE_TESTGIT_PUSH_ERROR"
fi
done
echo
The first if statement is, again, just for testing purposes.
The next line is more interesting. It creates a space-separated list of $refname $objectname pairs of the refs, which we will use to determine which refs were updated in the import.
The next command is rather self-explanatory. git fast-import is run on the stream we receive on stdin, passing --force if specified, --quiet, and the remote marks files.
Next it runs git for-each-ref again to see what refs have changed. For every ref this command returns, it checks to see if the $refname $objectname pair is in the $before list. If it is, nothing changed and it continues on to the next. If the ref isn’t in the list, however, it prints ok $refname to signify to git that the ref updated successfully. Printing error $refname $message tells git that a ref failed to be imported on the remote end.
Finally, it prints a blank line to show that the import is done.
Now we can write it ourselves. We can use the GitListRefs() function we defined earlier.
case command == "export\n":
beforeRefs, err := GitListRefs()
if err != nil {
return fmt.Errorf("command export: collecting before refs: %v", err)
}
cmd := exec.Command("git", "fast-import", "--quiet",
"--import-marks="+gomarks,
"--export-marks="+gomarks)
cmd.Stderr = os.Stderr
cmd.Stdin = os.Stdin
if err := cmd.Run(); err != nil {
return fmt.Errorf("command export: git fast-import: %v", err)
}
afterRefs, err := GitListRefs()
if err != nil {
return fmt.Errorf("command export: collecting after refs: %v", err)
}
for refname, objectname := range afterRefs {
if beforeRefs[refname] != objectname {
fmt.Printf("ok %s\n", refname)
}
}
fmt.Printf("\n")
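The before and after comparison is the heart of this command, and it is easy to try on its own. The sketch below pulls it into a function that returns the updated refs in sorted order. changedRefs is a hypothetical name and the object names are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// changedRefs returns the names of refs in after that are new or point to a
// different object than they did in before, sorted for stable output.
func changedRefs(before, after map[string]string) []string {
	var changed []string
	for refname, objectname := range after {
		if old, ok := before[refname]; !ok || old != objectname {
			changed = append(changed, refname)
		}
	}
	sort.Strings(changed)
	return changed
}

func main() {
	// Made-up object names, for illustration only.
	before := map[string]string{"refs/heads/master": "aaaa"}
	after := map[string]string{
		"refs/heads/master":      "aaaa",
		"refs/heads/development": "bbbb",
	}
	for _, refname := range changedRefs(before, after) {
		fmt.Printf("ok %s\n", refname)
	}
	fmt.Printf("\n")
}
```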
Trying it out
Run go install, which should build and install git-remote-go to go/bin.
You can try the following; first we create two empty git repositories, then make a commit in testlocal, and push it to testremote using our new helper.
$ cd $HOME
$ git init testremote
Initialized empty Git repository in $HOME/testremote/.git/
$ git init testlocal
Initialized empty Git repository in $HOME/testlocal/.git/
$ cd testlocal
$ echo 'Hello, world!' >hello.txt
$ git add hello.txt
$ git commit -m "First commit."
[master (root-commit) 50d3a83] First commit.
1 file changed, 1 insertion(+)
create mode 100644 hello.txt
$ git remote add origin go::$HOME/testremote
$ git push --all origin
To go::$HOME/testremote
* [new branch] master -> master
$ cd ../testremote
$ git checkout master
$ ls
hello.txt
$ cat hello.txt
Hello, world!
Uses for git remote helpers
Git remote helpers have been used to implement interfaces to other source control (like felipec/git-remote-hg), or push code into CouchDBs (peritus/git-remote-couch), among others. You could probably think of more.
I wrote a git remote helper for my original motivation, git-remote-grave. You can use it to push and fetch from encrypted archives on your file system or over HTTP/HTTPS.
$ git remote add usb grave::/media/usb/backup.grave
$ git push --all usb
Using a couple of compression tricks, the archives are typically 22% the size of the original repository.
If you want a convenient place to store your encrypted git repositories, I made the site filegrave.com.
Discussion of this article is taking place on Hacker News and /r/programming.