10 Mar 2010

Git's Little Bundle of Joy

The scenario is thus: you need to sneakernet a git push. Maybe your network is down and you want to send changes to your co-workers. Perhaps you’re working somewhere onsite and don’t have access to the local network for security reasons. Maybe your wireless/ethernet card just broke. Maybe you don’t have access to a shared server for the moment, you want to email someone updates and you don’t want to transfer 40 commits via format-patch.

Enter git bundle. The bundle command will package up everything that would normally be pushed over the wire with a git push command into a binary file that you can email or sneakernet around, then unbundle into another repository.

Let’s see a simple example. Let’s say you have a repository with two commits:

$ git log
commit 9a466c572fe88b195efd356c3f2bbeccdb504102
Author: Scott Chacon <schacon@gmail.com>
Date:   Wed Mar 10 07:34:10 2010 -0800

    second commit

commit b1ec3248f39900d2a406049d762aa68e9641be25
Author: Scott Chacon <schacon@gmail.com>
Date:   Wed Mar 10 07:34:01 2010 -0800

    first commit

If you want to send that repository to someone and you don’t have access to a repository to push to, or simply don’t want to set one up, you can bundle it.

$ git bundle create repo.bundle master
Counting objects: 6, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (6/6), 441 bytes, done.
Total 6 (delta 0), reused 0 (delta 0)

Now you have a file named repo.bundle that has all the data needed to re-create the repository. You can email that to someone else, or put it on a USB drive and walk it over.

Now on the other side, say you are sent this repo.bundle file and want to work on the project.

$ git clone repo.bundle -b master repo
Initialized empty Git repository in /private/tmp/bundle/repo/.git/
$ cd repo
$ git log --oneline
9a466c5 second commit
b1ec324 first commit

I had to specify -b master because the bundle was created from master alone, so it carries no HEAD reference to tell clone which branch to check out. A bundle that also includes HEAD doesn’t need it. The point is, you have now cloned directly from a file, rather than from a remote server.
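If you control how the bundle gets made, listing HEAD next to the branch should let a plain git clone work without -b. Here’s a rough sketch in a throwaway directory. The directory and file names (origin-repo, cloned, file.txt) are made up for illustration, and the branch name is read from the repository rather than assumed to be master.

```shell
#!/bin/sh
# Sketch only: origin-repo, cloned and file.txt are hypothetical names.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(mktemp -d)
cd "$work" || exit 1

# Build a small repository with two commits.
git init -q origin-repo
cd origin-repo || exit 1
echo one > file.txt
git add file.txt
git commit -q -m "first commit"
echo two >> file.txt
git commit -q -am "second commit"

# Bundle HEAD as well as the branch, so the bundle records what to check out.
branch=$(git symbolic-ref --short HEAD)
git bundle create ../repo.bundle HEAD "$branch"

# Clone straight from the file, with no -b needed this time.
cd "$work" || exit 1
git clone -q repo.bundle cloned
git --no-pager -C cloned log --oneline
```

The log at the end should list both commits, which is a quick way to confirm the clone found a HEAD on its own.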

Now let’s say you do three commits on it and want to send the new commits back via a bundle on a USB stick or email.

$ git log --oneline
71b84da last commit - second repo
c99cf5b fourth commit - second repo
7011d3d third commit - second repo
9a466c5 second commit
b1ec324 first commit

First we need to determine the range of commits we want to include in the bundle. The easiest way would have been to drop a branch when we started, so we could say start_branch..master or master ^start_branch, but if we didn’t we can just list the starting SHA explicitly:

$ git log --oneline master ^9a466c5
71b84da last commit - second repo
c99cf5b fourth commit - second repo
7011d3d third commit - second repo

So we have the list of commits we want to include in the bundle, let’s bundle em up. We do that with the git bundle create command, giving it a filename we want our bundle to be and the range of commits we want to go into it.

$ git bundle create commits.bundle master ^9a466c5
Counting objects: 11, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (9/9), 775 bytes, done.
Total 9 (delta 0), reused 0 (delta 0)

Now we will have a commits.bundle file in our directory. If we take that and send it to our partner, she can then import it into the original repository, even if more work has been done there in the meantime.
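The “drop a branch when we started” idea can be sketched like this. The marker branch name last-sent is just a hypothetical choice, as are the directory and file names. The script remembers the point that was last shared, bundles only what came after it, then moves the marker forward for next time.

```shell
#!/bin/sh
# Sketch only: last-sent, repo and file.txt are hypothetical names.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(mktemp -d)
cd "$work" || exit 1
git init -q repo
cd repo || exit 1

# The two commits both sides already have.
echo 1 > file.txt
git add file.txt
git commit -q -m "first commit"
echo 2 >> file.txt
git commit -q -am "second commit"

branch=$(git symbolic-ref --short HEAD)
git branch last-sent    # marker: everything up to here is already shared

# Three new commits that the other side has not seen yet.
for n in third fourth last; do
    echo "$n" >> file.txt
    git commit -q -am "$n commit - second repo"
done

# Bundle only the commits after the marker.
new_commits=$(git rev-list --count "last-sent..$branch")
echo "bundling $new_commits new commits"
git bundle create ../commits.bundle "last-sent..$branch"
git bundle list-heads ../commits.bundle

# Advance the marker so the next bundle starts from here.
git branch -f last-sent "$branch"
```

A tag would work just as well as a branch for the marker. The only requirement is a name you can put on the left side of the range.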

When she gets the bundle, she can inspect it to see what it contains before she imports it into her repository. The first command is the bundle verify command that will make sure the file is actually a valid Git bundle and that you have all the necessary ancestors to reconstitute it properly.

$ git bundle verify ../commits.bundle
The bundle contains 1 ref
71b84daaf49abed142a373b6e5c59a22dc6560dc refs/heads/master
The bundle requires these 1 ref
9a466c572fe88b195efd356c3f2bbeccdb504102 second commit
../commits.bundle is okay

If the bundler had created a bundle of just the last two commits they had done, rather than all three, the original repository would not be able to import it, since it is missing requisite history. The verify command would have looked like this instead:

$ git bundle verify ../commits-bad.bundle
error: Repository lacks these prerequisite commits:
error: 7011d3d8fc200abe0ad561c011c3852a4b7bbe95 third commit - second repo

However, our first bundle is valid, so we can fetch in commits from it. If you want to see what branches are in the bundle that can be imported, there is also a command to just list the heads:

$ git bundle list-heads ../commits.bundle
71b84daaf49abed142a373b6e5c59a22dc6560dc refs/heads/master

The verify sub-command will tell you the heads, too, as will a normal git ls-remote command, which you may have used for debugging before. The point is to see what can be pulled in, so you can use the fetch or pull commands to import commits from this bundle. Here we’ll fetch the ‘master’ branch of the bundle to a branch named ‘other-master’ in our repository:

$ git fetch ../commits.bundle master:other-master
From ../commits.bundle
 * [new branch]      master     -> other-master

Now we can see that we have the imported commits on the ‘other-master’ branch as well as any commits we’ve done in the meantime in our own ‘master’ branch.

$ git log --oneline --decorate --graph --all
* 8255d41 (HEAD, master) third commit - first repo
| * 71b84da (other-master) last commit - second repo
| * c99cf5b fourth commit - second repo
| * 7011d3d third commit - second repo
|/
* 9a466c5 second commit
* b1ec324 first commit

So, git bundle can be really useful for doing network-y, share-y operations when you don’t have the proper network or shared repository to do so.

08 Mar 2010

Rerere Your Boat...

One of the things I didn’t touch on at all in the book is the git rerere functionality. This also came up recently during one of my trainings, and I realize that a lot of people probably could use this, so I wanted to let you all know about it.

The `git rerere` functionality is a bit of a hidden feature (Git actually has a lot of cool hidden features, if you haven’t figured that out yet). The name stands for “reuse recorded resolution” and as the name implies, it allows you to ask Git to remember how you’ve resolved a hunk conflict so that the next time it sees the same conflict, Git can automatically resolve it for you.

There are a number of scenarios in which this functionality might be really handy. One of the examples that is mentioned in the documentation is if you want to make sure a long lived topic branch will merge cleanly but don’t want to have a bunch of intermediate merge commits. With rerere turned on you can merge occasionally, resolve the conflicts, then back out the merge. If you do this continuously, then the final merge should be easy because rerere can just do everything for you automatically.

This same tactic can be used if you want to keep a branch rebased so you don’t have to deal with the same rebasing conflicts each time you do it. It also helps if you merged a branch, fixed a bunch of conflicts, and then decide to rebase it instead - you likely won’t have to resolve all the same conflicts again.

The other situation I can think of is where you occasionally merge a bunch of evolving topic branches together into a testable head. If the tests fail, you can rewind the merges and re-do them without the topic branch that made the tests fail, and without having to re-resolve the conflicts.

To enable the rerere functionality, you simply have to run this config setting:

$ git config --global rerere.enabled 1

You can also turn it on by creating the .git/rr-cache directory in a specific repository, but I think the config setting is clearer, and it can be done globally.

Now let’s see a simple example. If we have a file that looks like this:

#! /usr/bin/env ruby

def hello
  puts 'hello world'
end

and in one branch we change the word ‘hello’ to ‘hola’, then in another branch we change the ‘world’ to ‘mundo’.

When we merge the two branches together, we’ll get a merge conflict:

$ git merge i18n-world
Auto-merging hello.rb
CONFLICT (content): Merge conflict in hello.rb
Recorded preimage for 'hello.rb'
Automatic merge failed; fix conflicts and then commit the result.

You should notice the new line Recorded preimage for FILE in there. Otherwise it should look exactly like a normal merge conflict. At this point, rerere can tell us some stuff. Normally, you might run git status at this point to see what all conflicted:

$ git status
# On branch master
# Unmerged paths:
#   (use "git reset HEAD <file>..." to unstage)
#   (use "git add <file>..." to mark resolution)
#
#	both modified:      hello.rb
#

However, git rerere will also tell you what it has recorded the pre-merge state for with git rerere status:

$ git rerere status
hello.rb

And git rerere diff will show the current state of the resolution - what you started with to resolve and what you’ve resolved it to.

$ git rerere diff
--- a/hello.rb
+++ b/hello.rb
@@ -1,11 +1,11 @@
 #! /usr/bin/env ruby

 def hello
-<<<<<<<
-  puts 'hello mundo'
-=======
+<<<<<<< HEAD
   puts 'hola world'
->>>>>>>
+=======
+  puts 'hello mundo'
+>>>>>>> i18n-world
 end

Also (and this isn’t really related to rerere), you can use ls-files -u to see the conflicted files and the before, left and right versions:

$ git ls-files -u
100644 39804c942a9c1f2c03dc7c5ebcd7f3e3a6b97519 1	hello.rb
100644 a440db6e8d1fd76ad438a49025a9ad9ce746f581 2	hello.rb
100644 54336ba847c3758ab604876419607e9443848474 3	hello.rb

Anyhow, so now you resolve it to just be “puts ‘hola mundo’” and you can run the rerere diff command again to see what rerere will remember:

$ git rerere diff
--- a/hello.rb
+++ b/hello.rb
@@ -1,11 +1,7 @@
 #! /usr/bin/env ruby

 def hello
-<<<<<<<
-  puts 'hello mundo'
-=======
-  puts 'hola world'
->>>>>>>
+  puts 'hola mundo'
 end

So that basically says, when I see a hunk conflict that has ‘hello mundo’ on one side and ‘hola world’ on the other, resolve it to ‘hola mundo’.

Now we can mark it as resolved and commit it:

$ git add hello.rb
$ git commit
Recorded resolution for 'hello.rb'.
[master 68e16e5] Merge branch 'i18n'

You can see that it “Recorded resolution for FILE”.

Now, let’s undo that merge and then rebase it on top of our master branch instead.

$ git reset --hard HEAD^
HEAD is now at ad63f15 i18n the hello

Our merge is undone. Now let’s rebase the topic branch.

$ git checkout i18n-world
Switched to branch 'i18n-world'
$ git rebase master
First, rewinding head to replay your work on top of it...
Applying: i18n one word
Using index info to reconstruct a base tree...
Falling back to patching base and 3-way merge...
Auto-merging hello.rb
CONFLICT (content): Merge conflict in hello.rb
Resolved 'hello.rb' using previous resolution.
Failed to merge in the changes.
Patch failed at 0001 i18n one word

Now, we got the same merge conflict like we expected, but check out the Resolved FILE using previous resolution line. If we look at the file, we’ll see that it’s already been resolved:

$ cat hello.rb
#! /usr/bin/env ruby

def hello
  puts 'hola mundo'
end

Also, git diff will show you how it was automatically re-resolved:

$ git diff
diff --cc hello.rb
index a440db6,54336ba..0000000
--- a/hello.rb
+++ b/hello.rb
@@@ -1,7 -1,7 +1,7 @@@
  #! /usr/bin/env ruby

  def hello
-   puts 'hola world'
 -  puts 'hello mundo'
++  puts 'hola mundo'
  end

You can also recreate the conflicted file state with the checkout command:

$ git checkout --conflict=merge hello.rb
$ cat hello.rb
#! /usr/bin/env ruby

def hello
<<<<<<< ours
  puts 'hola world'
=======
  puts 'hello mundo'
>>>>>>> theirs
end

That might be a new command to you as well, the --conflict option to git checkout. You can actually have checkout do a couple of things in this situation to help you resolve conflicts. Another interesting value for that option is ‘diff3’, which will give you left, right and common to help you resolve the conflict manually:

$ git checkout --conflict=diff3 hello.rb
$ cat hello.rb
#! /usr/bin/env ruby

def hello
<<<<<<< ours
  puts 'hola world'
|||||||
  puts 'hello world'
=======
  puts 'hello mundo'
>>>>>>> theirs
end

Anyhow, then you can re-resolve it by just running rerere again:

$ git rerere
Resolved 'hello.rb' using previous resolution.
$ cat hello.rb
#! /usr/bin/env ruby

def hello
  puts 'hola mundo'
end

Magical re-resolving! Then you can add and continue the rebase to complete it.

$ git add hello.rb
$ git rebase --continue
Applying: i18n one word

So, if you do a lot of re-merges, or want to keep a topic branch up to date with your master branch without a ton of merges, or you rebase often or any of the above, turn on rerere to help your life out a bit.
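If you want to watch this happen without touching a real project, here’s a rough sketch that replays the hello.rb conflict in a throwaway repository. It does the “merge, resolve, back out, merge again” loop rather than the rebase. It turns rerere on for that one repository only, the commit messages are just placeholders, and the shebang line is left out of hello.rb to keep things short.

```shell
#!/bin/sh
# Sketch only: a scratch repository that reproduces the conflict from this post.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config rerere.enabled true    # per-repository instead of --global

printf "def hello\n  puts 'hello world'\nend\n" > hello.rb
git add hello.rb
git commit -q -m "initial"
base=$(git symbolic-ref --short HEAD)

# One branch changes 'world' to 'mundo'.
git checkout -q -b i18n-world
sed "s/world/mundo/" hello.rb > hello.tmp && mv hello.tmp hello.rb
git commit -q -am "i18n one word"

# The other changes 'hello' to 'hola' on the same line.
git checkout -q "$base"
sed "s/'hello /'hola /" hello.rb > hello.tmp && mv hello.tmp hello.rb
git commit -q -am "i18n the hello"

# First merge conflicts; rerere records the preimage.
git merge i18n-world || true
printf "def hello\n  puts 'hola mundo'\nend\n" > hello.rb
git add hello.rb
git commit -q -m "Merge branch 'i18n-world'"    # rerere records the resolution

# Back the merge out and do it again.
git reset -q --hard HEAD^
git merge i18n-world || true    # still reported as a conflict, but the file is resolved
cat hello.rb
```

The second merge still stops and asks you to commit, since rerere only fixes up the working file. You still have to git add and commit it yourself.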

04 Mar 2010

Smart HTTP Transport

When I was done writing Pro Git, the only transfer protocols that existed were the git://, ssh:// and basic http:// transports. I wrote about the basic strengths and weaknesses of each in Chapter 4. At the time, one of the big differences between Git and most other VCS’s was that HTTP was not a widely used protocol - that’s because it was read-only and very inefficient. Git would simply use the webserver to ask for the individual objects and packfiles that it needed. It would even ask for a big packfile when it only needed one object from it.

As of the release of version 1.6.6 at the end of last year, however, Git can now use the HTTP protocol just about as efficiently as the git or ssh versions (thanks to the amazing work by Shawn Pearce, who also happened to have been the technical editor of Pro Git). Amusingly, it has been given very little fanfare - the release notes for 1.6.6 state only this:

* "git fetch" over http learned a new mode that is different from the
  traditional "dumb commit walker".

Which is a huge understatement, given that I think this will become the standard Git protocol in the very near future. I believe this because it’s both efficient and can be run either secure and authenticated (https) or open and unauthenticated (http). It also has the huge advantage that most firewalls have those ports (80 and 443) open already and normal users don’t have to deal with ssh-keygen and the like. Once most clients have updated to at least v1.6.6, http will have a big place in the Git world.

What is "Smart" HTTP?

Before version 1.6.6, a Git client cloning or fetching over HTTP would basically just do a series of GETs to grab individual objects and packfiles from bare Git repositories on the server, since it knows the layout of the repo. This functionality is documented fairly completely in Chapter 9. Conversations over this protocol used to look like this:

$ git clone http://github.com/schacon/simplegit-progit.git
Initialized empty Git repository in /private/tmp/simplegit-progit/.git/
got ca82a6dff817ec66f44342007202690a93763949
walk ca82a6dff817ec66f44342007202690a93763949
got 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
Getting alternates list for http://github.com/schacon/simplegit-progit.git
Getting pack list for http://github.com/schacon/simplegit-progit.git
Getting index for pack 816a9b2334da9953e530f27bcac22082a9f5b835
Getting pack 816a9b2334da9953e530f27bcac22082a9f5b835
 which contains cfda3bf379e4f8dba8717dee55aab78aef7f4daf
walk 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
walk a11bef06a3f659402fe7563abf99ad00de2209e6

It is a completely passive server: if the client needs one object in a packfile of thousands, the server cannot pull the single object out, so the client is forced to request the entire packfile.

In contrast, the smarter protocols (git and ssh) would instead have a conversation with the git upload-pack process on the server which would determine the exact set of objects the client needs and build a custom packfile with just those objects and stream it over.

The new clients will now send a request with an extra GET parameter that older servers will simply ignore, but servers running the smart CGI will recognize and switch modes to a multi-POST mode that is similar to the conversation that happens over the git protocol. Once this series of POSTs is complete, the server knows what objects the client needs and can build a custom packfile and stream it back.
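For the curious, that extra GET parameter is service=git-upload-pack (or git-receive-pack for a push) on the info/refs request, and a smart server answers with a matching application/x-<service>-advertisement content type. Those details come from Git’s HTTP protocol documentation rather than anything above, and the helper names and example.com URL below are made up, so treat this as a sketch of how you might tell the two modes apart.

```shell
#!/bin/sh
# Sketch only: smart_info_refs_url and is_smart_advertisement are hypothetical helpers.

# Build the discovery URL a smart client asks for.
# $1 = repository URL, $2 = service (git-upload-pack or git-receive-pack)
smart_info_refs_url() {
    printf '%s/info/refs?service=%s\n' "${1%/}" "$2"
}

# Succeed if a Content-Type header value is the smart advertisement for a service.
# $1 = Content-Type value, $2 = service
is_smart_advertisement() {
    case "$1" in
        "application/x-$2-advertisement"*) return 0 ;;
        *) return 1 ;;
    esac
}

url=$(smart_info_refs_url "http://example.com/git/project.git/" git-upload-pack)
echo "$url"

# To probe a real server (needs network access, so it is left commented out):
# ctype=$(curl -s -o /dev/null -w '%{content_type}' "$url")
ctype="application/x-git-upload-pack-advertisement"    # sample value standing in for curl

if is_smart_advertisement "$ctype" git-upload-pack; then
    echo "smart"
else
    echo "dumb"
fi
```

A dumb server just serves the static info/refs file and ignores the query string, so it would come back with some ordinary content type and the check would fail.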

Furthermore, in the olden days if you wanted to push over http, you had to set up a DAV-based server, which was rather difficult and also pretty inefficient compared to the smarter protocols. Now you can push over this CGI, which again is very similar to the push mechanisms for the git and ssh protocols. You simply have to authenticate via an HTTP-based method, like basic auth or the like (assuming you don’t want your repository to be world-writable).

The rest of this article will explain setting up a server with the “smart”-http protocol, so you can test out this cool new feature. This feature is referred to as “smart” HTTP vs “dumb” HTTP because it requires having the Git binary installed on the server, where the previous incarnation of HTTP transfer required only a simple webserver. It has a real conversation with the client, rather than just dumbly pushing out data.

Setting up Smart HTTP

So, Smart-HTTP is basically just enabling the new CGI script that is provided with Git called `git-http-backend` on the server. This CGI will read the path and headers sent by the revamped git fetch and git push binaries, which have learned to communicate in a specific way with a smart server. If the CGI sees that the client is smart, it will communicate smartly with it, otherwise it will simply fall back to the dumb behavior (so it is backward compatible for reads with older clients).

To set it up, it’s best to walk through the instructions on the `git-http-backend` documentation page. Basically, you have to install Git v1.6.6 or higher on a server with an Apache 2.x webserver (it has to be Apache, currently - other CGI servers don’t work, last I checked). Then you add something similar to this to your httpd.conf file:

SetEnv GIT_PROJECT_ROOT /var/www/git
SetEnv GIT_HTTP_EXPORT_ALL
ScriptAlias /git/ /usr/libexec/git-core/git-http-backend/

Then you’ll want to make writes be authenticated somehow, possibly with an Auth block like this:

<LocationMatch "^/git/.*/git-receive-pack$">
        AuthType Basic
        AuthName "Git Access"
        Require group committers
        ...
</LocationMatch>

That is all that is really required to get this running. Now you have a smart http-based Git server that can do anonymous reads and authenticated writes with clients that have upgraded to 1.6.6 and above. How awesome is that? The documentation goes over more complex examples, like making it work with GitWeb and accelerating the dumb fallback reads, if you’re interested.

Rack-based Git Server

If you’re not a fan of Apache or you’re running some other web server, you may want to take a look at an app that I wrote called Grack, which is a Rack-based application for Smart-HTTP Git. Rack is a generic webserver interface for Ruby (similar to WSGI for Python) that has adapters for a ton of web servers. It basically replaces git http-backend for non-Apache servers that can’t run it.

This means that I can write the web handler independent of the web server and it will work with any web server that has a Rack handler. This currently means any FCGI server, Mongrel (and EventedMongrel and SwiftipliedMongrel), WEBrick, SCGI, LiteSpeed, Thin, Ebb, Phusion Passenger and Unicorn. Even cooler, using Warbler and JRuby, you can generate a WAR file that is deployable in any Java web application server (Tomcat, Glassfish, Websphere, JBoss, etc).

So, if you don’t use Apache and you are interested in a Smart-HTTP Git server, you may want to check out Grack. At GitHub, this is the adapter we’re using to eventually implement Smart-HTTP support for all the GitHub repositories. (It’s currently a tad bit behind, but I’ll be starting up on it again soon as I get it into production at GitHub - send pull requests if you find any issues)

Grack is about half as fast as the Apache version for simple ref-listing stuff, but we’re talking tenths of a second. For most clones and pushes, the data transfer will be the main time-sink, so the load time of the app should be negligible.

In Conclusion

I think HTTP-based Git will be a huge part of the future of Git, so if you’re running your own Git server, you should really check it out. If you’re not, GitHub and, I’m sure, other hosts will soon be supporting it - upgrade your Git client to 1.7ish soon so you can take advantage of it when it happens.

02 Mar 2010

Undoing Merges

I would like to start writing more here about general Git tips, tricks and upcoming features. There has actually been a lot of cool stuff that has happened since the book was first published, and a number of interesting things that I didn’t get around to covering in the book. I figure if I start blogging about the more interesting stuff, it should serve as a pretty handy guide should I ever start writing a second edition.

For the first such post, I’m going to cover a topic that was asked about at a training I did recently. The question was about a workflow where long running branches are merged occasionally, much like the Large Merging workflow that I describe in the book. They asked how to unmerge a branch, either permanently or allowing you to merge it in later.

You can actually do this a number of ways. Let’s say you have history that looks something like this:

You have a couple of topic branches that you have developed and then integrated together by a series of merges. Now you want to revert something back in the history, say ‘C10’ in this case.

The first way to solve the problem could be to rewind ‘master’ back to C8 and then merge the remaining two lines back in again. This requires that anyone you’re collaborating with knows how to handle rewound heads, but if that’s not an issue, this is a perfectly viable solution. This is basically how the ‘pu’ branch is handled in the Git project itself.

$ git checkout master
$ git reset --hard [sha_of_C8]
$ git merge jk/post-checkout
$ git merge db/push-cleanup

Once you rewind and remerge, you’ll instead have a history that looks more like this:

Now you can go back and work on that newly unmerged line and merge it again at a later point, or perhaps ignore it entirely.

Reverting a Merge

However, what if you didn’t find this out until later, or perhaps you or one of your collaborators have done work after this merge series? What if your history looks more like this:

Now you either have to revert one of the merges, or go back, remerge and then cherry-pick the remaining changes again (C9 and C10 in this case), which is confusing and difficult, especially if there are a lot of commits after those merges.

Well, it turns out that Git is actually pretty good at reverting an entire merge. Although you’ve probably only used the git revert command to revert a single commit (if you’ve used it at all), you can also use it to revert merge commits.

All you have to do is specify the merge commit you want to revert and the parent line you want to keep. Let’s say that we want to revert the merge of the jk/post-checkout line. We can do so like this:

$ git revert -m 1 [sha_of_C8]
Finished one revert.
[master 88edd6d] Revert "Merge branch 'jk/post-checkout'"
 1 files changed, 0 insertions(+), 2 deletions(-)

That will introduce a new commit that undoes the changes introduced by merging in the branch in the first place - sort of like a reverse cherry pick of all of the commits that were unique to that branch. Pretty cool.

However, we’re not done.

Reverting the Revert

Let’s say now that you want to re-merge that work again. If you try to merge it again, Git will see that the commits on that branch are in the history and will assume that you are mistakenly trying to merge something you already have.

$ git merge jk/post-checkout
Already up-to-date.

Oops - it did nothing at all. Even more confusing is if you went back and committed on that branch and then tried to merge it in, it would only introduce the changes since you originally merged.

Gah. Now that’s really a strange state and is likely to cause a bunch of conflicts or confusing errors. What you want to do instead is revert the revert of the merge:

$ git revert 88edd6d
Finished one revert.
[master 268e243] Revert "Revert "Merge branch 'jk/post-checkout'""
 1 files changed, 2 insertions(+), 0 deletions(-)

Cool, so now we’ve basically reintroduced everything that was in the branch that we had reverted out before. Now if we have more work on that branch in the meantime, we can just re-merge it.

$ git merge jk/post-checkout
Auto-merging test.txt
Merge made by recursive.
 test.txt |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

So, I hope that’s helpful. This can be particularly useful if you have a merge-heavy development process. In fact, if you work mostly in topic branches before merging for integration purposes, you may want to use the git merge --no-ff option so that the first merge is not a fast forward and can be reverted out in this manner.
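Here’s the whole revert, revert-the-revert, re-merge cycle as a sketch you can run in a scratch directory. The branch name topic and the file names are hypothetical stand-ins for jk/post-checkout and friends, and the first merge uses --no-ff so there is a real merge commit to revert.

```shell
#!/bin/sh
# Sketch only: topic, test.txt and topic.txt are hypothetical names.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
echo base > test.txt
git add test.txt
git commit -q -m "base"
base=$(git symbolic-ref --short HEAD)

# A topic branch with one commit, merged without fast-forwarding.
git checkout -q -b topic
echo "topic work" > topic.txt
git add topic.txt
git commit -q -m "topic work"
git checkout -q "$base"
git merge -q --no-ff -m "Merge branch 'topic'" topic
merge_commit=$(git rev-parse HEAD)

# Revert the merge, keeping the first parent line.
git revert --no-edit -m 1 "$merge_commit"
revert_commit=$(git rev-parse HEAD)
if [ -e topic.txt ]; then after_revert=present; else after_revert=absent; fi

# Merging again does nothing: the topic commits are already in history.
git merge topic

# Revert the revert to bring the topic changes back.
git revert --no-edit "$revert_commit"

# New work on the topic branch now merges in on top of the restored changes.
git checkout -q topic
echo "more topic work" >> topic.txt
git commit -q -am "more topic work"
git checkout -q "$base"
git merge -q --no-edit topic
cat topic.txt
```

At the end topic.txt should contain both lines, the one restored by reverting the revert and the one from the later topic commit.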

Until next time.
