Git 强制策略实例
在本节中,我们应用前面学到的知识建立这样一个Git 工作流程:检查提交信息的格式,只接受纯fast-forward内容的推送,并且指定用户只能修改项目中的特定子目录。我们将写一个客户端角本来提示开发人员他们推送的内容是否会被拒绝,以及一个服务端脚本来实际执行这些策略。
这些脚本使用 Ruby 写成,一半由于它是作者倾向的脚本语言,另外作者觉得它是最接近伪代码的脚本语言;因而即便你不使用 Ruby 也能大致看懂。不过任何其他语言也一样适用。所有 Git 自带的样例脚本都是用 Perl 或 Bash 写的。所以从这些脚本中能找到相当多的这两种语言的挂钩样例。
服务端挂钩
所有服务端的工作都在hooks(挂钩)目录的 update(更新)脚本中制定。update 脚本为每一个得到推送的分支运行一次;它接受推送目标的索引,该分支原来指向的位置,以及被推送的新内容。如果推送是通过 SSH 进行的,还可以获取发出此次操作的用户。如果设定所有操作都通过公匙授权的单一帐号(比如"git")进行,就有必要通过一个 shell 包装依据公匙来判断用户的身份,并且设定环境变量来表示该用户的身份。下面假设尝试连接的用户储存在 $USER 环境变量里,我们的 update 脚本首先搜集一切需要的信息:
#!/usr/bin/env ruby
$refname = ARGV[0]
$oldrev = ARGV[1]
$newrev = ARGV[2]
$user = ENV['USER']
puts "Enforcing Policies... \n(#{$refname}) (#{$oldrev[0,6]}) (#{$newrev[0,6]})"
没错,我在用全局变量。别鄙视我——这样比较利于演示过程。
指定特殊的提交信息格式
我们的第一项任务是指定每一条提交信息都必须遵循某种特殊的格式。作为演示,假定每一条信息必须包含一条形似 “ref: 1234” 这样的字符串,因为我们需要把每一次提交和项目的问题追踪系统。我们要逐一检查每一条推送上来的提交内容,看看提交信息是否包含这么一个字符串,然后,如果该提交里不包含这个字符串,以非零返回值退出从而拒绝此次推送。
把 $newrev 和 $oldrev 变量的值传给一个叫做 git rev-list 的 Git plumbing 命令可以获取所有提交内容的 SHA-1 值列表。git rev-list 基本类似 git log 命令,但它默认只输出 SHA-1 值而已,没有其他信息。所以要获取由 SHA 值表示的从一次提交到另一次提交之间的所有 SHA 值,可以运行:
$ git rev-list 538c33..d14fc7
d14fc7c847ab946ec39590d87783c69b031bdfb7
9f585da4401b0a3999e84113824d15245c13f0be
234071a1be950e2a8d078e6141f5cd20c1e61ad3
dfa04c9ef3d5197182f13fb5b9b1fb7717d2222a
17716ec0f1ff5c77eff40b7fe912f9f6cfd0e475
截取这些输出内容,循环遍历其中每一个 SHA 值,找出与之对应的提交信息,然后用正则表达式来测试该信息包含的格式话的内容。
下面要搞定如何从所有的提交内容中提取出提交信息。使用另一个叫做 git cat-file 的 Git plumbing 工具可以获得原始的提交数据。我们将在第九章了解到这些 plumbing 工具的细节;现在暂时先看一下这条命令的输出:
$ git cat-file commit ca82a6
tree cfda3bf379e4f8dba8717dee55aab78aef7f4daf
parent 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
author Scott Chacon <schacon@gmail.com> 1205815931 -0700
committer Scott Chacon <schacon@gmail.com> 1240030591 -0700
changed the version number
通过 SHA-1 值获得提交内容中的提交信息的一个简单办法是找到提交的第一行,然后取从它往后的所有内容。可以使用 Unix 系统的 sed 命令来实现该效果:
$ git cat-file commit ca82a6 | sed '1,/^$/d'
changed the version number
这条咒语从每一个待提交内容里提取提交信息,并且会在提取信息不符合要求的情况下退出。为了退出脚本和拒绝此次推送,返回一个非零值。整个脚本大致如下:
$regex = /\[ref: (\d+)\]/
# 指定提交信息格式
def check_message_format
missed_revs = `git rev-list #{$oldrev}..#{$newrev}`.split("\n")
missed_revs.each do |rev|
message = `git cat-file commit #{rev} | sed '1,/^$/d'`
if !$regex.match(message)
puts "[POLICY] Your message is not formatted correctly"
exit 1
end
end
end
check_message_format
把这一段放在 update 脚本里,所有包含不符合指定规则的提交都会遭到拒绝。
实现基于用户的访问权限控制列表(ACL)系统
假设你需要添加一个使用访问权限控制列表的机制来指定哪些用户对项目的哪些部分有推送权限。某些用户具有全部的访问权,其他人只对某些子目录或者特定的文件具有推送权限。要搞定这一点,所有的规则将被写入一个位于服务器的原始 Git 仓库的 acl 文件。我们让 update 挂钩检阅这些规则,审视推送的提交内容中需要修改的所有文件,然后决定执行推送的用户是否对所有这些文件都有权限。
我们首先要创建这个列表。这里使用的格式和 CVS 的 ACL 机制十分类似:它由若干行构成,第一项内容是 avail 或者 unavail,接着是逗号分隔的规则生效用户列表,最后一项是规则生效的目录(空白表示开放访问)。这些项目由 | 字符隔开。
下例中,我们指定几个管理员,几个对 doc 目录具有权限的文档作者,以及一个对 lib 和 tests 目录具有权限的开发人员,相应的 ACL 文件如下:
avail|nickh,pjhyett,defunkt,tpw
avail|usinclair,cdickens,ebronte|doc
avail|schacon|lib
avail|schacon|tests
首先把这些数据读入你编写的数据结构。本例中,为保持简洁,我们暂时只实现 avail 的规则(译注:也就是省略了 unavail 部分)。下面这个方法生成一个关联数组,它的主键是用户名,值是一个该用户有写权限的所有目录组成的数组:
def get_acl_access_data(acl_file)
# read in ACL data
acl_file = File.read(acl_file).split("\n").reject { |line| line == '' }
access = {}
acl_file.each do |line|
avail, users, path = line.split('|')
next unless avail == 'avail'
users.split(',').each do |user|
access[user] ||= []
access[user] << path
end
end
access
end
针对之前给出的 ACL 规则文件,这个 get_acl_access_data 方法返回的数据结构如下:
{"defunkt"=>[nil],
"tpw"=>[nil],
"nickh"=>[nil],
"pjhyett"=>[nil],
"schacon"=>["lib", "tests"],
"cdickens"=>["doc"],
"usinclair"=>["doc"],
"ebronte"=>["doc"]}
搞定了用户权限的数据,下面需要找出哪些位置将要被提交的内容修改,从而确保试图推送的用户对这些位置有全部的权限。
使用 git log 的 --name-only 选项(在第二章里简单的提过)我们可以轻而易举的找出一次提交里修改的文件:
$ git log -1 --name-only --pretty=format:'' 9f585d
README
lib/test.rb
使用 get_acl_access_data 返回的 ACL 结构来一一核对每一次提交修改的文件列表,就能找出该用户是否有权限推送所有的提交内容:
# 仅允许特定用户修改项目中的特定子目录
def check_directory_perms
access = get_acl_access_data('acl')
# 检查是否有人在向他没有权限的地方推送内容
new_commits = `git rev-list #{$oldrev}..#{$newrev}`.split("\n")
new_commits.each do |rev|
files_modified = `git log -1 --name-only --pretty=format:'' #{rev}`.split("\n")
files_modified.each do |path|
next if path.size == 0
has_file_access = false
access[$user].each do |access_path|
if !access_path # 用户拥有完全访问权限
|| (path.index(access_path) == 0) # 或者对此位置有访问权限
has_file_access = true
end
end
if !has_file_access
puts "[POLICY] You do not have access to push to #{path}"
exit 1
end
end
end
end
check_directory_perms
以上的大部分内容应该都比较容易理解。通过 git rev-list 获取推送到服务器内容的提交列表。然后,针对其中每一项,找出它试图修改的文件然后确保执行推送的用户对这些文件具有权限。一个不太容易理解的 Ruby 技巧石 path.index(access_path) ==0 这句,它的返回真值如果路径以 access_path 开头——这是为了确保 access_path 并不是只在允许的路径之一,而是所有准许全选的目录都在该目录之下。
现在你的用户没法推送带有不正确的提交信息的内容,也不能在准许他们访问范围之外的位置做出修改。
只允许 Fast-Forward 类型的推送
剩下的最后一项任务是指定只接受 fast-forward 的推送。在 Git 1.6 或者更新版本里,只需要设定 receive.denyDeletes 和 receive.denyNonFastForwards 选项就可以了。但是通过挂钩的实现可以在旧版本的 Git 上工作,并且通过一定的修改它它可以做到只针对某些用户执行,或者更多以后可能用的到的规则。
检查这一项的逻辑是看看提交里是否包含从旧版本里能找到但在新版本里却找不到的内容。如果没有,那这是一次纯 fast-forward 的推送;如果有,那我们拒绝此次推送:
# 只允许纯 fast-forward 推送
def check_fast_forward
missed_refs = `git rev-list #{$newrev}..#{$oldrev}`
missed_ref_count = missed_refs.split("\n").size
if missed_ref_count > 0
puts "[POLICY] Cannot push a non fast-forward reference"
exit 1
end
end
check_fast_forward
一切都设定好了。如果现在运行 chmod u+x .git/hooks/update —— 修改包含以上内容文件的权限,然后尝试推送一个包含非 fast-forward 类型的索引,会得到一下提示:
$ git push -f origin master
Counting objects: 5, done.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 323 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
Enforcing Policies...
(refs/heads/master) (8338c5) (c5b616)
[POLICY] Cannot push a non-fast-forward reference
error: hooks/update exited with error code 1
error: hook declined to update refs/heads/master
To git@gitserver:project.git
! [remote rejected] master -> master (hook declined)
error: failed to push some refs to 'git@gitserver:project.git'
这里有几个有趣的信息。首先,我们可以看到挂钩运行的起点:
Enforcing Policies...
(refs/heads/master) (fb8c72) (c56860)
注意这是从 update 脚本开头输出到标准你输出的。所有从脚本输出的提示都会发送到客户端,这点很重要。
下一个值得注意的部分是错误信息。
[POLICY] Cannot push a non fast-forward reference
error: hooks/update exited with error code 1
error: hook declined to update refs/heads/master
第一行是我们的脚本输出的,在往下是 Git 在告诉我们 update 脚本退出时返回了非零值因而推送遭到了拒绝。最后一点:
To git@gitserver:project.git
! [remote rejected] master -> master (hook declined)
error: failed to push some refs to 'git@gitserver:project.git'
我们将为每一个被挂钩拒之门外的索引受到一条远程信息,解释它被拒绝是因为一个挂钩的原因。
而且,如果那个 ref 字符串没有包含在任何的提交里,我们将看到前面脚本里输出的错误信息:
[POLICY] Your message is not formatted correctly
又或者某人想修改一个自己不具备权限的文件然后推送了一个包含它的提交,他将看到类似的提示。比如,一个文档作者尝试推送一个修改到 lib 目录的提交,他会看到
[POLICY] You do not have access to push to lib/test.rb
全在这了。从这里开始,只要 update 脚本存在并且可执行,我们的仓库永远都不会遭到回转或者包含不符合要求信息的提交内容,并且用户都被锁在了沙箱里面。
Client-Side Hooks
The downside to this approach is the whining that will inevitably result when your users’ commit pushes are rejected. Having their carefully crafted work rejected at the last minute can be extremely frustrating and confusing; and furthermore, they will have to edit their history to correct it, which isn’t always for the faint of heart.
The answer to this dilemma is to provide some client-side hooks that users can use to notify them when they’re doing something that the server is likely to reject. That way, they can correct any problems before committing and before those issues become more difficult to fix. Because hooks aren’t transferred with a clone of a project, you must distribute these scripts some other way and then have your users copy them to their .git/hooks directory and make them executable. You can distribute these hooks within the project or in a separate project, but there is no way to set them up automatically.
To begin, you should check your commit message just before each commit is recorded, so you know the server won’t reject your changes due to badly formatted commit messages. To do this, you can add the commit-msg hook. If you have it read the message from the file passed as the first argument and compare that to the pattern, you can force Git to abort the commit if there is no match:
#!/usr/bin/env ruby
message_file = ARGV[0]
message = File.read(message_file)
$regex = /\[ref: (\d+)\]/
if !$regex.match(message)
puts "[POLICY] Your message is not formatted correctly"
exit 1
end
If that script is in place (in .git/hooks/commit-msg) and executable, and you commit with a message that isn’t properly formatted, you see this:
$ git commit -am 'test'
[POLICY] Your message is not formatted correctly
No commit was completed in that instance. However, if your message contains the proper pattern, Git allows you to commit:
$ git commit -am 'test [ref: 132]'
[master e05c914] test [ref: 132]
1 files changed, 1 insertions(+), 0 deletions(-)
Next, you want to make sure you aren’t modifying files that are outside your ACL scope. If your project’s .git directory contains a copy of the ACL file you used previously, then the following pre-commit script will enforce those constraints for you:
#!/usr/bin/env ruby
$user = ENV['USER']
# [ insert acl_access_data method from above ]
# only allows certain users to modify certain subdirectories in a project
def check_directory_perms
access = get_acl_access_data('.git/acl')
files_modified = `git diff-index --cached --name-only HEAD`.split("\n")
files_modified.each do |path|
next if path.size == 0
has_file_access = false
access[$user].each do |access_path|
if !access_path || (path.index(access_path) == 0)
has_file_access = true
end
if !has_file_access
puts "[POLICY] You do not have access to push to #{path}"
exit 1
end
end
end
check_directory_perms
This is roughly the same script as the server-side part, but with two important differences. First, the ACL file is in a different place, because this script runs from your working directory, not from your Git directory. You have to change the path to the ACL file from this
access = get_acl_access_data('acl')
to this:
access = get_acl_access_data('.git/acl')
The other important difference is the way you get a listing of the files that have been changed. Because the server-side method looks at the log of commits, and, at this point, the commit hasn’t been recorded yet, you must get your file listing from the staging area instead. Instead of
files_modified = `git log -1 --name-only --pretty=format:'' #{ref}`
you have to use
files_modified = `git diff-index --cached --name-only HEAD`
But those are the only two differences — otherwise, the script works the same way. One caveat is that it expects you to be running locally as the same user you push as to the remote machine. If that is different, you must set the $user variable manually.
The last thing you have to do is check that you’re not trying to push non-fast-forwarded references, but that is a bit less common. To get a reference that isn’t a fast-forward, you either have to rebase past a commit you’ve already pushed up or try pushing a different local branch up to the same remote branch.
Because the server will tell you that you can’t push a non-fast-forward anyway, and the hook prevents forced pushes, the only accidental thing you can try to catch is rebasing commits that have already been pushed.
Here is an example pre-rebase script that checks for that. It gets a list of all the commits you’re about to rewrite and checks whether they exist in any of your remote references. If it sees one that is reachable from one of your remote references, it aborts the rebase:
#!/usr/bin/env ruby
base_branch = ARGV[0]
if ARGV[1]
topic_branch = ARGV[1]
else
topic_branch = "HEAD"
end
target_shas = `git rev-list #{base_branch}..#{topic_branch}`.split("\n")
remote_refs = `git branch -r`.split("\n").map { |r| r.strip }
target_shas.each do |sha|
remote_refs.each do |remote_ref|
shas_pushed = `git rev-list ^#{sha}^@ refs/remotes/#{remote_ref}`
if shas_pushed.split(“\n”).include?(sha)
puts "[POLICY] Commit #{sha} has already been pushed to #{remote_ref}"
exit 1
end
end
end
This script uses a syntax that wasn’t covered in the Revision Selection section of Chapter 6. You get a list of commits that have already been pushed up by running this:
git rev-list ^#{sha}^@ refs/remotes/#{remote_ref}
The SHA^@ syntax resolves to all the parents of that commit. You’re looking for any commit that is reachable from the last commit on the remote and that isn’t reachable from any parent of any of the SHAs you’re trying to push up — meaning it’s a fast-forward.
The main drawback to this approach is that it can be very slow and is often unnecessary — if you don’t try to force the push with -f, the server will warn you and not accept the push. However, it’s an interesting exercise and can in theory help you avoid a rebase that you might later have to go back and fix.