Ryunosuke Ayanokoji's Amateur Thinking (綾小路龍之介の素人思考)

[rsync] Merging directories

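I want to move files scattered across several locations into a single directory, but some file names occur in more than one of the source locations. Files with distinct names should be gathered into one directory, and files whose names collide should be moved to a separate directory so that they can be checked.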

Create directories and files for testing:

$ mkdir -p src{0,1}/dir{0,1}; touch src{0,1}/dir{0,1}/{different,same,empty}-file.txt
$ find . -type f -name 'different*' -exec sh -c 'echo "$1" > "$1"' sh {} \;
$ find . -type f -name 'same*' -exec sh -c 'echo same > "$1"' sh {} \;
$ find . | sort
.
./src0
./src0/dir0
./src0/dir0/different-file.txt
./src0/dir0/empty-file.txt
./src0/dir0/same-file.txt
./src0/dir1
./src0/dir1/different-file.txt
./src0/dir1/empty-file.txt
./src0/dir1/same-file.txt
./src1
./src1/dir0
./src1/dir0/different-file.txt
./src1/dir0/empty-file.txt
./src1/dir0/same-file.txt
./src1/dir1
./src1/dir1/different-file.txt
./src1/dir1/empty-file.txt
./src1/dir1/same-file.txt

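From this state, I want to recursively move the contents of src0 and src1 into the dst and dst_bkup directories according to these rules:

  1. If a file has the same relative path under src0 and src1 and the contents are identical, move it to dst.
  2. If the contents differ, move the file under src1 to dst and the file under src0 to dst_bkup.
  3. Files that exist only in src0 or only in src1 are moved to dst.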

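Merge operations

For each relative path, compare the two copies with diff and act on its exit status (0 = identical, 1 = different, 2 = trouble, here a file missing on one side):

| Comparison                                | Exit status | Action                                                                              |
|-------------------------------------------|-------------|-------------------------------------------------------------------------------------|
| diff src{0,1}/dir0/different-file.txt     | 1           | mv {src1,dst}/dir0/different-file.txt; mv {src0,dst_bkup}/dir0/different-file.txt   |
| diff src{0,1}/dir0/empty-file.txt         | 0           | mv {src1,dst}/dir0/empty-file.txt; rm src0/dir0/empty-file.txt                      |
| diff src{0,1}/dir0/same-file.txt          | 0           | mv {src1,dst}/dir0/same-file.txt; rm src0/dir0/same-file.txt                        |
| diff src{0,1}/dir0/only-in-src0-file.txt  | 2           | mv {src0,dst}/dir0/only-in-src0-file.txt                                            |
| diff src{0,1}/dir0/only-in-src1-file.txt  | 2           | mv {src1,dst}/dir0/only-in-src1-file.txt                                            |

Running diff recursively over the test tree (-s also reports identical files):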
$ diff -sqr src0/ src1/
Files src0/dir0/different-file.txt and src1/dir0/different-file.txt differ
Files src0/dir0/empty-file.txt and src1/dir0/empty-file.txt are identical
Files src0/dir0/same-file.txt and src1/dir0/same-file.txt are identical
Files src0/dir1/different-file.txt and src1/dir1/different-file.txt differ
Files src0/dir1/empty-file.txt and src1/dir1/empty-file.txt are identical
Files src0/dir1/same-file.txt and src1/dir1/same-file.txt are identical
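The per-file rules above can also be scripted without rsync. The following is a minimal sketch, assuming GNU find and sort (for -printf and -z), that dst does not yet contain any of the relative paths being merged, and that file names contain no NUL bytes; the function names move_file and merge_dirs are hypothetical. It uses cmp -s, whose exit status (0 = same, 1 = different) plays the role of diff's in the table.

```shell
#!/usr/bin/env bash
# Sketch only: merge_dirs and move_file are hypothetical helper names.
# Assumes GNU find/sort and a DST tree that holds none of the merged paths yet.
set -euo pipefail

# Move file $1 to path $2, creating $2's parent directories first.
move_file() {
    mkdir -p -- "$(dirname -- "$2")"
    mv -- "$1" "$2"
}

# merge_dirs SRC0 SRC1 DST BKUP
merge_dirs() {
    local src0=$1 src1=$2 dst=$3 bkup=$4 rel
    # Pass 1: decide what happens to each non-directory entry of SRC0.
    # sort -z reads the whole list before emitting it, so find has finished
    # walking SRC0 before the loop starts moving files out of it.
    while IFS= read -r -d '' rel; do
        if [ -e "$src1/$rel" ]; then
            if cmp -s -- "$src0/$rel" "$src1/$rel"; then
                rm -- "$src0/$rel"                   # identical: drop the SRC0 copy
            else
                move_file "$src0/$rel" "$bkup/$rel"  # different: SRC0 copy to BKUP
            fi
        else
            move_file "$src0/$rel" "$dst/$rel"      # only in SRC0: straight to DST
        fi
    done < <(find "$src0" ! -type d -printf '%P\0' | sort -z)

    # Pass 2: everything left in SRC1 (common or SRC1-only) goes to DST.
    while IFS= read -r -d '' rel; do
        move_file "$src1/$rel" "$dst/$rel"
    done < <(find "$src1" ! -type d -printf '%P\0' | sort -z)
}
```

Calling merge_dirs src0 src1 dst dst_bkup from a script applies the three rules in two passes. The directory skeletons of src0 and src1 are left behind empty and can be pruned with find src0 src1 -type d -empty -delete.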

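The usual tool for this kind of problem is rsync, run once for src0 and once for src1. However, because --checksum is used, it becomes quite slow when very large files are involved. Note also that rsync copies rather than moves, so src0 and src1 are left intact, and that --backup-dir is given as an absolute path because rsync interprets a relative --backup-dir relative to the destination directory (reference 1).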

$ mkdir dst
$ rsync -avv --checksum --backup --backup-dir=/full/path/to/dst_bkup/ /full/path/to/src0/ /full/path/to/dst
backup_dir is /full/path/to/dst_bkup/
sending incremental file list
delta-transmission disabled for local transfer or --whole-file
dir0/
dir0/different-file.txt
dir0/empty-file.txt
dir0/same-file.txt
dir1/
dir1/different-file.txt
dir1/empty-file.txt
dir1/same-file.txt
total: matches=0  hash_hits=0  false_alarms=0 data=72

sent 647 bytes  received 137 bytes  1568.00 bytes/sec
total size is 72  speedup is 0.09
$ rsync -avv --checksum --backup --backup-dir=/full/path/to/dst_bkup/ /full/path/to/src1/ /full/path/to/dst
backup_dir is /full/path/to/dst_bkup/
sending incremental file list
delta-transmission disabled for local transfer or --whole-file
dir0/different-file.txt
dir0/empty-file.txt is uptodate
dir0/same-file.txt is uptodate
dir1/different-file.txt
dir1/empty-file.txt is uptodate
dir1/same-file.txt is uptodate
backed up dir0/different-file.txt to /full/path/to/dst_bkup/dir0/different-file.txt
backed up dir1/different-file.txt to /full/path/to/dst_bkup/dir1/different-file.txt
total: matches=0  hash_hits=0  false_alarms=0 data=62

sent 485 bytes  received 73 bytes  1116.00 bytes/sec
total size is 72  speedup is 0.13
$ find . | sort
.
./dst
./dst_bkup
./dst_bkup/dir0
./dst_bkup/dir0/different-file.txt
./dst_bkup/dir1
./dst_bkup/dir1/different-file.txt
./dst/dir0
./dst/dir0/different-file.txt
./dst/dir0/empty-file.txt
./dst/dir0/same-file.txt
./dst/dir1
./dst/dir1/different-file.txt
./dst/dir1/empty-file.txt
./dst/dir1/same-file.txt
./src0
./src0/dir0
./src0/dir0/different-file.txt
./src0/dir0/empty-file.txt
./src0/dir0/same-file.txt
./src0/dir1
./src0/dir1/different-file.txt
./src0/dir1/empty-file.txt
./src0/dir1/same-file.txt
./src1
./src1/dir0
./src1/dir0/different-file.txt
./src1/dir0/empty-file.txt
./src1/dir0/same-file.txt
./src1/dir1
./src1/dir1/different-file.txt
./src1/dir1/empty-file.txt
./src1/dir1/same-file.txt

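The rsync --checksum approach above is the common solution, but it does a lot of unnecessary work: the sender checksums every source file, the receiver checksums every destination file that has the same name and size as its source counterpart, and a file is skipped only if its name, size and checksum all match. So if no file in the source has the same relative path as a file in the destination, all of this checksum computation is wasted. It follows that if we compare the relative paths of the non-directory entries of src1 and src0 and confirm that no relative path is shared by both, no checksums need to be computed at all. The following command prints the relative paths common to both trees; empty output means there are none.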

$ comm -12 <(find ./src1/ ! -type d -printf "%P\n" | sort) <(find ./src0/  ! -type d -printf "%P\n" | sort)
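The comm check above can be wrapped so that a script decides by itself whether a checksum pass is needed at all. A minimal sketch, assuming GNU find; list_rel and common_paths are hypothetical names. Both lists are sorted in the C locale because comm expects its inputs sorted in the same collating order it compares with; file names containing newlines are not handled.

```shell
#!/usr/bin/env bash
# Sketch only: list_rel and common_paths are hypothetical names.
set -euo pipefail

# Print the relative paths of all non-directory entries under $1, sorted.
list_rel() {
    # %P strips the starting point from each path (GNU find).
    find "$1" ! -type d -printf '%P\n' | LC_ALL=C sort
}

# Print relative paths present in both trees; return 1 if there are none.
common_paths() {
    local common
    common=$(LC_ALL=C comm -12 <(list_rel "$1") <(list_rel "$2"))
    [ -n "$common" ] || return 1
    printf '%s\n' "$common"
}
```

For example, if common_paths src0 src1 > /dev/null fails, the trees can be merged with plain moves and no rsync --checksum run is needed.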

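Move src0 to dst. Since mv -T renames src0 itself onto the empty dst directory, src0 no longer exists afterwards. (In this run the test tree also contains only-in-src0-file.txt in src0/dir0 and src0/dir1, and only-in-src1-file.txt in src1/dir0 and src1/dir1, as the listing shows.)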

$ mkdir -p ./dst
$ mv -v -T ./src0/ ./dst
`./src0/' -> `./dst'
$ find . | sort
.
./dst
./dst/dir0
./dst/dir0/different-file.txt
./dst/dir0/empty-file.txt
./dst/dir0/only-in-src0-file.txt
./dst/dir0/same-file.txt
./dst/dir1
./dst/dir1/different-file.txt
./dst/dir1/empty-file.txt
./dst/dir1/only-in-src0-file.txt
./dst/dir1/same-file.txt
./src1
./src1/dir0
./src1/dir0/different-file.txt
./src1/dir0/empty-file.txt
./src1/dir0/only-in-src1-file.txt
./src1/dir0/same-file.txt
./src1/dir1
./src1/dir1/different-file.txt
./src1/dir1/empty-file.txt
./src1/dir1/only-in-src1-file.txt
./src1/dir1/same-file.txt

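Move into dst only those files in src1 that do not exist in dst (--ignore-existing), deleting them from src1 (--remove-source-files). After this, src1 contains only files whose relative paths also exist in dst. If no files are left in src1 at this point, you can stop here. Because this step computes no checksums, it is fast.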

$ rsync -avv --remove-source-files --ignore-existing ./src1/ ./dst
sending incremental file list
delta-transmission disabled for local transfer or --whole-file
dir0/different-file.txt exists
dir0/empty-file.txt exists
dir0/only-in-src1-file.txt
dir0/same-file.txt exists
dir1/different-file.txt exists
dir1/empty-file.txt exists
dir1/only-in-src1-file.txt
dir1/same-file.txt exists
sender removed dir0/only-in-src1-file.txt
sender removed dir1/only-in-src1-file.txt
total: matches=0  hash_hits=0  false_alarms=0 data=0

sent 337 bytes  received 61 bytes  796.00 bytes/sec
total size is 72  speedup is 0.18
$ find . | sort
.
./dst
./dst/dir0
./dst/dir0/different-file.txt
./dst/dir0/empty-file.txt
./dst/dir0/only-in-src0-file.txt
./dst/dir0/only-in-src1-file.txt
./dst/dir0/same-file.txt
./dst/dir1
./dst/dir1/different-file.txt
./dst/dir1/empty-file.txt
./dst/dir1/only-in-src0-file.txt
./dst/dir1/only-in-src1-file.txt
./dst/dir1/same-file.txt
./src1
./src1/dir0
./src1/dir0/different-file.txt
./src1/dir0/empty-file.txt
./src1/dir0/same-file.txt
./src1/dir1
./src1/dir1/different-file.txt
./src1/dir1/empty-file.txt
./src1/dir1/same-file.txt

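Finally, move the files in src1 that share a relative path with a file in dst, comparing checksums as they go (--existing --checksum). If the checksums match, the file in src1 is simply deleted; if they differ, the file in dst is moved to dst_bkup (--backup --backup-dir) and the file in src1 is moved into dst. Checksums are now computed only for files whose relative paths are common to both trees.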

$ rsync -avv --remove-source-files --existing --checksum --backup --backup-dir=/full/path/to/dst_bkup/ ./src1/ ./dst
backup_dir is /full/path/to/dst_bkup/
sending incremental file list
delta-transmission disabled for local transfer or --whole-file
dir0/
dir0/different-file.txt
dir0/empty-file.txt is uptodate
sender removed dir0/empty-file.txt
dir0/same-file.txt is uptodate
sender removed dir0/same-file.txt
dir1/
dir1/different-file.txt
dir1/empty-file.txt is uptodate
sender removed dir1/empty-file.txt
dir1/same-file.txt is uptodate
sender removed dir1/same-file.txt
backed up dir0/different-file.txt to /full/path/to/dst_bkup/dir0/different-file.txt
sender removed dir0/different-file.txt
backed up dir1/different-file.txt to /full/path/to/dst_bkup/dir1/different-file.txt
sender removed dir1/different-file.txt
total: matches=0  hash_hits=0  false_alarms=0 data=62

sent 469 bytes  received 73 bytes  361.33 bytes/sec
total size is 72  speedup is 0.13
$ find . | sort
.
./dst
./dst_bkup
./dst_bkup/dir0
./dst_bkup/dir0/different-file.txt
./dst_bkup/dir1
./dst_bkup/dir1/different-file.txt
./dst/dir0
./dst/dir0/different-file.txt
./dst/dir0/empty-file.txt
./dst/dir0/only-in-src0-file.txt
./dst/dir0/only-in-src1-file.txt
./dst/dir0/same-file.txt
./dst/dir1
./dst/dir1/different-file.txt
./dst/dir1/empty-file.txt
./dst/dir1/only-in-src0-file.txt
./dst/dir1/only-in-src1-file.txt
./dst/dir1/same-file.txt
./src1
./src1/dir0
./src1/dir1
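The three steps above (mv -T, then rsync --ignore-existing, then rsync --existing --checksum) can be chained into one function. A minimal sketch under these assumptions: GNU coreutils and findutils, rsync 3.x, dst either absent or an empty directory so that mv -T can rename src0 onto it, and the hypothetical function name merge_with_rsync. A relative backup directory is turned into an absolute one because rsync interprets a relative --backup-dir relative to the destination (reference 1). As a last step it removes the empty directory skeleton left in src1, since --remove-source-files only deletes non-directories.

```shell
#!/usr/bin/env bash
# Sketch only: merge_with_rsync is a hypothetical name.
# Assumes GNU coreutils/findutils, rsync 3.x, and that DST is absent or empty.
set -euo pipefail

# merge_with_rsync SRC0 SRC1 DST BKUP
merge_with_rsync() {
    local src0=$1 src1=$2 dst=$3 bkup=$4
    # rsync treats a relative --backup-dir as relative to the destination,
    # so make it absolute with respect to the current directory.
    case $bkup in
        /*) ;;
        *) bkup=$PWD/$bkup ;;
    esac

    # Step 1: rename SRC0 onto the empty DST (no copying, no checksums).
    mkdir -p -- "$dst"
    mv -T -- "$src0" "$dst"

    # Step 2: move SRC1-only files; files whose path already exists in DST stay put.
    rsync -a --remove-source-files --ignore-existing "$src1/" "$dst/"

    # Step 3: only common paths remain in SRC1. Identical ones are just removed
    # from SRC1; for differing ones the DST copy is moved to BKUP first.
    rsync -a --remove-source-files --existing --checksum \
        --backup --backup-dir="$bkup" "$src1/" "$dst/"

    # Prune the directories left behind (including SRC1 itself once empty).
    find "$src1" -type d -empty -delete
}
```

With the test tree, merge_with_rsync src0 src1 dst dst_bkup should end with the same dst and dst_bkup contents as the listing above, but without the leftover src1 directories.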

References

  1. different behavior for --backup-dir relative/path vs --backup-dir /full/path

ChangeLog

  1. Posted: 2008-10-07T17:32:04+09:00
  2. Modified: 2008-10-07T17:32:04+09:00
  3. Generated: 2016-12-23T23:09:19+09:00