If you’ve ever had to move a huge directory containing many files from one server to another, you may have encountered a situation where the copy rate was significantly less than what you’d expect your network to support. Rsync does a fantastic job of quickly syncing two relatively similar directory structures, but the initial clone can take quite a while, especially as the file count increases.

The problem is that there is a certain amount of per-file overhead when using scp or rsync to copy files from one machine to another. This is not a problem under most circumstances, but if you are attempting to duplicate tens of thousands of files (think server or database backups), this per-file overhead can really add up. The solution is to copy the files over in a single stream, which normally means tarring them up on one server, copying the tarball, then untarring on the destination. Because the tarball has to sit on the same disk as the original files, this could cause you to run out of space unless you are under 50% disk utilization on the source server.
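As a rough sketch of that space check, the snippet below compares the size of a directory with the free space on the filesystem that holds it. The function name has_room_for_tarball and the throwaway demo directory are made up for illustration, and the check assumes the worst case of a tarball roughly as large as the data itself (compression will usually do better).

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): succeeds when the
# filesystem holding DIR has at least as much free space as DIR occupies.
# Assumption: an uncompressed tarball is about as large as the data.
has_room_for_tarball() {
    dir=$1
    # Size of the directory tree in kilobytes.
    used_kb=$(du -sk "$dir" 2>/dev/null | cut -f1)
    # Free kilobytes on the filesystem containing it (POSIX df layout,
    # where the fourth column of the second line is "Available").
    free_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR==2 {print $4}')
    # Bail out with status 2 if either measurement failed.
    [ -n "$used_kb" ] && [ -n "$free_kb" ] || return 2
    [ "$free_kb" -ge "$used_kb" ]
}

# Demo on a throwaway directory; substitute your real source directory.
demo_dir=$(mktemp -d)
echo "sample" > "$demo_dir/file.txt"
if has_room_for_tarball "$demo_dir"; then
    echo "enough free space to stage a tarball"
else
    echo "not enough room: stream the archive instead"
fi
rm -rf "$demo_dir"
```

If the check fails on a real source directory, staging a tarball locally is risky and a streaming approach is the safer choice.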

Brett Jones has an alternative solution, which uses the handy netcat utility:

After clearing up 10 GB of log files, we were left with hundreds of thousands of small files that were going to slow us down. We couldn’t tarball the files because of a lack of space on the source server. I started searching around and found this nifty tip that takes out encryption and streams all the files as one large file:

This requires netcat on both servers.

Destination box: nc -l -p 2342 | tar -C /target/dir -xzf -
Source box: tar -cz /source/dir | nc Target_Box 2342

This causes the source machine to tar the files up and send them over the netcat pipe, where they are extracted on the destination machine, all with no per-file negotiation and no extra disk space used. It’s also faster than the usual scp or rsync over ssh because there is no encryption overhead. If you are on a local protected network, this will perform much better, even for large single-file copies.
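To see the single-stream idea in isolation, here is a minimal local sketch that pipes one tar straight into another, with the netcat hop left out. The function name stream_copy and the temporary directories are hypothetical; over the network, nc would sit between the two tar processes.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): copy the contents of
# SRC_DIR into DST_DIR through a single tar stream, with no intermediate
# tarball written to disk.
stream_copy() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    # -C changes directory first, so archive paths are relative.
    # "-f -" makes tar write to stdout (left side) and read stdin (right).
    tar -C "$src" -cf - . | tar -C "$dst" -xf -
}

# Demo on throwaway directories.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/src/sub"
echo "hello" > "$demo_root/src/a.txt"
echo "world" > "$demo_root/src/sub/b.txt"
stream_copy "$demo_root/src" "$demo_root/dst"
# diff -r exits 0 only when both trees have identical contents.
diff -r "$demo_root/src" "$demo_root/dst" && echo "trees match"
rm -rf "$demo_root"
```

Compression is omitted here because nothing crosses a network; the -z flag in the commands above trades CPU time for bandwidth.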

If you are on an unprotected network, however, you may still want your data encrypted in transit. You can perform about the same task over ssh:

Run this on the destination machine:
cd /path/to/extract/to/
ssh [email protected] 'tar -cz -C /source/path/ .' | tar -zxvf -

This command will issue the tar command across the network on the source machine, causing tar’s stdout to be sent back over the network. This is then piped to stdin on the destination machine and the files magically appear in the directory you are currently in.

The ssh route is a little slower than using netcat, due to the encryption overhead, but it’s still way faster than copying the files individually with scp. It also has the added advantage of potentially being compatible with Windows servers, provided you have a few of the Unix tools like ssh and tar installed on your Windows server (using the Cygwin-linked binaries that are available).

Fast File Copy – Linux!