scp-resume for downloading multiple files
You need to transfer a lot of files across a slightly temperamental ssh connection. You want something like a recursive scp command that supports resuming, and that will keep trying until it gets the job done.
rsync is ideal for this purpose – however, I find it quite dodgy under cygwin, especially when transferring large files.
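For comparison, when rsync does behave, the whole job can be handled with a retry loop along these lines (a sketch, reusing the host and path from the examples below; --partial keeps half-transferred files around so they can be resumed):

while ! rsync -av --partial --progress \
    remote-user@remote.machine.ip.addr:/cygdrive/d/ ./ ; do
  sleep 5   # pause briefly before retrying a dropped connection
done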
A sweet alternative is Unison, for synchronizing filesets over ssh.
However, I often find myself falling back on a nice script called scp-resume.sh designed for resuming the transfer of large files using dd over ssh. We can invoke this script inside a loop to transfer lots of files at a time.
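The core trick is easy to sketch: determine how many bytes of the file are already present locally, then have a remote dd skip that many bytes and send only the rest (placeholder names; the script’s actual size check is discussed below):

localsize=`wc -c < "$localfile"`   # bytes already downloaded (sketch only)
# skip what we already have on the remote side, append the remainder locally
ssh remote-user@remote.machine.ip.addr \
    "dd bs=1 skip=$localsize \"if=$remotefile\"" >> "$localfile"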
One problem with the script is the use of the construct below to determine file sizes:
localsize=`ls -l "${localfile}" | awk '{ print $5 }'`
This will fail if the file owner’s username contains a space, since the extra word shifts awk’s fields along by one. Most likely you’ll get:
Resuming download of [file] at byte None ... dd: invalid number `None'
where ‘None’ is the group owning the file, as reported by cygwin, which has been shifted into field five by the two-word owner name. The fix is to replace every instance of this ls -l construct with something like localsize=`ls -g "${localfile}" | awk '{ print $4 }'`. The -g option displays the file size but not the owner name, so you should be safe from spaces confusing awk. I don’t know whether the -g option is POSIX, but it’s in GNU ls anyway.
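To see the field shift concretely, here’s roughly what happens with a file owned by ‘Hugh Denman’ (the same file as in the example further down):

$ ls -l ./asd.txt
-rw-r--r-- 1 Hugh Denman None 19 Mar 28 17:59 ./asd.txt
$ ls -l ./asd.txt | awk '{ print $5 }'   # two-word owner: field 5 is the group
None
$ ls -g ./asd.txt
-rw-r--r-- 1 None 19 Mar 28 17:59 ./asd.txt
$ ls -g ./asd.txt | awk '{ print $4 }'   # owner omitted: field 4 is the size
19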
You might be tempted to use ls -s, but this reports the amount of disk space used, rather than the actual length of the file (i.e. it will be a multiple of the allocation block size). You can see the difference using ls -ls:
Hugh Denman@gpplap3 ~
$ cat > asd.txt
fre
hschui
huernui
Hugh Denman@gpplap3 ~
$ ls -ls --block-size=1 ./asd.txt
1024 -rw-r--r-- 1 Hugh Denman None 19 Mar 28 17:59 ./asd.txt
Here my 19-byte text file is taking up 1024 bytes of disk space.
Two other possibilities, suggested by Erik Jan Taal, are perl -e "print -s '$filename'" and ls -l | sed -n 's/.* [^0-9]*\([0-9]\+\) .*/\1/ p'. These will work on FreeBSD, for example, which does not support ls -g.
To use the scp-resume script, we’ll need a text file containing the filenames to transfer from the remote machine. Here’s one way to generate this list:
$ ssh remote-user@remote.machine.ip.addr "/bin/find /cygdrive/d -type f" | grep -vi i386 > ./filelist.txt
In this example, the remote drive contains the OS installation files in /cygdrive/d/I386, which we don’t want to transfer.
With a fixed scp-resume script, and the list of files to transfer in hand, all that’s left to do is iterate over each file in the list and tell scp-resume to download it. We use the cat filelist.txt | while read FILE approach because it preserves spaces in the filenames (unlike for file in `cat filelist.txt` – see the short demonstration after the loop below).
cat filelist.txt | while read FILE ; do
  DIR=`dirname "$FILE"` ;
  mkdir -p "./$DIR" ;
  ./scp-resume.sh -d "remote-user@remote.machine.ip.addr:$FILE" "./$FILE" ;
done
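The difference between the two loop styles is easy to demonstrate with a filename containing spaces:

$ echo 'a file with spaces.txt' > list.txt
$ for f in `cat list.txt`; do echo "[$f]"; done   # word-split into four tokens
[a]
[file]
[with]
[spaces.txt]
$ cat list.txt | while read f; do echo "[$f]"; done   # one line, one filename
[a file with spaces.txt]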
This very nearly works – the only trouble is that it transfers the first file in the list and then inexplicably stops, without an error! This is a difficulty that arises whenever you use the cat [file] | while read VAR idea with an ssh invocation inside the while loop: ssh inherits the loop’s standard input and reads from it, draining the rest of the pipe before the next read (I found that out in a Usenet post). So we have to modify scp-resume one last time, redirecting the download command’s input from /dev/null:
ssh -C -c arcfour "$userhost" "dd bs=1 skip=$localsize \"if=${remotefile}\"" >> "$localfile" < /dev/null
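You can reproduce the underlying problem without ssh at all – any command inside the loop that reads standard input will swallow the remaining lines:

$ printf 'one\ntwo\nthree\n' | while read line; do
    echo "got: $line"
    cat > /dev/null   # stands in for ssh: drains the rest of the pipe
  done
got: one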
With this change, you can’t enter the ssh password manually – but you’d need automatic authentication set up anyway, as you don’t want to type your password for every file. A simple way to set up automatic authentication is described here.
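In outline, the usual public-key approach looks something like this (leave the passphrase empty if you want fully unattended transfers):

# generate a key pair on the local machine
ssh-keygen -t rsa
# append the public key to the remote account's authorized_keys
cat ~/.ssh/id_rsa.pub | \
  ssh remote-user@remote.machine.ip.addr 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'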
Lastly, you can wrap the whole loop above in a for loop with a few iterations, so that if the connection is dropped on some transfers, those files can be resumed in a subsequent pass:
for i in `seq 0 100`; do
  cat filelist.txt | while read FILE ; do
    DIR=`dirname "$FILE"` ; mkdir -p "./$DIR" ;
    ./scp-resume.sh -d "remote-user@remote.machine.ip.addr:$FILE" "./$FILE" ;
  done ;
done
This whole process is hideously inefficient for large numbers of files, alas. But it seems to get the job done. Here’s my edited version of scp-resume, using the redirect from /dev/null for ssh and using ls -g instead of ls -l to query the file size. Note that I’ve only tested the downloading functionality, never the uploading bits.
#!/bin/sh
#
# scp-resume - by erik jan taal
#   http://ejtaal.net/scripts-showcase/#scp-resume
# Speed improvements by using blocks by nitro.tm@gmail.com
# Fixed by Hugh Denman to use ls -g (safe with usernames containing spaces)
# This version assumes that ssh is set up for automatic authentication
# rather than manual password entry.
#
# This script assumes that you have access to the 'dd' utility
# on both the local and remote host.

# dd transfer blocksize (8192 by default)
blocksize=8192

usage() {
  echo
  echo "Usage: `basename $0` -u(pload)   \$localfile \$remotefile [\$sshargs]"
  echo "       `basename $0` -d(ownload) \$remotefile \$localfile [\$sshargs]"
  echo
  echo "  \$remotefile should be in the scp format, i.e.: [user@]host:filename"
  echo "  \$sshargs are optional further ssh options such as a port specification"
  echo "    (-p 1234) or use of compression (-C)"
  echo
  echo "  -u:"
  echo "    \$remotefile may be [user@]host: for uploading to your remote home directory"
  echo "  -d:"
  echo "    \$localfile may be a period (.) when downloading a remote file to the"
  echo "    current working directory."
  echo
  exit 1
}

[ -z "$1" -o -z "$2" -o -z "$3" ] && usage

option=$1
case $option in
  -[uU]*)
    localfile=$2
    remote=$3
    shift 3
    sshargs="$*"
    userhost=${remote%:*}
    remotefile=${remote#*:}
    if [ ! -f "$localfile" ]; then
      echo "!! File not found: $localfile"
      usage
    fi
    if [ x"$userhost" = x"$remote" ]; then usage; fi
    if [ x"$remotefile" = x"$remote" -o -z "$remotefile" ]; then
      remotefile=`basename "$localfile"`
    fi

    echo "==>> Getting size of remote file:"
    localsize=`ls -g "${localfile}" | awk '{ print $4 }'`
    # note: the /dev/null redirect must apply to ssh, not awk, or the pipe is lost
    remotesize=`ssh $sshargs "$userhost" \
      "[ -f \"${remotefile}\" ] && ls -g \"${remotefile}\"" < /dev/null | awk '{ print $4 }'`
    [ -z "$remotesize" ] && remotesize=0
    echo "=> Remote filesize: $remotesize bytes"
    if [ $localsize -eq $remotesize ]; then
      echo "=> Local size equals remote size, nothing to transfer."
      exit 0
    fi

    remainder=$((remotesize % blocksize))
    restartpoint=$((remotesize - remainder))
    blockstransferred=$((remotesize / blocksize))
    echo "=> Resuming upload of '$localfile'"
    echo "   at byte: $restartpoint ($blockstransferred blocks x $blocksize bytes/block),"
    echo "   will overwrite the trailing $remainder bytes."
    # no /dev/null redirect here: ssh must read the file data from the pipe
    dd bs=$blocksize skip=$blockstransferred "if=${localfile}" | \
      ssh $sshargs "$userhost" "dd bs=$blocksize seek=$blockstransferred of=\"$remotefile\""
    echo "done."
    ;;
  -[dD]*)
    localfile=$3
    remote=$2
    shift 3
    sshargs="$*"
    userhost=${remote%:*}
    remotefile=${remote#*:}
    if [ x"$localfile" = x"." ]; then localfile=`basename "$remotefile"`; fi
    if [ ! -f "$localfile" ]; then
      localsize=0
    else
      localsize=`ls -g "${localfile}" | awk '{ print $4 }'`
    fi
    [ x"$remotefile" = x"$remote" ] && usage
    [ -z "$localsize" ] && localsize=0

    remainder=$((localsize % blocksize))
    restartpoint=$((localsize - remainder))
    blockstransferred=$((localsize / blocksize))
    echo "=> Resuming download of '$localfile'"
    echo "   at byte: $restartpoint ($blockstransferred blocks x $blocksize bytes/block)"
    echo "   filesize: $localsize; will overwrite the trailing $remainder bytes."
    # ssh reads /dev/null so it cannot drain the stdin of a surrounding loop
    ssh $sshargs "$userhost" \
      "dd bs=$blocksize skip=$blockstransferred \"if=${remotefile}\"" < /dev/null | \
      dd bs=$blocksize seek=$blockstransferred "of=$localfile"
    ;;
  *)
    usage
    ;;
esac
Second real post exactly one year after the first! Prolific.