Parallel command execution
Say you want to download a list of files with a command line tool. Since each of those files is quite large and (as is the case with me) running only one download process won’t fully utilize your internet connection, you could write a script that executes n download processes in parallel, waits until one of them has finished, and then starts the next one.
Or just use xargs. At least on Linux, FreeBSD and OS X, xargs supports the -P argument, allowing it to run the given program in parallel: you just pass it the number of processes it is allowed to create, and off it goes.
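A quick toy example of -P, nothing to do with the downloads yet: four one-second sleeps, at most two at a time, with -t echoing each command as it starts.

$ printf '%s\n' 1 1 1 1 | xargs -n1 -P2 -t sleep
sleep 1
sleep 1
sleep 1
sleep 1

With -P2 the whole thing takes about two seconds instead of four.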
Practical example: I wanted to download all the videos made by Confreaks from the Mountain West Ruby Conf 2009 (beats me why they don’t offer a torrent with all the videos). First I extracted a list of all those videos with Hpricot in IRb, then I simply passed the list to xargs (I’ve shortened the list of videos for this post).
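The resulting command looked roughly like this, with placeholder URLs standing in for the actual Confreaks links:

$ echo 'http://example.com/mwrc2009-talk-1.mp4 http://example.com/mwrc2009-talk-2.mp4 http://example.com/mwrc2009-talk-3.mp4' | xargs -P4 -t -n1 wget -qc  # placeholder URLs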
First, I tell xargs to run a maximum of 4 processes in parallel (-P4). The -t flag causes it to echo each command it is about to run (useful for getting an idea of the progress). Then I limit the number of arguments each command gets to one (-n1); xargs would otherwise use its default (5000 on OS X), which would mean the first command would get all the arguments and the other three wouldn’t even be run. With each process getting only one argument, xargs will immediately start the next one as soon as a download is finished. Finally, I use wget -qc (be quiet, and continue downloading if I already have a part of the file) to actually download the files.
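To see what -n changes, here is a quick comparison with echo (roughly how it looks on a Linux box; the -t trace goes to stderr, so it interleaves with the output):

$ echo a b c | xargs -t echo
echo a b c
a b c
$ echo a b c | xargs -t -n1 echo
echo a
a
echo b
b
echo c
c

Without -n1 there is only one command to run, so -P has nothing to parallelize.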
<<< is another neat trick I recently discovered via Command-Line-Fu. Basically it allows you to pass a string via STDIN to a command without resorting to the (ugly) echo 'string' | command:
$ echo lol | cat
lol
$ <<< lol cat
lol
$ cat <<< lol
lol
$ cat <<< 'lol lol'
lol lol
It should work with bash and zsh, though I can only vouch for the former.
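The two tricks also combine nicely: instead of piping the URL list in with echo, a here string can feed it straight to xargs (again with placeholder URLs):

$ xargs -P4 -t -n1 wget -qc <<< 'http://example.com/mwrc2009-talk-1.mp4 http://example.com/mwrc2009-talk-2.mp4'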