MOVING!!!

Hey there, after several years of playing with Blogger and trying to get things sorted out the way I wanted, I finally decided to host my own domain so I could manage things more easily. So this blog (which, admittedly, hasn't been updated much) is moving to my all-new site: DavisTobias.com/Linux. Also, to make it easier to transfer RSS feeds, this is the link to the new RSS feed. I'll leave this site and its posts up, so I don't contribute to dead links on the internet, but I'm shutting off comments and won't post here any more.

December 24, 2008

wget: Grabbing html files.


wget is a command-line, non-interactive tool for downloading files from the web (over HTTP, HTTPS and FTP). I have used it in some scripts to download an HTML file, check it for certain data, and run a different script depending on what the page said. Here are some tricks for using it:
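The script pattern described above (download a page, look for something in it, then decide what to run next) might look roughly like this. It's a minimal sketch, not the original scripts: check_page and page_contains are made-up names, and the search text is matched literally with grep -F.

```shell
# Hypothetical sketch of the "download, check, branch" pattern.
# check_page and page_contains are made-up names.

# page_contains FILE TEXT: succeed if TEXT (matched literally) appears in FILE
page_contains() {
  grep -qF -- "$2" "$1"
}

# check_page URL TEXT: download URL quietly into a temporary file, then
# return 0 if TEXT is in it, 1 if it isn't, and 2 if the download failed
check_page() {
  tmp=$(mktemp) || return 2
  if ! wget -q -O "$tmp" -- "$1"; then
    rm -f -- "$tmp"
    return 2
  fi
  if page_contains "$tmp" "$2"; then
    status=0
  else
    status=1
  fi
  rm -f -- "$tmp"          # clean up the temporary copy either way
  return "$status"
}
```

For example: check_page http://www.the_web_site.com/the_file.html "some text" && ./next_script.sh would run next_script.sh (a placeholder) only when the text shows up on the page.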

In its most basic form, you type: wget http://www.the_web_site.com/the_file.html

The output is like this:
user@user-desktop:~$ wget http://tobiasdavis.110mb.com/index.php
--00:43:08-- http://tobiasdavis.110mb.com/index.php
=> `index.php'
Resolving tobiasdavis.110mb.com... 195.242.99.215
Connecting to tobiasdavis.110mb.com|195.242.99.215|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4,566 (4.5K) [text/html]

100%[=====================================================================================================================================>] 4,566 24.72K/s

00:43:09 (24.67 KB/s) - `index.php' saved [4566/4566]

user@user-desktop:~$


This grabbed the file index.php from the website and saved it in the current directory as index.php.

You probably noticed that the file I grabbed was index.php, not an .html file. wget isn't limited to HTML: you can use it to grab zip, pdf, mp3, or pretty much any other file. Most of the options I will show you go in this order: wget [the options] [the URL]

The first thing I wanted to know was how to stop wget from printing all that extra stuff. You can easily make wget run "quietly" by using the -q (for quiet) option: wget -q http://tobiasdavis.110mb.com/index.php This does the same thing as the first command, but doesn't tell you what it's doing.
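Because -q hides everything, including error messages, a script should check wget's exit status to find out whether the download worked. Here's a minimal sketch (fetch_quietly is a made-up name). The exit codes in the comments are the ones listed in the GNU wget manual for newer versions; older versions were less consistent about them.

```shell
# Hypothetical wrapper: -q silences wget completely, errors included,
# so look at its exit status instead. In newer GNU wget, 0 means success,
# 4 a network failure and 8 an error response from the server (e.g. 404).
fetch_quietly() {
  # $1 = URL to download into the current directory
  if wget -q -- "$1"; then
    return 0
  else
    status=$?                # exit status of the wget command above
    echo "wget failed for $1 (exit status $status)" >&2
    return "$status"
  fi
}
```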

Another handy thing, especially if you are scripting things, is to download a file and give it a different name. This is also pretty easy: use the -O option, so -O file.html saves the download as file.html.

Like this: wget -q -O index.html http://tobiasdavis.110mb.com/index.php
This quietly downloads the same file as before, and saves it as index.html
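One thing to watch out for: the wget manual says -O works like shell redirection, so the output file is truncated immediately, before anything is downloaded. If the download then fails, you are left with an empty file instead of your old copy. Here's a minimal sketch of a workaround (fetch_to is a made-up name) that downloads to a temporary name and only renames it once wget reports success.

```shell
# Hypothetical helper: download to a temporary name first, so a failed
# download can't wipe out the existing copy the way a bare -O can.
fetch_to() {
  # $1 = URL, $2 = final file name
  tmp="$2.part.$$"
  if wget -q -O "$tmp" -- "$1"; then
    mv -f -- "$tmp" "$2"     # success: replace the old copy in one step
  else
    status=$?
    rm -f -- "$tmp"          # failure: drop the partial file, keep the old one
    return "$status"
  fi
}
```

For example: fetch_to http://tobiasdavis.110mb.com/index.php index.html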

By now you may have noticed that there are several files in the folder where you have been running these commands. I had three, named index.php, index.php.1 and index.php.2, but what does that mean? This is a handy feature of wget: if you download a file with the same name as one that's already there, wget won't overwrite the old one; it saves the new copy with a number added to the end. But what if you want to overwrite it?

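You can always tell wget which file name to use with the -O file option, as above; when you do, wget overwrites whatever is already there. Otherwise, use the -N (time-stamping) option, like this: wget -q -N http://tobiasdavis.110mb.com/index.php With -N, wget compares the server's copy with the one you already have and only downloads it again if the server's copy is newer (or a different size), replacing your old file instead of making a numbered copy. If the server doesn't say when the file was last changed (common for PHP pages like this one), wget can't compare them, so it simply downloads the file again and overwrites yours.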
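Here's a minimal sketch of a script that refreshes a local copy with -N and reports whether wget actually replaced it. It assumes the URL ends in a plain file name (so the local copy has the same name) and uses GNU stat's -c %Y to read the file's modification time; refresh_copy is a made-up name. Since it only compares modification times, treat the answer as a rough check.

```shell
# Hypothetical helper: refresh a local copy with -N and say whether it changed.
# Assumes the URL ends in a plain file name and that GNU stat is available.
refresh_copy() {
  # $1 = URL, e.g. http://tobiasdavis.110mb.com/index.php
  name=${1##*/}                                   # local name = last part of the URL
  before=$(stat -c %Y -- "$name" 2>/dev/null || echo none)
  wget -q -N -- "$1" || return $?                 # pass wget's failure status on
  after=$(stat -c %Y -- "$name" 2>/dev/null || echo none)
  if [ "$before" = "$after" ]; then
    echo "$name unchanged"
  else
    echo "$name updated"
  fi
}
```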

Another useful trick is using FTP instead of HTTP. It works exactly the same way, except that you will probably need a user name and password. The command is: wget ftp://username:password@host/path

I used it like this: wget ftp://user:password@tobiasdavis.110mb.com/index.php (replace user and password with your own). This downloads the actual PHP script stored on the server, which is quite different from what you get over HTTP, where the server runs the script and sends you only its output.
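If your user name or password contains characters such as @, : or /, wget can't tell where the log-in part of the address ends, so those characters need to be percent-encoded (@ becomes %40, for example); wget should decode them again before logging in. Here's a minimal sketch of an encoder in plain shell. urlencode is a made-up name, and it only handles ASCII characters.

```shell
# Hypothetical helper: percent-encode a user name or password for an ftp:// URL.
# ASCII only. The body runs in a subshell with LC_ALL=C so the ranges below
# mean plain ASCII letters and digits.
urlencode() (
  LC_ALL=C
  s=$1
  out=''
  while [ -n "$s" ]; do
    rest=${s#?}
    c=${s%"$rest"}                              # first character of $s
    s=$rest
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;            # safe characters: keep as-is
      *) out=$out$(printf '%%%02X' "'$c") ;;    # anything else: %XX in hex
    esac
  done
  printf '%s\n' "$out"
)
```

For example, urlencode 'p@ss' prints p%40ss.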

One note on security: if you put your password on the command line like this, it can end up in your shell history, and other users on the same machine may be able to see it in the list of running processes, which is especially bad on a multi-user computer! Instead, run it like this: wget -i - Then type the ftp address as above, like ftp://user:password@host/path, press Enter, and then press Control+D. The -i - option tells wget to read the addresses to download from what you type, so they never appear on the command line where other users could see them. In practice, wget is mostly used in scripts, where you can keep the password off the command line in other ways, for example in a ~/.netrc file that only you can read.
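In a script where you still want to type the password by hand, something like the following keeps it off wget's command line. It's a minimal sketch, assuming a Bourne-style shell such as bash or dash, where printf is a built-in command; fetch_ftp_private is a made-up name. It turns off terminal echo with stty while you type, then pipes the full address into wget -i -, just like typing it in yourself. User names or passwords containing characters such as @ or : still need to be percent-encoded.

```shell
# Hypothetical helper: prompt for an FTP password without echoing it and feed
# the address to wget on standard input, so the password is never an argument
# to wget. Assumes printf is a shell built-in (true in bash and dash).
fetch_ftp_private() {
  # $1 = user name, $2 = host, $3 = path on the server (no leading slash)
  printf 'Password for %s@%s: ' "$1" "$2" >&2
  stty -echo                   # don't show what you type
  IFS= read -r pass
  stty echo                    # turn echo back on
  printf '\n' >&2
  printf 'ftp://%s:%s@%s/%s\n' "$1" "$pass" "$2" "$3" | wget -q -i -
  status=$?                    # exit status of wget, the last command in the pipe
  unset pass
  return "$status"
}
```

For example: fetch_ftp_private user tobiasdavis.110mb.com index.php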

You can look here for the official manual, here for some examples, or try searching online for "wget examples". Next time I will show you something about curl, a separate command-line tool that is more powerful in some ways.