Linux - From The Terminal: December 2008

Hey there, this site has moved, so comments are disabled. Thankfully, you can go to the page, carefully linked for your satisfaction. Click here to go there.

wget is a non-interactive, command-line tool for downloading files from the web. I have used it in scripts to download an HTML file, check it for certain data, and run a different script depending on what the page said. Here are some tricks for using it:

In its most basic form, you type: wget http://www.the_web_site.com/the_file.html

The output is like this:


user@user-desktop:~$ wget http://tobiasdavis.110mb.com/index.php
--00:43:08--  http://tobiasdavis.110mb.com/index.php
         => `index.php'
Resolving tobiasdavis.110mb.com... 195.242.99.215
Connecting to tobiasdavis.110mb.com|195.242.99.215|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4,566 (4.5K) [text/html]

100%[=====================================================================================================================================>] 4,566         24.72K/s          

00:43:09 (24.67 KB/s) - `index.php' saved [4566/4566]

user@user-desktop:~$

This grabs the file index.php from the website and saves it in the current directory as index.php.

You probably noticed that I didn't download an html file there. Thankfully, wget doesn't need html: you can use it to grab zip, pdf, mp3, or pretty much any other file. Most of the options I will show you are used this way: wget [the options] [the file]

The first one that I wanted to know was how to stop printing all that extra stuff. You can easily make wget run "quietly" by using the -q (for quiet) option: wget -q http://tobiasdavis.110mb.com/index.php This does the same thing as the first one, but doesn't tell you what it's doing.

Another handy thing, especially if you are scripting things, is to download a file and name it something else. This is also pretty easy: Use the -O file.html option to save it as file.html

Like this: wget -q -O index.html http://tobiasdavis.110mb.com/index.php
This quietly downloads the same file as before, and saves it as index.html
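
The script use mentioned at the start (download a page, check it for certain data, then run one thing or another) can be put together from the -q and -O options. Here is a rough sketch, not a finished tool: the function name page_has, the search text "Linux", and the echo messages are all made up for illustration.

```shell
#!/bin/sh
# Sketch: download a page quietly, look for some text in it, and branch.
# page_has is a hypothetical helper name; the URL and text are placeholders.

page_has() {
    # usage: page_has URL PATTERN
    # Returns 0 if the downloaded page contains PATTERN, 1 if it does not,
    # and 2 if the download itself failed.
    url=$1
    pattern=$2
    tmp=$(mktemp) || return 2
    if ! wget -q -O "$tmp" "$url"; then
        rm -f "$tmp"
        return 2
    fi
    if grep -q -- "$pattern" "$tmp"; then
        status=0
    else
        status=1
    fi
    rm -f "$tmp"
    return $status
}

# Example use (uncomment to run):
# if page_has http://tobiasdavis.110mb.com/index.php "Linux"; then
#     echo "found it"       # run one script here
# else
#     echo "not found"      # run a different script here
# fi
```

Saving to a temporary file with -O keeps the current folder from filling up with numbered copies every time the script runs.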

By now you may have noticed that there are multiple files in the folder where you have been running these commands. I had three, named index.php, index.php.1, and index.php.2. What does that mean? This is a handy feature of wget: if you download a file with the same name as one that is already there, it won't automatically overwrite the old one; it saves the new copy with a number added to the end. But what if you want to overwrite it?

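You can always tell wget what file name to use with the -O file option, as above. When you do this, wget will overwrite the old version. In normal use, use the -N option instead. It turns on time-stamping: wget compares the file on the server with your local copy and overwrites the local one if the server's version is newer. Like this: wget -q -N http://tobiasdavis.110mb.com/index.php This replaces the old copy you might still have rather than creating index.php.3, and if the server reports that its copy is no newer than yours, wget skips the download.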
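
In a script, the -O overwrite described above can be wrapped so that the old copy survives a failed download. As far as I know, wget empties the -O file as soon as it starts, so a broken connection can leave you with nothing. A minimal sketch, assuming a made-up helper name (refresh_copy) and a ".part" suffix chosen just for this example:

```shell
#!/bin/sh
# Sketch: replace a local copy only after the download has succeeded.
# refresh_copy is a hypothetical helper, not part of wget.

refresh_copy() {
    # usage: refresh_copy URL FILE
    url=$1
    dest=$2
    tmp="$dest.part"
    if wget -q -O "$tmp" "$url"; then
        mv -f "$tmp" "$dest"    # download worked: overwrite the old copy
    else
        rm -f "$tmp"            # download failed: leave the old copy alone
        return 1
    fi
}

# Example use (uncomment to run):
# refresh_copy http://tobiasdavis.110mb.com/index.php index.html
```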

wget also works with ftp instead of http. It is the exact same thing, except you probably have a user name and password. The wget command is wget ftp://username:password@host/path

I used it like this: wget ftp://user:password@tobiasdavis.110mb.com/index.php The user name and password are mine. This downloads the actual underlying php script, which is quite different from what you see when browsing or downloading from http.

One note on security: if you type the command as above, other users on the system can see your log-in information (for example, in the process list while wget is running), which is especially bad on a multi-user computer! Instead, type: wget -i - Then type in the ftp address as above, like: ftp://user:password@host/path Now press Enter and then press Ctrl+D. wget reads the address from what you typed instead of from the command line, so it won't be visible to other users. In practice, wget is usually used in scripts, where you can do things differently.
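
For scripts, one way to "do things differently" is to keep the address in a file that only you can read and pass that file to wget with -i. This is only a sketch: the helper name make_url_file and the file name .wget-urls are made up, and the script that calls it needs to be private too if the password is written in it.

```shell
#!/bin/sh
# Sketch: store an ftp URL (password included) in a private file so it
# never has to appear on the wget command line.
# make_url_file is a hypothetical helper name.

make_url_file() {
    # usage: make_url_file FILE URL
    file=$1
    url=$2
    # Create the file empty, lock it down to rw-------, then write the URL.
    touch "$file" && chmod 600 "$file" && printf '%s\n' "$url" > "$file"
}

# Example use (uncomment to run):
# make_url_file "$HOME/.wget-urls" 'ftp://user:password@host/path'
# wget -q -i "$HOME/.wget-urls"
```

The file is made private while it is still empty, so the password is never sitting in a file that others can read.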

You can look here for the official manual, some examples here, or try looking online for "wget examples". Next time I will show you something about curl, a similar tool with even more options.


What with this being my ninth blog, and something like my fifteenth web-site altogether, perhaps it deserves an explanation:

I like the power of Linux, and I especially like using the command prompt to make things happen. When I type, I use pico. When I run commands, I open a terminal. Ever since I learned Perl scripting, a simple way to program things to happen from the command line, I have really gotten into using the command line.

On the one hand, I really like the terminal. On the other hand, there are so many commands to remember that I decided to create this blog as a reference, so that I could just come back here if I forget something. This could, I hope, be beneficial to you as well: you can peruse and search what I have done and see if it is useful for your situation. I will try to make it simple enough to follow; let me know if it needs simplification.

Two things:
1) I don't do Windows. I can't stand Windows at all. If I could never use Windows again I would be so very happy, and in fact this is my plan. NONE of these commands are for Windows! Don't use them there! That said, in my painful use of Windows, I have learned many things which I might share when appropriate, if I can recall them.
2) I am a full-time engineering student (i.e., I don't have much time), and this shows in my other under-developed blogs. Feel free to ask me questions, and I might be able to answer them, but don't be surprised if your question gets ignored for a long time: I am probably busy.

Hope this is as useful for you as it is for me!

Linux - From The Terminal

MOVING!!!

December 24, 2008

wget: Grabbing html files.

December 16, 2008

What is this about?
