Posted by Sir Four at 4:41pm Jan 31 '05
You must sign in to send Sir Four a message
You must sign in to send Sir Four a message
SQL, or Structured Query Language, is the standard language you use for interacting with a database. SQL is pronounced like "sequel", and is a powerful tool in the programmer's belt. I'll give you guys an intro here.
First, we must review some relational database concepts. Data is stored in tables, which consist of rows and columns. Columns are data fields and each row represents one record in the table. For example, consider a table called "post" that has the following columns: post_id, post_date, title, username. A few rows in this table might look like this:
You get the idea. There could be thousands, maybe millions of records in this table. Okay, let's write a query in SQL to get the data. We'll use a SELECT statement. SELECT statements are like asking the database a question. Here's one:
SELECT post_id, title, username FROM post WHERE username = "Fred";
Straight-forward, right? You're asking the database to give you the data in the fields "post_id, title, and username" from the table "post", but only the records where the username is "Fred". So you might get back something like:
Let's add something to our SQL query:
SELECT post_id, title, username FROM post WHERE username = "Fred" ORDER BY title;
This time we specified that we want the data sorted by title, so we get:
You can imagine constructing all sorts of queries, for instance getting a list of the most recent posts. If we want to see the ten most recent posts we might try:
SELECT post_id, title, username FROM post ORDER BY post_date DESC LIMIT 10;
You can see I'm sorting by post_date and specifically stating that I want it in descending order. Furthermore, I want what the database returns to me to be limited to 10 records. Notice I took out the WHERE username = "Fred" part. If I wanted only Fred's most recent posts, I could have left it in there. By removing that, I'll get the ten most recent posts of all users, not just Fred.
The real power of a relational database comes into play when joining multiple tables together. We've already used the "post" table as an example. Let's say that the "post" table also has a field called "board_id". Furthermore, let's say we have another table that stores a list of all the boards. It looks something like:
The board_id column is refered to as the "primary key" of the "board" table. A primary key is a unique identifier that can be used to reference a specific row in the table. As I said, we've added a board_id column to the "post" table as well. This column is refered to as a "foreign key" which means it points to the primary key of another table. The database will actually enforce this rule and will not let you create a record in the "post" table with a value for "board_id" that does not exist in the "board" table. Let's see how we can use this key:
Say you want to see a list of the ten most recent posts again, but this time you want to know which board each post is under. The name of the board is stored in a separate table, so how do we join that information together? Simple:
SELECT a.post_id, a.title, a.username, b.name FROM post a, board b WHERE a.board_id = b.board_id ORDER BY post_date DESC LIMIT 10;
What did I do here? Notice that in the "FROM" section there are now two tables, "post" and "board". Since there is more than one, we need to give them aliases so elsewhere in our query we know which columns belong to which table. I've given the alias "a" to the "post" table and "b" to the "board" table. Notice that in addition to selecting several fields from the "post" table I am now also selecting "name" from the "board" table. Also notice the "WHERE" part of the query. I'm telling the database to link up these two tables based on the "board_id" column. What the database will do is go through each resulting record in the "post" table and find the record in the "board" table that matches with that specific "board_id". The result will look something like:
And so on, you get the idea.
Now let's say I want to count records. Try this SQL:
SELECT COUNT(*) FROM post WHERE board_id = 2;
This returns:
"COUNT(*)" is a special command in SQL that returns a count of the number of records. Notice that I put "WHERE board_id = 2." I'm counting the number of posts on the Pets board, which is board_id 2. As we see from what the database returned, there are 162 posts in total on the Pets board.
So this has been your introduction to SQL. Obviously these are very simple queries, but now you have an idea of how a modern relational database works and how one asks it questions to get meaningful data back. You can write these SQL statements directly into a program that interfaces with the database, or if you're writing code there are interfaces available for all major programming languages. 4thKingdom uses Java, and it's easy to execute these queries within the Java code and loop through the resulting data that the database sends back. 4thKingdom uses the PostgreSQL database, which is a powerful, advanced, and free database for anyone to download and use. In another post on this board, I announced that PostgreSQL now runs on Windows as well as UNIX-based systems, so there is no excuse for you not to download it and play around.
Of course I've left out how to INSERT the data into the database or UPDATE it once it's there, but that's a lesson for another time. And of course there are many considerations to make, such as how to request the data in a way that has good performance. For example, the SQL that gets the data for the "my posts" page started becoming very slow as the number of records in the tables increased, so I had to make some changes to speed it up. This is known as "optimizing" your SQL. There are also questions of how best to set up your database, how much memory to give it, what disk drive(s) to assign it to store the data, and so on, but these are very advanced topics and probably not of any interest to a database beginner.
My advice is to go for it and learn SQL, download a database and start coding away! You can build your own data-driven website if you put your mind to it!
First, we must review some relational database concepts. Data is stored in tables, which consist of rows and columns. Columns are data fields and each row represents one record in the table. For example, consider a table called "post" that has the following columns: post_id, post_date, title, username. A few rows in this table might look like this:
post_id | post_date | title | username |
500 | May 5, 10:15am | Hey, check out this pic of my dog | Bob |
501 | May 5, 11:02am | I disagree! | George |
502 | May 5, 11:08am | Time to make the donuts | Fred |
You get the idea. There could be thousands, maybe millions of records in this table. Okay, let's write a query in SQL to get the data. We'll use a SELECT statement. SELECT statements are like asking the database a question. Here's one:
SELECT post_id, title, username FROM post WHERE username = "Fred";
Straight-forward, right? You're asking the database to give you the data in the fields "post_id, title, and username" from the table "post", but only the records where the username is "Fred". So you might get back something like:
post_id | title | username |
122 | Yeah whatever | Fred |
342 | I think you're wrong | Fred |
502 | Time to make the donuts | Fred |
Let's add something to our SQL query:
SELECT post_id, title, username FROM post WHERE username = "Fred" ORDER BY title;
This time we specified that we want the data sorted by title, so we get:
post_id | title | username |
342 | I think you're wrong | Fred |
502 | Time to make the donuts | Fred |
122 | Yeah whatever | Fred |
You can imagine constructing all sorts of queries, for instance getting a list of the most recent posts. If we want to see the ten most recent posts we might try:
SELECT post_id, title, username FROM post ORDER BY post_date DESC LIMIT 10;
You can see I'm sorting by post_date and specifically stating that I want it in descending order. Furthermore, I want what the database returns to me to be limited to 10 records. Notice I took out the WHERE username = "Fred" part. If I wanted only Fred's most recent posts, I could have left it in there. By removing that, I'll get the ten most recent posts of all users, not just Fred.
The real power of a relational database comes into play when joining multiple tables together. We've already used the "post" table as an example. Let's say that the "post" table also has a field called "board_id". Furthermore, let's say we have another table that stores a list of all the boards. It looks something like:
board_id | name |
1 | Computers |
2 | Pets |
3 | Politics |
4 | Religion |
The board_id column is refered to as the "primary key" of the "board" table. A primary key is a unique identifier that can be used to reference a specific row in the table. As I said, we've added a board_id column to the "post" table as well. This column is refered to as a "foreign key" which means it points to the primary key of another table. The database will actually enforce this rule and will not let you create a record in the "post" table with a value for "board_id" that does not exist in the "board" table. Let's see how we can use this key:
Say you want to see a list of the ten most recent posts again, but this time you want to know which board each post is under. The name of the board is stored in a separate table, so how do we join that information together? Simple:
SELECT a.post_id, a.title, a.username, b.name FROM post a, board b WHERE a.board_id = b.board_id ORDER BY post_date DESC LIMIT 10;
What did I do here? Notice that in the "FROM" section there are now two tables, "post" and "board". Since there is more than one, we need to give them aliases so elsewhere in our query we know which columns belong to which table. I've given the alias "a" to the "post" table and "b" to the "board" table. Notice that in addition to selecting several fields from the "post" table I am now also selecting "name" from the "board" table. Also notice the "WHERE" part of the query. I'm telling the database to link up these two tables based on the "board_id" column. What the database will do is go through each resulting record in the "post" table and find the record in the "board" table that matches with that specific "board_id". The result will look something like:
post_id | title | username | name |
5654 | Hey look at this | Bob | Pets |
5653 | I don't think so! | Mark | Politics |
5652 | That's a cute cat | Bob | Pets |
5651 | My computer broke | Smitty | Computers |
And so on, you get the idea.
Now let's say I want to count records. Try this SQL:
SELECT COUNT(*) FROM post WHERE board_id = 2;
This returns:
COUNT(*) |
162 |
"COUNT(*)" is a special command in SQL that returns a count of the number of records. Notice that I put "WHERE board_id = 2." I'm counting the number of posts on the Pets board, which is board_id 2. As we see from what the database returned, there are 162 posts in total on the Pets board.
So this has been your introduction to SQL. Obviously these are very simple queries, but now you have an idea of how a modern relational database works and how one asks it questions to get meaningful data back. You can write these SQL statements directly into a program that interfaces with the database, or if you're writing code there are interfaces available for all major programming languages. 4thKingdom uses Java, and it's easy to execute these queries within the Java code and loop through the resulting data that the database sends back. 4thKingdom uses the PostgreSQL database, which is a powerful, advanced, and free database for anyone to download and use. In another post on this board, I announced that PostgreSQL now runs on Windows as well as UNIX-based systems, so there is no excuse for you not to download it and play around.
Of course I've left out how to INSERT the data into the database or UPDATE it once it's there, but that's a lesson for another time. And of course there are many considerations to make, such as how to request the data in a way that has good performance. For example, the SQL that gets the data for the "my posts" page started becoming very slow as the number of records in the tables increased, so I had to make some changes to speed it up. This is known as "optimizing" your SQL. There are also questions of how best to set up your database, how much memory to give it, what disk drive(s) to assign it to store the data, and so on, but these are very advanced topics and probably not of any interest to a database beginner.
My advice is to go for it and learn SQL, download a database and start coding away! You can build your own data-driven website if you put your mind to it!