Discover the physical objects and elements used for data storage on the SQL Server platform in this excerpt from "A Developer's Guide to Data Modeling for SQL Server, Covering SQL Server 2005 and 2008." You'll learn about the various data types utilized, how to enforce data integrity with primary and foreign keys, and the proper steps for working with coded stored procedures and parameters. Authors Eric Johnson and Joshua Jones also describe how supertype and subtype tables operate in SQL Server.
Physical Elements of Data Models
Now that you have a grasp of the logical elements used to construct a data
model, let's look at the physical elements. These are the objects that you
use to build the database. Most of the objects you build into your physical
model are based on objects you created in the logical model. Many physical
elements are the same no matter which RDBMS you are using, but we
look at all the elements available in SQL Server 2008. It is important to
know SQL Server's capabilities so that you can build your model with them
in mind.
In this chapter, we cover all the physical SQL Server objects in detail
and walk you through how to use each type of object in your physical
model. You will use these elements later in Chapter 9.
Physical Storage
First, we'll start with the objects that allow you to store data in your database.
You'll build everything else on these objects. Specifically, these are
tables, views, and data types.
Tables
Tables are the building blocks on which relational databases are built.
Underneath everything else, all data in your database ends up in a table.
Tables are made up of rows and columns. Like a single instance in an entity,
each row stores information pertaining to a single record. For example,
in an employee table, each row would store the information for a
single employee.
The columns in the table store information about the rows in the table.
The FirstName column in the Employee table would store the first names of all the employees. Columns map to attributes from your logical model,
and, like the logical model, each column has a data type assigned. Later in
this chapter we look at the SQL Server data types in detail.
When you add data to a table, each column must either contain data
(even if it is an empty string) or specify a NULL value, NULL being the
complete absence of data. Additionally, you can specify that each column
have a default value. The default value is used if you add data without
specifying a value for that column. A default can be a fixed value, such as
always setting a numeric column to the value of 12, or it can be a function
that returns a value of the appropriate data type. If you do not have a default
value specified and you insert data without specifying a value for a
column, SQL Server attempts to insert a NULL value. If the column does
not allow NULL values, your insert will fail.
You can think of a table as a single spreadsheet in an application such
as Microsoft Excel. In fact, an Excel spreadsheet is a table, but Excel is not
a relational database management system. A database is really nothing
more than a collection of tables that store information. Sure, there are
many other objects in a database, but without tables you would not have
any data. Using Transact-SQL, also known as T-SQL, you can manipulate
the data in a table. The four basic Data Manipulation Language (DML)
statements are defined as follows:
- SELECT: Allows users to retrieve data in a table or tables
- INSERT: Allows users to add data to a table
- UPDATE: Allows users to change data in a table
- DELETE: Allows users to remove data from a table
How SQL Server Stores Tables
In addition to understanding what tables are, it's important that you understand
how SQL Server stores them; the type of data your columns store
will dictate how the table is stored on disk, and this can directly affect the
performance of your database. Everything in SQL Server is stored on
pages. Pages are 8K contiguous allocations of information on the disk, and
there are different kinds of pages depending on what is on the page. For
our purposes, we will focus on data pages: pages that store table data.
Each row you add to a table is stored on a page, and depending on the size
of the data in the row, the row can be stored either on a page with other
rows, or on its own page or pages.
Before SQL Server 2005, data and overhead for a single row could not
exceed 8,060 bytes (8K). This was a hard limit that you had to account for
when designing tables. In SQL Server 2005, this limit has been overcome,
in a manner of speaking. Now, if your row exceeds 8,060 bytes, SQL Server
moves one or more of your variable-length columns onto a new page and
leaves a 24-byte pointer in its place. This does not mean that you have an
unlimited row size, nor should you make all your rows bigger than 8,060
bytes. Why not? First, notice that we said SQL Server will move variable-length
columns. This means that you are still limited to 8,060 bytes of
fixed-length columns. Additionally, you are still limited to 8K on your primary
data page for the row. Remember the 24-byte pointer we mentioned?
In theory you are limited to around 335 pointers on the main page. As
ridiculous as a 336-column varchar(8000) table may sound, we have seen
far stranger.
If SQL Server manages all this behind the scenes, why should you
care? Here's why. Although SQL Server moves the variable-length fields to
new pages after you exceed the 8K limit, the result is akin to a fragmented
hard drive. You now have chunks of data that need to be assembled when
accessed, and this adds processing time. As a data modeler you should always
try to keep your rows smaller than the 8K limit for performance reasons.
There are a few exceptions to this rule, and we look at them more
closely later in this chapter when we discuss data types. Keep in mind that
there is a lot more complexity in the way SQL Server handles storage and
pages than we cover here, but your data model can't affect the other variables
as much as it can affect table size.
Views
Views are simply stored T-SQL that uses SELECT statements to display
data from one or more tables. The tables referenced by views are often referred
to as the view's base tables. Views, as the name implies, allow you
to create various pictures of the underlying information. You can reference
as many or as few columns from each base table as you need to make your
views. This capability allows you to slice up data and display only relevant
information.
You access views in almost the same way that you access tables. All the
basic DML statements work against views in the same way they do on tables,
with a few exceptions. If you have a view that references more than one base
table, you can use only INSERT, UPDATE, or DELETE statements that reference columns from one base table. For example, let's assume that we
have a view that returns customer data from two tables. One table stores
the customer's information, and the other holds the address data for that
customer. The definition of the customer_address view is as follows:
CREATE VIEW customer_address
AS
SELECT customer.first_name,
customer.last_name,
customer.phone,
address.address_line1,
address.city,
address.state,
address.zip
FROM customer
JOIN address
ON address.customer_id = customer.customer_id
WHERE address.type = 'home'
You can perform INSERT, UPDATE, and DELETE operations against
the customer_address view as long as you reference only the customer
table or the address table.
You may be asking yourself, "Why would I use a view instead of just
referencing the tables directly?" There are several reasons to use views in
your database. First, you can use a view to obscure the complexity of the
underlying tables. If you have a single view that displays customer and address
information, developers or end users can access the information they
need from the view instead of needing to go to both tables. This technique
eliminates the need for users to understand the entire database; they can
focus on a single object. You gain an exponential benefit when you start
working with many base tables in a single view.
Using views also allows you to change the tables or the location where
the data is stored without affecting users. In the end, as long as you update
the view definition so that it accommodates the table changes you made,
your users will never need to know that there was a change. You can also
use views to better manage security. If you have users who need to see
some employee data but not sensitive data such as social security numbers
or salary, you can build a view that displays only the information they need.
Finally, consider how using views can save you time when querying
your database. Every time you run T-SQL code, SQL Server must first compile the code. This transforms the human-readable SELECT statement
into a form that the SQL Server engine can understand, and the resulting
code is an execution plan. Execution plans for running views are
stored in SQL Server, and the T-SQL code behind them is compiled. This
process takes time, but with views, the compilation is done only when the
view is created. This saves you processing each time you call the view. The
first time a view is called, SQL Server figures out the best way to retrieve
the data from the base tables, given the table structure and the indexes in
place. This execution plan is cached and reused the next time the view is
called.
In our humble opinion, views are probably the most underused feature
in SQL Server. For some reason, people tend to avoid the use of views or
use them in inefficient ways. In Chapter 11 we look at some of the most
beneficial uses for views.
Data Types
As mentioned earlier, every column in each of your tables must be configured
to store a specific type of data. You do this by associating a data type
with the column. Data types are what you use to specify the type, length,
precision, and scale of data that can be stored in the column. SQL Server
2008 gives you several general categories of data types, with each category
containing specific data types. Many of these data types are similar to the
types we looked at in Chapter 2. In this section, we look at each of the SQL
Server data types and talk about how the SQL Server engine handles and
stores them.
When you build your model, it is important to understand how much
space each data type requires. The difference between a data type that
needs 2 bytes versus one that requires 4 bytes may seem insignificant, but
when you multiply the extra 2 bytes over millions or billions of rows, you
could end up needing tens or hundreds of gigabytes of additional storage.
SQL Server 2008 has functionality (parts of which were introduced in
SQL Server 2005 Service Pack 2) that allows the SQL Server storage engine
to compress data at the row and page levels. However, this functionality
is limited to the Enterprise Edition and is, in general, more of an
administrative concern. Your estimate of data storage requirements, which
is based on the numbers we talk about here, should be limited to the uncompressed
storage requirements. Enabling data compression in a database
is something that a database administrator will work on with the database developer after the database has been built. With that said, let's
look at the data types available in SQL Server 2008.
Numeric Data Types
Our databases need to store many kinds of numbers that we use day to day.
Each of these numbers is unique and requires us to store varying pieces of
data. These differences in numbers and requirements dictate that SQL
Server be able to support 11 numeric data types. Following is a review of
all the numeric data types available in SQL Server. Also, Table 3.1 shows
the specifications on each numeric data type.
Table 3.1 Numeric Data Type Specifications
| Data Type |
Value Range |
Storage |
| bigint |
–9,223,372,036,854,775,808 through 9,223,372,036,854,775,807 |
8 bytes |
| bit |
0 or 1 |
1 byte (minimum) |
| decimal |
Depends on precision and scale |
5–17 bytes |
| float |
–1.79E+308 through –2.23E–308, 0,
|
4 or 8 bytes |
| int |
–2,147,483,648 to 2,147,483,647 |
4 bytes |
| money |
–922,337,203,685,477.5808 to 922,337,203,685,477.5807 |
8 bytes |
| numeric |
Depends on precision and scale |
5–17 bytes |
| real |
–3.40E+38 to –1.18E–38, 0, and 1.18E–38 to 3.40E+38 |
4 bytes |
| smallint |
–32,768 to 32,767 |
2 bytes |
| smallmoney |
–214,748.3648 to 214,748.3647 |
4 bytes |
| tinyint |
0 to 255 |
1 byte |
Int
The int data type is used to store whole integer numbers. Int does not store
any detail to the right of the decimal point, and any number with decimal
data is rounded off to a whole number. Numbers stored in this type must
be in the range of –2,147,483,648 through 2,147,483,647, and each piece
of int data requires 4 bytes to store on disk.
Bigint
Bigint is just what it sounds like: a big integer number. When you need
larger numbers than supported by the int data type, you can use bigint.
Using bigint expands your range from the paltry 2 billion of an int and allows you to store numbers from approximately negative 9 quintillion all the
way to 9 quintillion. (A quintillion is a 1 followed by 18 zeros.) Bigger numbers
require more storage; bigint data requires 8 bytes.
Smallint
On the other side of the int data type, we have smallint. Smallint can hold
numbers from –32,768 through 32,767 and requires only 2 bytes of storage.
Tinyint
Rounding out the int family of data types is the tinyint. Requiring only
1 byte of storage and capable of storing numbers from 0 through 255, tinyint
is perfect for status columns. Note that tinyint is the only int data type that
cannot store negative numbers.
Bit
The bit data type is the SQL Server equivalent of a flag or a Boolean. The
only valid values are 0, 1, or NULL, making the bit data type perfect for
storing on or off, yes or no, or true or false. Bit storage is a bit more complex
(pardon the pun). Storing a 1 or a 0 requires only 1 bit on disk, but the
minimum storage for bit data is 1 byte. For any given table, the bit columns
are lumped together for storage. This means that when you have 1-bit to
8-bit columns they collectively take up 1 byte. When you have 9- to 16-bit
columns, they take up 2 bytes, and so on. SQL Server implicitly converts
the strings TRUE and FALSE to bit data of 1 and 0, respectively.
Decimal and Numeric
In SQL Server 2008, the decimal and numeric data types are exactly the
same. Previous versions of SQL Server do not have a numeric data type; it
was added in SQL Server 2005 so that the terminology would fall in line
with other RDBMS software. Both these data types hold numbers complete
with detail to the right of the decimal. When using decimal or numeric,
you can specify a precision and a scale. Precision sets the total
number of digits that can be stored in the number. Precision can be set to
any value from 1 through 38, allowing decimal numbers to contain 1
through 38 digits. Scale specifies how many of the total digits can be stored
to the right of the decimal point. Scale can be any number from 0 to the
precision you have set. For example, the number 234.67 has a precision of
5 and a scale of 2. The storage requirements for decimal and numeric vary
depending on the precision. Table 3.2 shows the storage requirements
based on precision.
Table 3.2 Decimal and Numeric Storage Requirements
| Precision |
Storage |
| 1 through 9 |
5 bytes |
| 10 through 19 |
9 bytes |
| 20 through 28 |
13 bytes |
| 29 through 38 |
17 bytes |
Money and Smallmoney
Both the money and the smallmoney data types store monetary values to
four decimal places. The only difference in these two types is that money
can store values from about –922 trillion through 922 trillion and requires
8 bytes of storage, whereas smallmoney holds only values of –214,748.3648
through 214,748.3647 and requires only 4 bytes of storage. Functionally,
these types are similar to decimal and numeric, but money and smallmoney
also store a currency symbol such as $ (dollar), ¥ (yen), or £ (pound).
Float and Real
Both float and real fall into the category of approximate numbers. Each
holds values in scientific notation, which inherently causes data loss because
of a lack of precision. If you don't remember your high school chemistry
class, we briefly explain scientific notation. You basically store a small
subset of the value, followed by a designation of how many decimal places
should precede or follow the value. So instead of storing 1,234,467,890 you
can store it as 1.23E+9. This says that the decimal in 1.23 should be moved
9 places to the right to determine the actual number. As you can see, you
lose a lot of detail when you store the number in this way. The original
number (1,234,467,890) becomes 1,230,000,000 when converted to scientific
notation and back.
Now back to the data types. Float and real store numbers in scientific
notation; the only difference is the range of values and storage requirements
for each. See Table 3.1 for the range of values for these types. Real
requires 4 bytes of storage and has a fixed precision of 7. With float data,
you can specify the precision or the total number of digits, from 1 through
53. The storage requirement varies from 4 bytes (when the precision is less
than 25) to 8 bytes (when the precision is 25 through 53).
Check out the authors' website where they co-host a podcast show for IT professionals.