MT Hardware Recommendations
From The Network People, Inc.
To NAS/SAN, or not?
In theory, a NAS or SAN may be the better design: network storage scales better, has built-in redundancy, etc. However, an often overlooked factor is that more hardware = more failures = more maintenance = more cost. Buy extra hardware only when required, such as when you cannot buy a single system capable of handling the task(s), or when five nines of uptime are required.
The cost of hardware is more than its purchase price. It also includes the cost of operation and the time humans spend interacting with it throughout its lifetime, and those factors usually dwarf the purchase price. Think in terms of total cost of ownership (TCO). It likely does not make sense to run three machines (a SAN plus two servers) where one machine plus backups would suffice.
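The "more hardware = more failures" point can be made concrete with a little arithmetic. The sketch below assumes each machine fails independently with the same yearly probability. Both the independence assumption and the 5% figure are hypothetical, chosen only for illustration.

```python
def p_any_failure(p_single, machines):
    """Probability that at least one of `machines` independent boxes fails,
    given that each fails with probability `p_single` over the same period."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if machines < 1:
        raise ValueError("machines must be at least 1")
    return 1.0 - (1.0 - p_single) ** machines


if __name__ == "__main__":
    # Hypothetical 5% yearly failure chance per machine.
    for count in (1, 2, 3):
        print(count, round(p_any_failure(0.05, count), 4))
```

Every added box raises the odds that somebody has to drive to the data center this year, whether or not the service itself stays up.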
Buy one machine, sized to last 24 months
Example: I replaced two dual PIII 700MHz/36GB/1GB systems with one dual Xeon 3.0GHz/75GB/2GB system. RRDutil graphs showed the rate of disk space consumption over the past two years, and from that trend I determined that 75GB of disk and 2GB of RAM would suffice for 2-3 years. I paid for exactly as much hardware as I needed, knowing that hardware will be cheaper and faster in the future when I purchase more.
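If you have usage history like those graphs, the sizing estimate is a straight-line extrapolation. Here is a minimal sketch, assuming roughly linear growth. The function name and the sample numbers are hypothetical, not taken from my graphs.

```python
def project_usage_gb(samples, months_ahead):
    """Least-squares line through (month, used_gb) samples, extended
    `months_ahead` months past the most recent sample."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must cover more than one month")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sxx
    intercept = mean_y - slope * mean_x
    last_month = max(x for x, _ in samples)
    return intercept + slope * (last_month + months_ahead)


if __name__ == "__main__":
    # Hypothetical history: 12GB used two years ago, 36GB used today.
    history = [(0, 12.0), (12, 24.0), (24, 36.0)]
    print(project_usage_gb(history, 24))  # usage expected 24 months from now
```

Buy the disk size that covers the projection with some headroom. If growth is accelerating, a straight line will underestimate, so look at the shape of the graph before trusting the number.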
For near-line recovery, buy a spare large disk
Even if you use RAID, keep a spare disk in the system with a very recent "snapshot" of your system. For example, if you get 150GB of storage from 3 RAID 5 SCSI disks, stick a fourth 150GB disk in there. If you are using SAS, make that disk a 500GB SATA disk instead.
Spin the disk up once a day/week/month and sync your "primary" disk to it. Then spin it back down using camcontrol. Run a script at boot time that automatically spins it back down (the boot process will spin it up). You can rest assured your disk will last almost indefinitely and consume very little power. Do not spin disks up and down excessively! My 75GB disks are rated at 10,000 spin up/down cycles, meaning I could spin one up once a day for roughly 27 years (10,000 / 365) before hitting its duty rating.
You will use this disk if you ever suffer a catastrophic disk/RAID failure. That is unlikely because you are running smartmontools, right? So you'll know far enough in advance to replace the ailing disks, right? And you still have that spare to recover to, so in the worst case, restoring the last 1/7/30 days' worth of files won't take nearly as long as restoring the entire disk/array.
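The spin-up, sync, spin-down cycle can be sketched as an ordered list of commands. This is a minimal sketch, not my actual script: the device name da1, the partition /dev/da1s1a, and the mount point /mnt/spare are hypothetical, so substitute your own. The function only builds the commands so you can review them before running anything.

```python
import shlex


def spare_sync_commands(device="da1", partition="/dev/da1s1a",
                        mount_point="/mnt/spare", source="/"):
    """Return the commands, in order, for one sync of the spare disk.
    All default names are placeholders for illustration."""
    return [
        ["camcontrol", "start", device],      # spin the spare up
        ["mount", partition, mount_point],
        # -a keeps permissions and times, -x stays on the source filesystem,
        # --delete removes files from the spare that no longer exist.
        ["rsync", "-a", "-x", "--delete", source, mount_point + "/"],
        ["umount", mount_point],
        ["camcontrol", "stop", device],       # spin the spare back down
    ]


if __name__ == "__main__":
    for command in spare_sync_commands():
        print(shlex.join(command))
```

Once the printed commands look right for your system, run them from cron (for example with subprocess.run(command, check=True), so a failed mount stops the run before rsync). The boot-time script only needs the final camcontrol stop.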
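If you script your smartmontools checks, the smartctl exit status is the easiest thing to test. It is a bitmask, and the sketch below decodes it. The bit meanings are as I recall them from the smartctl man page, so treat them as an assumption and verify against the man page installed on your system. The function names are hypothetical.

```python
# Bit meanings as recalled from smartctl(8); verify locally before relying on them.
SMARTCTL_STATUS_BITS = {
    0: "command line did not parse",
    1: "device open failed",
    2: "a SMART command failed or returned a checksum error",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes are at or below threshold",
    5: "attributes were at or below threshold in the past",
    6: "device error log contains errors",
    7: "self-test log contains errors",
}


def decode_smartctl_status(code):
    """List the conditions encoded in a smartctl exit status."""
    return [message for bit, message in sorted(SMARTCTL_STATUS_BITS.items())
            if code & (1 << bit)]


def is_failing(code):
    """True when bit 3 or bit 4 is set, i.e. the disk reports trouble now."""
    return (code & 0b00011000) != 0


if __name__ == "__main__":
    for message in decode_smartctl_status(8 | 64):
        print(message)
```

A cron job that mails you whenever is_failing() is true gives you the advance warning the paragraph above counts on.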
Use ECC RAM
Your general rule should be this: if you depend on the machine and can't get your hands on it in less than 10 minutes, use ECC RAM in it. Period.
A Backup Server
If complete hardware redundancy is important enough to you to warrant the cost, make the second machine identical to the first. Then you've got an inventory of spare parts sitting in the rack next to your really important machine. The more identical the secondary, the better. Configure the disk layouts identically, keep copies of the other system's /etc/rc.conf and other files on both systems, etc.
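Since each machine holds a copy of the other's /etc/rc.conf, it is easy to check how far the two have drifted apart. Here is a minimal sketch, assuming plain name="value" lines. It ignores trailing comments and shell tricks, and the function names are hypothetical.

```python
def parse_rc_conf(text):
    """Parse simple name="value" lines; skip blanks and comment lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def rc_conf_differences(primary_text, backup_text):
    """Map each differing setting to its (primary, backup) values.
    A setting missing on one side shows up as None."""
    primary = parse_rc_conf(primary_text)
    backup = parse_rc_conf(backup_text)
    return {key: (primary.get(key), backup.get(key))
            for key in sorted(set(primary) | set(backup))
            if primary.get(key) != backup.get(key)}


if __name__ == "__main__":
    p1 = 'hostname="p1"\nsshd_enable="YES"\n'
    b1 = 'hostname="b1"\nsshd_enable="YES"\n'
    print(rc_conf_differences(p1, b1))
```

Some differences are expected (hostname, addresses). Anything else in the output is a setting you forgot to copy over.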
Use SAS disks
The SAS standard lets you plug SAS (serial attached SCSI) disks and SATA disks into the same bays interchangeably (if your backplane supports it). On P1, your "production" server, you will use SAS disks of course. However, on B1, your backup server, you can use SATA disks instead. Because you have Really Big Disks in there, you can still do your rsync copy of the entire disk image from P1 to B1. Plus, you can use rsnapshot to _also_ keep incremental backups of your production system.
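rsnapshot builds its incremental backups on rsync and hard links, so unchanged files cost no extra space. The sketch below shows the same idea using rsync's --link-dest option directly. It is not rsnapshot's own configuration, and the host name p1, the snapshot root /backup/p1, and the snapshot names are hypothetical.

```python
def snapshot_command(source, snapshot_root, new_name, previous_name=None):
    """Build an rsync command, run on B1, that pulls `source` into a new
    snapshot directory, hard-linking unchanged files to the previous one."""
    command = ["rsync", "-a", "--delete"]
    if previous_name is not None:
        # Files identical to the previous snapshot become hard links.
        command.append("--link-dest={}/{}".format(snapshot_root, previous_name))
    command += [source, "{}/{}/".format(snapshot_root, new_name)]
    return command


if __name__ == "__main__":
    # Hypothetical names: pull P1's root filesystem into daily snapshots on B1.
    print(" ".join(snapshot_command("p1:/", "/backup/p1", "daily.0")))
    print(" ".join(snapshot_command("p1:/", "/backup/p1", "daily.1",
                                    previous_name="daily.0")))
```

Each snapshot directory looks like a full copy of P1, but only changed files take up new space on B1's big SATA disks. In practice, let rsnapshot handle the rotation and retention for you.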
If the pooh really hits the fan and P1 implodes, traffic can be redirected to B1. You have a complete copy of P1 on B1 as well as all the incremental backups so you can be right back online while you fix P1, wait for spare parts to arrive, etc.
If you want to get wild and crazy, install PF on both, plug in a crossover cable between their second NIC ports, and set them up for automatic network failover using CARP. But that is probably overkill. I think a stronger argument can be made for having B1 on another network in another data center.

