Friday, December 17, 2021

Storage VM

 A lot of books miss the basics. Knowing them makes building on the concepts much easier.

NAS = Just like a normal fileserver, except that it is single-purpose and usually designed for large capacity rather than small physical size.

SAN = A machine that serves block-based storage to fileservers. It is inherently a back-end sort of connection: users do not connect to the SAN. They connect to the fileservers, which connect to the SAN.

The purpose is to separate the physical storage (hard drives, RAID) from the fileserver. SANs started out as a replacement for those external RAID cages that were connected to fileservers via SCSI cables. But you can only cram so many SCSI cards into a server, and the cages had to be physically close to the fileserver because SCSI cables can only be so long.

So protocols were designed that "virtualize the SCSI cable": the controller-to-drive conversation is carried over some other link instead of a physical SCSI cable. iSCSI, for example, carries SCSI commands over ordinary TCP/IP networks; FibreChannel does the same over its own dedicated fabric. Now your storage can be in another rack, another room, or even another building. Although iSCSI is routable, you aren't going to get great performance across a WAN, but it can have its uses. Once the cable is virtual, you can run the protocol over whatever medium works at the moment: FibreChannel, Ethernet, whatever 10 Gb-over-magic comes around the bend.
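The "virtualized SCSI cable" idea boils down to wrapping a SCSI-style command in a network message so it can travel over any link. Here is a toy sketch in Python; the header layout and the `encapsulate`/`decapsulate` names are made up for illustration and are NOT the real iSCSI wire format:

```python
import struct

# Toy illustration only: pack a SCSI-style command into bytes that could
# be sent over any network link, then unpack it on the other end.

def encapsulate(lun, opcode, lba, length):
    # Tiny fixed header: which disk (LUN), what to do, where, how much.
    # >BBQI = big-endian: 1-byte LUN, 1-byte opcode, 8-byte LBA, 4-byte length.
    return struct.pack(">BBQI", lun, opcode, lba, length)

def decapsulate(packet):
    return struct.unpack(">BBQI", packet)

READ = 0x28  # the real SCSI READ(10) opcode, just for flavor

pdu = encapsulate(lun=0, opcode=READ, lba=2048, length=8)
```

The point is only that once the command is bytes on a wire, the wire can be anything.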

Further, because those connections are one-to-many or many-to-one (unlike SCSI cables, which are one-to-one), you can have one SAN box serving up storage to multiple servers. This lets you optimize your capacity. Say you have 10 servers, each with three drives. With RAID5, you lose one of those drives per server to redundancy, so across your 10 servers you've got 10 drives of "lost" capacity. So, you buy a SAN box and rebuild. You cram those 30 drives into the SAN box and (simplifying) you can decide that maybe you only need 3 drives' worth of redundancy. So instead of 20 drives' worth of storage you get 27 drives' worth. And your techs only have to look after one box to replace failed drives instead of 10.
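The capacity arithmetic above works out like this, using the same hypothetical numbers (10 servers, 3 drives each):

```python
# Hypothetical numbers from the example: 10 servers, 3 drives each.
servers = 10
drives_per_server = 3
total_drives = servers * drives_per_server          # 30 drives

# RAID5 in each server: one drive per server lost to redundancy.
usable_local = servers * (drives_per_server - 1)    # 20 drives' worth

# Same 30 drives pooled in one SAN box, keeping (say) 3 for redundancy.
san_redundancy = 3
usable_san = total_drives - san_redundancy          # 27 drives' worth
```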

Further, further, a SAN lets you virtualize the storage volumes. You can carve up one 1000 GB RAID volume into smaller pieces. You can do it the old-fashioned way and just give each server a range of blocks on the drive, like partitioning a hard drive in a PC. But with modern LVM layers, you can abstract that away. To increase capacity, you don't have to back up, shut down, add drives, rebuild the RAID, and then repartition and restore. You just plug in drives, the SAN box reshapes itself online, and then you tell the LVM to increase the number of blocks available to the various logical volumes.
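That LVM idea can be sketched as a pool of blocks carved into named volumes, where both the pool and the volumes can grow online. This is a toy model under stated assumptions, not a real LVM API; the class name and the volume names are invented:

```python
# Toy sketch (NOT a real LVM API): a pool of blocks carved into named
# logical volumes that can be grown without repartitioning anything.

class BlockPool:
    def __init__(self, total_blocks):
        self.total_blocks = total_blocks
        self.volumes = {}                    # volume name -> block count

    def allocated(self):
        return sum(self.volumes.values())

    def create(self, name, blocks):
        assert self.allocated() + blocks <= self.total_blocks, "pool full"
        self.volumes[name] = blocks

    def grow_pool(self, extra_blocks):
        # "Just plug in drives": the pool reshapes itself online.
        self.total_blocks += extra_blocks

    def grow_volume(self, name, extra_blocks):
        # Hand a volume more blocks; no backup/restore cycle needed.
        assert self.allocated() + extra_blocks <= self.total_blocks, "pool full"
        self.volumes[name] += extra_blocks

pool = BlockPool(total_blocks=1000)   # think: one 1000 GB RAID volume
pool.create("web", 300)
pool.create("db", 500)
pool.grow_pool(500)                   # plugged in more drives
pool.grow_volume("db", 400)           # "db" grows online
```

The servers using "web" and "db" never see a repartition; they just see more blocks appear.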

In the end, it starts looking just like a transactional or key-value database, where each record is a storage block. The fileservers know that they have X number of blocks of storage, and the SAN figures out how to keep track of them.
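That "database of blocks" view can be sketched like this: the client asks for block N, and the SAN keeps track of it however it likes (a sparse dict here, which also hints at why thin allocation is cheap). A toy model, not any real SAN's interface:

```python
# Toy sketch: a virtual disk as a key-value store of blocks.
# The client sees numbered blocks; how they're stored is the SAN's problem.

BLOCK_SIZE = 512

class VirtualDisk:
    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.blocks = {}                       # block number -> bytes

    def write(self, n, data):
        assert 0 <= n < self.num_blocks, "block out of range"
        assert len(data) == BLOCK_SIZE
        self.blocks[n] = data

    def read(self, n):
        assert 0 <= n < self.num_blocks, "block out of range"
        # Unwritten blocks read back as zeros, like a fresh drive.
        return self.blocks.get(n, b"\x00" * BLOCK_SIZE)

disk = VirtualDisk(num_blocks=1024)
disk.write(7, b"A" * BLOCK_SIZE)
```

Note there is no notion of files anywhere in here; that is the client's job.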

It also simplifies backups and helps with redundancy.

Your task is to figure out all the specifics and how to implement all of this, of course.

The key to remember: it is virtual hard drives. Just like RAID takes many drives and makes one virtual hard drive, a SAN takes many hard drives and combines them into one or more giant virtual drives, which can THEN be carved up into smaller virtual drives. Unless you are dealing with fancy clustered filesystems, only one machine can connect to a given virtual drive at a time. A SAN can't give clients files. It can only give its clients blocks of storage, and it is up to the client to deal with the filesystem.