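System Operations
System operations is responsible for designing and maintaining the cluster architecture. It addresses questions such as:
- Given dozens of physical servers, how should they be managed?
- How should user accounts and data storage be handled?
- How can disaster recovery and high availability of system services be ensured?
The system operations documentation is fairly extensive because it covers many fundamental and important aspects of both software and hardware.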
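Cluster Architecture Design
Before any hands-on work begins, set aside time for detailed design and planning. For example, the referenced literature proposes the following steps: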
- Version Control -- CVS, track who made changes, backout
- Gold Server -- only require changes in one place
- Host Install Tools -- install hosts without human intervention
- Ad Hoc Change Tools -- 'expect', to recover from early or big problems
- Directory Servers -- DNS, NIS, LDAP
- Authentication Servers -- NIS, Kerberos
- Time Synchronization -- NTP
- Network File Servers -- NFS, AFS, SMB
- File Replication Servers -- SUP
- Client File Access -- automount, AMD, autolink
- Client OS Update -- rc.config, configure, make, cfengine
- Client Configuration Management -- cfengine, SUP, CVSup
- Client Application Management -- autosup, autolink
- Mail -- SMTP
- Printing -- Linux/SMB to serve both NT and UNIX
- Monitoring -- syslogd, paging
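
One way to keep track of such a plan is to record the steps as an ordered checklist and ask which one comes next. The snippet below is only a minimal sketch: it assumes the steps above are meant to be worked through from top to bottom, and the names `STEPS` and `next_step` are hypothetical helpers introduced here for illustration, not part of any tool mentioned in the list.

```python
from typing import Iterable, Optional

# Step names copied from the list above; the top-to-bottom order is an assumption.
STEPS = [
    "Version Control",
    "Gold Server",
    "Host Install Tools",
    "Ad Hoc Change Tools",
    "Directory Servers",
    "Authentication Servers",
    "Time Synchronization",
    "Network File Servers",
    "File Replication Servers",
    "Client File Access",
    "Client OS Update",
    "Client Configuration Management",
    "Client Application Management",
    "Mail",
    "Printing",
    "Monitoring",
]


def next_step(completed: Iterable[str]) -> Optional[str]:
    """Return the earliest step in STEPS that is not yet completed.

    Returns None when every step is done. Raises ValueError if
    `completed` contains a name that is not in STEPS.
    """
    done = set(completed)
    unknown = done - set(STEPS)
    if unknown:
        raise ValueError(f"unknown steps: {sorted(unknown)}")
    for step in STEPS:
        if step not in done:
            return step
    return None


if __name__ == "__main__":
    # Example: the first two steps are finished, so the third is reported.
    print(next_step(["Version Control", "Gold Server"]))
```

Because the function always returns the earliest unfinished step, a skipped early item (for instance, version control) is reported before any later one, which matches the idea of planning the foundation first.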