System Operations

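System operations covers the design and maintenance of the cluster architecture. It addresses questions such as:

  • Given dozens of physical servers, how should they be managed?
  • How should user accounts and data storage be handled?
  • How can disaster recovery and high availability of system services be ensured?

The system operations documentation is fairly extensive, because it covers many basic but important parts of both software and hardware.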

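Cluster Architecture Design

Before doing any hands-on work, invest some time in detailed design and planning. For example, the referenced literature proposes the following steps: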

  1. Version Control -- CVS, track who made changes, back out
  2. Gold Server -- only require changes in one place
  3. Host Install Tools -- install hosts without human intervention
  4. Ad Hoc Change Tools -- 'expect', to recover from early or big problems
  5. Directory Servers -- DNS, NIS, LDAP
  6. Authentication Servers -- NIS, Kerberos
  7. Time Synchronization -- NTP
  8. Network File Servers -- NFS, AFS, SMB
  9. File Replication Servers -- SUP
  10. Client File Access -- automount, AMD, autolink
  11. Client OS Update -- rc.config, configure, make, cfengine
  12. Client Configuration Management -- cfengine, SUP, CVSup
  13. Client Application Management -- autosup, autolink
  14. Mail -- SMTP
  15. Printing -- Linux/SMB to serve both NT and UNIX
  16. Monitoring -- syslogd, paging
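
The steps above can be tracked as a simple ordered checklist. The following is a minimal sketch, assuming the numbering is meant as a rough build order; the names `BOOTSTRAP_STEPS`, `next_step`, and `out_of_order` are hypothetical helpers introduced here for illustration and do not come from the reference.

```python
# Ordered checklist; the step names are copied from the list above.
BOOTSTRAP_STEPS = [
    "Version Control",
    "Gold Server",
    "Host Install Tools",
    "Ad Hoc Change Tools",
    "Directory Servers",
    "Authentication Servers",
    "Time Synchronization",
    "Network File Servers",
    "File Replication Servers",
    "Client File Access",
    "Client OS Update",
    "Client Configuration Management",
    "Client Application Management",
    "Mail",
    "Printing",
    "Monitoring",
]


def next_step(done):
    """Return the first step (in list order) not yet in `done`, or None."""
    for step in BOOTSTRAP_STEPS:
        if step not in done:
            return step
    return None


def out_of_order(done):
    """Return completed steps that come after the first unfinished step.

    Assumption: a later step finished before an earlier one is worth
    flagging for review, not necessarily an error.
    """
    pending = next_step(done)
    if pending is None:
        return []
    cutoff = BOOTSTRAP_STEPS.index(pending)
    return [s for s in BOOTSTRAP_STEPS[cutoff + 1:] if s in done]
```

For instance, `next_step({"Version Control", "Gold Server"})` returns `"Host Install Tools"`, and `out_of_order({"Version Control", "Mail"})` returns `["Mail"]`, which flags that mail was set up before the earlier infrastructure steps.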