Administrator
Published on 2025-04-16 / 65 Visits

Checking Disk Array Status from PVE on a Huawei Server

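1. Preface

This article covers early-warning monitoring of the disk array status on a Huawei RH2288V3. Until now there were only two ways to learn the disk status: going to the server room to look at the drive status LEDs, or reading the alarm information in the iBMC management console. I therefore installed the RAID controller vendor's utility on the PVE host and wrote a simple script that detects faults and raises an alert.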

Environment

Server: Huawei RH2288V3

RAID controller: SAS3108

System: PVE 8.3.0

2. Installing the RAID vendor's monitoring utility

2.1 Identify the RAID controller

# lspci  | grep -i raid
01:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS-3 3108 [Invader] (rev 02)

The RAID controller is an "LSI Logic / Symbios Logic MegaRAID SAS-3 3108".

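2.2 Switch to a mirror

Many of the default Debian sources cannot be loaded from within mainland China, so replace them with the Tsinghua (TUNA) mirror.

Edit the sources file, comment out the original entries, and paste in the new addresses: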

nano /etc/apt/sources.list
deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm main contrib non-free
deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm-updates main contrib non-free
deb https://mirrors.tuna.tsinghua.edu.cn/debian-security bookworm-security main contrib non-free
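
The "comment out the original entries" step can also be scripted instead of done by hand in nano. The following is a minimal sketch, assuming GNU sed as shipped with Debian; the function name comment_out_sources is made up for illustration and is not part of any tool. It keeps a .bak copy and prefixes every active deb line with "# ".

```shell
#!/bin/bash
# Hypothetical helper (not part of apt): back up a sources file, then
# comment out every line that currently starts with "deb" or "deb-src".
comment_out_sources() {
    local file="$1"
    cp -- "$file" "$file.bak"                   # keep the original for rollback
    sed -i 's/^[[:space:]]*deb/# &/' "$file"    # "deb ..." becomes "# deb ..."
}
```

It could be called as comment_out_sources /etc/apt/sources.list before the mirror lines are pasted in, followed by apt update to refresh the package index.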

2.3 Download and install MegaRAID Storage Manager (MSM)

https://www.broadcom.cn/support

The downloaded archive contains only RPM packages, but PVE is based on Debian, so alien is needed to convert the RPMs to deb packages before installing them.

apt install alien
tar zxvf 17.05.02.01_MSM_linux-x86.tar.gz
cd disk
alien --scripts *.rpm
dpkg --install lib-utils2_1.00-3_all.deb
dpkg --install megaraid-storage-manager_17.05.02-2_all.deb

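The default installation directory is /usr/local/MegaRAID Storage Manager/StorCLI/

If you get a message that the path is too long, rename the MegaRAID Storage Manager directory to MegaRAID. The commands below use the renamed path.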
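
The rename itself is a single mv, but the space in the original directory name has to be quoted. Below is a minimal sketch; the function rename_msm_dir and its optional prefix argument are hypothetical and exist only so that the rename can be tried in a scratch directory first.

```shell
#!/bin/bash
# Hypothetical helper: rename the MSM install directory so the path has no spaces.
# The prefix defaults to /usr/local, the location named in the article.
rename_msm_dir() {
    local prefix="${1:-/usr/local}"
    local old="$prefix/MegaRAID Storage Manager"
    local new="$prefix/MegaRAID"
    # Only rename when the old directory exists and the new name is still free
    if [ -d "$old" ] && [ ! -e "$new" ]; then
        mv -- "$old" "$new"
    fi
}
```

Calling rename_msm_dir with no argument, as root, would act on /usr/local.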

2.4 Verify the installation

Show all adapter information. The output is very long:

# /usr/local/MegaRAID/StorCLI/storcli64 -AdpAllInfo -aALL
Adapter #0

==============================================================================
                    Versions
                ================
Product Name    : SAS3108
Serial No       : 
FW Package Build: 24.16.0-0114

                    Mfg. Data
                ================
Mfg. Date       : 00/00/00
Rework Date     : 00/00/00
Revision No     : 
Battery FRU     : N/A

                Image Versions in Flash:
                ================
BIOS Version       : 6.32.02.0_4.17.08.00_0x06150500
FW Version         : 4.660.00-8313
NVDATA Version     : 3.1605.00-0015
Ctrl-R Version     : 5.17-1302
Boot Block Version : 3.07.00.00-0003

                Pending Images in Flash
                ================
None

                PCI Info
                ================
Controller Id   : 0000
Vendor Id       : 1000
Device Id       : 005d
SubVendorId     : 19e5
SubDeviceId     : d207

Host Interface  : PCIE

ChipRevision    : C0

Number of Frontend Port: 0 
Device Interface  : PCIE

Number of Backend Port: 8 
Port  :  Address
0        500e004aaaaaaa1f 
1        0000000000000000 
2        0000000000000000 
3        0000000000000000 
4        0000000000000000 
5        0000000000000000 
6        0000000000000000 
7        0000000000000000 

                HW Configuration
                ================
SAS Address      : 5a0039665c3fd96e
BBU              : Absent
Alarm            : Present
NVRAM            : Present
Serial Debugger  : Present
Memory           : Present
Flash            : Present
Memory Size      : 1024MB
TPM              : Absent
On board Expander: Absent
Upgrade Key      : Absent
Temperature sensor for ROC    : Present
Temperature sensor for controller    : Absent

ROC temperature : 75  degree Celcius

                Settings
                ================
Current Time                     : 2:32:46 4/16, 2025
Predictive Fail Poll Interval    : 300sec
Interrupt Throttle Active Count  : 16
Interrupt Throttle Completion    : 50us
Rebuild Rate                     : 30%
PR Rate                          : 30%
BGI Rate                         : 30%
Check Consistency Rate           : 30%
Reconstruction Rate              : 30%
Cache Flush Interval             : 4s
Max Drives to Spinup at One Time : 4
Delay Among Spinup Groups        : 2s
Physical Drive Coercion Mode     : 1GB
Cluster Mode                     : Disabled
Alarm                            : Enabled
Auto Rebuild                     : Enabled
Battery Warning                  : Disabled
Ecc Bucket Size                  : 15
Ecc Bucket Leak Rate             : 1440 Minutes
Restore HotSpare on Insertion    : Enabled
Expose Enclosure Devices         : Enabled
Maintain PD Fail History         : Enabled
Host Request Reordering          : Enabled
Auto Detect BackPlane Enabled    : SGPIO/i2c SEP
Load Balance Mode                : Auto
Use FDE Only                     : No
Security Key Assigned            : No
Security Key Failed              : No
Security Key Not Backedup        : No
Default LD PowerSave Policy      : Controller Defined
Maximum number of direct attached drives to spin up in 1 min : 120 
Auto Enhanced Import             : Yes
Any Offline VD Cache Preserved   : No
Allow Boot with Preserved Cache  : No
Disable Online Controller Reset  : No
PFK in NVRAM                     : Yes
Use disk activity for locate     : No
POST delay                       : 90 seconds

                Capabilities
                ================
RAID Level Supported             : RAID0, RAID1, RAID5, RAID6, RAID00, RAID10, RAID50, RAID60, PRL 11, PRL 11 with spanning, SRL 3 supported, PRL11-RLQ0 DDF layout with no span, PRL11-RLQ0 DDF layout with span
Supported Drives                 : SAS, SATA
Boot Volume Supported            : NO

Allowed Mixing:

Mix in Enclosure Allowed
Mix of SAS/SATA of HDD type in VD Allowed

                Status
                ================
ECC Bucket Count                 : 0

                Limitations
                ================
Max Arms Per VD          : 32 
Max Spans Per VD         : 8 
Max Arrays               : 128 
Max Number of VDs        : 64 
Max Parallel Commands    : 928 
Max SGE Count            : 60 
Max Data Transfer Size   : 8192 sectors 
Max Strips PerIO         : 128 
Max LD per array         : 16 
Min Strip Size           : 64 KB
Max Strip Size           : 1.0 MB
Max Configurable CacheCade Size: 0 GB
Current Size of CacheCade      : 0 GB
Current Size of FW Cache       : 815 MB

                Device Present
                ================
Virtual Drives    : 6 
  Degraded        : 0 
  Offline         : 0 
Physical Devices  : 14 
  Disks           : 12 
  Critical Disks  : 0 
  Failed Disks    : 0 

                Supported Adapter Operations
                ================
Rebuild Rate                    : Yes
CC Rate                         : Yes
BGI Rate                        : Yes
Reconstruct Rate                : Yes
Patrol Read Rate                : Yes
Alarm Control                   : Yes
Cluster Support                 : No
BBU                             : Yes
Spanning                        : Yes
Dedicated Hot Spare             : Yes
Revertible Hot Spares           : Yes
Foreign Config Import           : Yes
Self Diagnostic                 : Yes
Allow Mixed Redundancy on Array : No
Global Hot Spares               : Yes
Deny SCSI Passthrough           : No
Deny SMP Passthrough            : No
Deny STP Passthrough            : No
Support Security                : No
Snapshot Enabled                : No
Support the OCE without adding drives : Yes
Support PFK                     : Yes
Support PI                      : Yes
Support Boot Time PFK Change    : No
Disable Online PFK Change       : No
Support LDPI Type1                      : No
Support LDPI Type2                      : No
Support LDPI Type3                      : No
PFK TrailTime Remaining         : 0 days 0 hours
Support Shield State            : Yes
Block SSD Write Disk Cache Change: No
Point In Time Progress: Yes

                Supported VD Operations
                ================
Read Policy          : Yes
Write Policy         : Yes
IO Policy            : Yes
Access Policy        : Yes
Disk Cache Policy    : Yes
Reconstruction       : Yes
Deny Locate          : No
Deny CC              : No
Allow Ctrl Encryption: No
Enable LDBBM         : No
Support Breakmirror  : Yes
Power Savings        : No

                Supported PD Operations
                ================
Force Online                            : Yes
Force Offline                           : Yes
Force Rebuild                           : Yes
Deny Force Failed                       : No
Deny Force Good/Bad                     : No
Deny Missing Replace                    : No
Deny Clear                              : No
Deny Locate                             : No
Support Temperature                     : Yes
NCQ                                     : Yes
Disable Copyback                        : No
Enable JBOD                             : No
Enable Copyback on SMART                : Yes
Enable Copyback to SSD on SMART Error   : Yes
Enable SSD Patrol Read                  : No
PR Correct Unconfigured Areas           : Yes
Enable Spin Down of UnConfigured Drives : Yes
Disable Spin Down of hot spares         : No
Spin Down time                          : 30 
T10 Power State                         : No
                Error Counters
                ================
Memory Correctable Errors   : 0 
Memory Uncorrectable Errors : 0 

                High Availability Properties
                ================
Topology Type                 : None
                Cluster Information
                ================
Cluster Permitted     : No
Cluster Active        : No

                Default Settings
                ================
Phy Polarity                     : 0 
Phy PolaritySplit                : 0 
Background Rate                  : 30 
Strip Size                       : 256kB
Flush Time                       : 4 seconds
Write Policy                     : WB
Read Policy                      : RA
Cache When BBU Bad               : Disabled
Cached IO                        : No
SMART Mode                       : Mode 6
Alarm Disable                    : Yes
Coercion Mode                    : 1GB
ZCR Config                       : Unknown
Dirty LED Shows Drive Activity   : No
BIOS Continue on Error           : No
Spin Down Mode                   : None
Allowed Device Type              : SAS/SATA Mix
Allow Mix in Enclosure           : Yes
Allow HDD SAS/SATA Mix in VD     : Yes
Allow SSD SAS/SATA Mix in VD     : No
Allow HDD/SSD Mix in VD          : No
Allow SATA in Cluster            : No
Max Chained Enclosures           : 16 
Disable Ctrl-R                   : No
Enable Web BIOS                  : No
Direct PD Mapping                : No
BIOS Enumerate VDs               : Yes
Restore Hot Spare on Insertion   : Yes
Expose Enclosure Devices         : Yes
Maintain PD Fail History         : Yes
Disable Puncturing               : No
Zero Based Enclosure Enumeration : No
PreBoot CLI Enabled              : No
LED Show Drive Activity          : No
Cluster Disable                  : Yes
SAS Disable                      : No
Auto Detect BackPlane Enable     : SGPIO/i2c SEP
Use FDE Only                     : No
Enable Led Header                : Yes
Delay during POST                : 0 
EnableCrashDump                  : Yes
Disable Online Controller Reset  : No
EnableLDBBM                      : No
Un-Certified Hard Disk Drives    : Allow
Treat Single span R1E as R10     : No
Max LD per array                 : 16
Power Saving option              : Don't Auto spin down Configured Drives
Max power savings option is  not allowed for LDs. Only T10 power conditions are to be used.
Default spin down time in minutes: 30 
Enable JBOD                      : No
TTY Log In Flash                 : No
Auto Enhanced Import             : Yes
BreakMirror RAID Support         : Yes
Disable Join Mirror              : Yes
Enable Shield State              : Yes
Time taken to detect CME         : 60s

Exit Code: 0x00

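We only need to look at the "Device Present" section: if "Degraded", "Offline", "Critical Disks" and "Failed Disks" are all "0", the disks are considered healthy; otherwise something has failed.

The "Device Present" heading is followed by 8 lines, and as long as 4 of them contain a 0, everything is OK. (This works on this host because the other counts, 6, 14 and 12, contain no 0.)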

                Device Present
                ================
Virtual Drives    : 6 
  Degraded        : 0 
  Offline         : 0 
Physical Devices  : 14 
  Disks           : 12 
  Critical Disks  : 0 
  Failed Disks    : 0 

2.5 Get the status information

Use a simple pipeline that counts the lines containing a 0:

/usr/local/MegaRAID/StorCLI/storcli64 -AdpAllInfo -aALL | grep -A 8 'Device Present' | grep 0 | wc -l
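
Counting lines that contain the character 0 depends on the other counts: a host with 10 physical disks would yield 5 matches, and a value such as "Degraded : 10" would still be counted as healthy. As a hedged alternative, the sketch below reads the four counters by name with awk and prints their sum. The function name raid_problem_count is hypothetical, and the parsing assumes the output layout shown above, where a blank line ends the "Device Present" block.

```shell
#!/bin/bash
# Hypothetical helper: read adapter info on stdin and print the sum of the
# Degraded, Offline, Critical Disks and Failed Disks counters (0 = healthy).
raid_problem_count() {
    awk -F':' '
        /Device Present/ { in_block = 1; next }
        # A blank line closes the block once at least one counter was read
        in_block && /^[ \t]*$/ { if (seen) in_block = 0; next }
        in_block && $1 ~ /Degraded|Offline|Critical Disks|Failed Disks/ {
            seen = 1
            total += $2 + 0
        }
        END { print total + 0 }
    '
}
```

It would be used as /usr/local/MegaRAID/StorCLI/storcli64 -AdpAllInfo -aALL | raid_problem_count, with any result other than 0 treated as a fault.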

2.6 Put it in a script

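#!/bin/bash

# Assumes the install directory was renamed to MegaRAID (section 2.3)
STORCLI=/usr/local/MegaRAID/StorCLI/storcli64

# Count the lines in the "Device Present" block that contain a 0
PRESENT=$("$STORCLI" -AdpAllInfo -aALL | grep -A 8 "Device Present" | grep 0 | wc -l)
if [[ $PRESENT -eq 4 ]]; then
    echo 'All are OK' && exit 0
else
    echo 'RAID status abnormal' && exit 2
fi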

2.7 Test the script

# bash /tmp/check_MegaRAID.sh
All are OK
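
Because the script signals a fault through its exit status, a caller can act on that status rather than on the printed text. The wrapper below is a minimal sketch of that idea; run_check is a hypothetical name, and how the message is delivered (system log, mail, a monitoring agent) is left open because the article does not specify it.

```shell
#!/bin/bash
# Hypothetical wrapper: run a check command, stay silent on success, and
# print a one-line report when the command exits with a non-zero status.
run_check() {
    local output status
    output=$("$@" 2>&1)
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "RAID check failed (exit $status): $output"
    fi
    return "$status"
}
```

For example, run_check bash /tmp/check_MegaRAID.sh | logger -t raid-check would send failure reports to the system log, and the same line could be scheduled from cron.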

3. Conclusion

