Software RAID on OpenBSD using RAIDframe
Greg Oster (RAIDframe developer) writes on misc@openbsd(.)org about how to deal with long parity checks that keep the system from reaching multi-user mode for a long time.
====================================================================
RAIDframe, like hardware RAID and rsync, is just another tool.
Understand the pros and cons of each, but be willing to accept the
risks associated with whatever you choose... (if you think hardware
RAID is riskless, then you've never had a 2TB RAID set suddenly
decide that all components were "offline" and mark them as such!)
For the folks who dislike the "long parity checks"... If you're
willing to accept a window during which some of your data *might* be
at risk, change:
raidctl -P all
to something like
sleep 3600 ; raidctl -P all &
------------------------------------------------
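Correction from Greg: it should be
(sleep 3600 ; raidctl -P all) &
Without the parentheses, only raidctl goes into the background; the
sleep runs in the foreground and holds up /etc/rc for the full hour.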
------------------------------------------------
in /etc/rc. This will, of course, delay the start of the parity
computation for an hour or so, giving your system a chance to do the
fscks and get back to multi-user as quickly as possible.
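To see the difference the parentheses make without touching a RAID set, both forms can be timed with harmless stand-ins. This is a sketch, not part of Greg's message: `DELAY` (2 seconds instead of 3600) and the `echo` commands replacing `raidctl -P all` are illustrative assumptions, as are the variable names `t_plain` and `t_sub`.

```shell
#!/bin/sh
# Stand-ins (hypothetical): DELAY replaces the 3600-second sleep and the
# echo commands replace "raidctl -P all", so this runs on any POSIX sh.
DELAY=2

# Unparenthesized: sleep runs in the foreground and only the echo is
# backgrounded, so the script (like /etc/rc) stalls for DELAY seconds.
start=$(date +%s)
sleep "$DELAY" ; echo "stand-in check A started" &
t_plain=$(( $(date +%s) - start ))
echo "unparenthesized: continued after ${t_plain}s"

# Parenthesized: the whole subshell (sleep, then the check) is
# backgrounded, so the script carries on immediately.
start=$(date +%s)
(sleep "$DELAY" ; echo "stand-in check B started") &
t_sub=$(( $(date +%s) - start ))
echo "parenthesized: continued after ${t_sub}s"

wait  # let the background jobs finish before the demo exits
```

The first form reports at least `DELAY` seconds before the script moves on; the second reports roughly zero, which is the behaviour you want in /etc/rc.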
The risk here is as follows (this is for RAID 1; the risks for RAID 5
are slightly higher):
1) even though parity is marked 'dirty', it might actually be in
sync. In this case if you have a component failure, your data is
fine.
2) until the parity check is done, only the 'master' component is
used for reading. But any writes are mirrored to both
components. That means that when the fsck is being done, any
problems found will be fixed on *both* components, and writes will
keep the two in sync even before parity is checked.
3) Where the risk of data loss comes in is if the master dies
before the parity check gets done. In this case, data on the master
that was not re-written or that was out-of-sync with the slave will
be lost. This could result in the loss of pretty much anything.
The important thing here is for you to evaluate your situation and
decide whether this level of risk is acceptable... For me, I use the
equivalent of 'sleep 3600' on my home desktop, and slightly modified
versions of it on other home servers and other boxen I look after.
But don't blindly listen to me or anyone else -- learn what the risks
are for your situation, determine what level of risk you can accept,
and go from there...
Later...
Greg Oster
================================================
What I do is comment out
raidctl -P all
in /etc/rc and add
(sleep 600; raidctl -P all) &
at the end of /etc/rc.local,
so the parity check starts only 10 minutes after the system is up and
running from the master disk.
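If you also want a record of when the deferred check actually ran, the same rc.local line can be expanded to log its start and finish via syslog. This is only a sketch and not part of the original setup: the use of logger(1) and the `raidparity` tag are assumptions chosen for illustration.

```shell
# /etc/rc.local (excerpt): run the RAIDframe parity check 10 minutes
# after boot, in the background. The syslog logging and the
# "raidparity" tag are illustrative additions, not part of the post.
(
    sleep 600
    logger -t raidparity "starting deferred parity check"
    raidctl -P all
    status=$?   # capture raidctl's exit status before logger resets $?
    logger -t raidparity "parity check finished, exit status $status"
) &
```

Grouping everything inside one subshell keeps the whole sequence off the boot path, for the same reason Greg's corrected line needs its parentheses.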
=============================================