Q. I can't upgrade any packages, run up2date or rpm.
What is wrong?

Let me state up front, and at the top: just like the rpm -ba to rpmbuild -ba conversion, the fight is over. Upgrade!! All known and reproducible issues needing the discussion below are solved in the latest production releases of RPM. There is a man page for rpm2cpio; there are all sorts of problems that rpm plainly and simply cannot do anything about. You want to run buggy software? Not rpm's problem.

A. The RPM database sometimes gets stuck. One cause of this problem is a process working on the RPM database being killed, which leaves (thereafter inaccurate) lock state information behind and limits what further processes can do to the database. So, a word to the wise: make sure such a process really is stuck before killing it. If a process is using much CPU time (e.g. as reported by top), it is probably still doing useful work.

(A tip o' the hat to M A Young <m.a.young durham.ac.uk> for parts of this summarization. Revised 17 Jan 2004, after a helpful mailing list thread pointing out a prior loose use of nomenclature by RPH. All remaining errors are mine alone.)

This lock state information has to be repaired manually to eliminate what are described as 'hangs', and to permit rpm to function properly. Now that several programs can concurrently use, and write into, the RPM database, RPM alone can no longer safely 'assume' that it may clear out arbitrary lock state information which it or its parent process did not create. It therefore relies on the external judgment of the system administrator to decide when to reset and clear lock state.
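Before deciding that lock state is stale, it helps to see whether anything still has the database open and whether it is burning CPU. The following is a minimal sketch, not part of rpm: the helper name rpmdb_holders is made up for illustration, and it assumes the fuser utility (from psmisc) and a ps that accepts the stat column, as on Linux.

```shell
#!/bin/sh
# Sketch only: rpmdb_holders is a hypothetical helper, not an rpm command.
# It lists processes that still hold files open in the rpm database
# directory, with their state, CPU share and elapsed time.
rpmdb_holders() {
    dir="${1:-/var/lib/rpm}"
    if ! command -v fuser >/dev/null 2>&1; then
        echo "fuser not found; install psmisc or use lsof instead" >&2
        return 2
    fi
    # fuser writes the PIDs to stdout and everything else to stderr
    pids=$(fuser "$dir"/* 2>/dev/null)
    if [ -z "$pids" ]; then
        echo "no process has files open in $dir"
        return 0
    fi
    # A high pcpu value suggests the process is still working, not stuck
    ps -o pid,stat,pcpu,etime,args -p "$(echo $pids | tr ' ' ',')"
}
```

Run rpmdb_holders as root with no argument to inspect /var/lib/rpm. A process shown with a climbing CPU share is probably best left alone; one that sits idle for a long time is a candidate for closer inspection.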
What the nature of the underlying problem is, and how to fix it, are detailed below.

First, informally: When a process _is_ so killed [sometimes due to a loss-of-power crash, sometimes due to a third-party helper application dying without cleaning up lock state information in files at exit, sometimes due to an impatient admin], it may also leave some small corruption behind in the RPM 'SleepyCat' db based database. Usually the simple removal of the '__*' files in /var/lib/rpm/ clears the problem.

It would be safest to clear these locks while still in single user mode; as a system is booting up, it is in single user mode before init can fire off any 'child' daemon processes. Checking, we can find:

But sometimes there are still problems. If the simple removal of the '__*' files does not clear up the problem, it is also possible that other damage, such as corrupted linked lists, is separately present. Proceed carefully: take a backup FIRST, before trying to have rpm repair the database. It is also _usually_ safe to rebuild these lists with rpm --rebuilddb.

The way an rpm --rebuilddb command works, it creates a temporary directory to work in under /var/lib/, next to the 'parent' one at /var/lib/rpm/, and attempts a rebuild. At the very end of a normal rebuild, it puts the new content in place over the old files. Obviously, if some other process has independently filled the partition containing /var/lib/, and is hung but ready to run, waiting for an inode to come free, a 'race' exists which the rebuild process may lose. At that point, if the 'other' process 'wins' and grabs the 'last' newly freed resource, the rebuild process cannot complete successfully and RPM database content may be lost. Ouch.

The lesson of this is to always take backups of /var/lib/rpm/ before doing a --rebuilddb [actually, taking periodic backups generally is of course always the better practice] and to take care not to rebuild where ENOSPC can occur.
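The precautions above (back up /var/lib/rpm/ first, do not rebuild on a nearly full partition, clear the '__*' files) could be scripted roughly as follows. This is a sketch under stated assumptions: the helper names backup_rpmdb, has_room and clear_stale_locks are invented for illustration, the backup naming scheme is arbitrary, and the space check looks only at free blocks, not free inodes (df -i shows those).

```shell
#!/bin/sh
# Sketch only; backup_rpmdb, has_room and clear_stale_locks are hypothetical
# helper names, not part of rpm. Run as root, with no rpm process active.

# Copy DIR to a timestamped sibling directory and print the copy's path.
backup_rpmdb() {
    src="${1:-/var/lib/rpm}"
    src="${src%/}"
    dest="$src.bak.$(date +%Y%m%d-%H%M%S)"
    cp -a "$src" "$dest" && printf '%s\n' "$dest"
}

# Succeed only if the partition holding DIR has more free space (in KiB)
# than DIR currently occupies, since a rebuild writes a second copy.
has_room() {
    dir="${1:-/var/lib/rpm}"
    used=$(du -sk "$dir" | awk '{print $1}')
    free=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$free" -gt "$used" ]
}

# Remove the '__*' files that hold the stale lock state.
clear_stale_locks() {
    dir="${1:-/var/lib/rpm}"
    rm -f "${dir%/}"/__*
}

# Possible use:
#   backup_rpmdb && has_room && clear_stale_locks && rpm --rebuilddb
```

Chaining the steps with && means the rebuild is never attempted unless the backup succeeded and the space check passed.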
A quick and dirty way to take a temporary on-disk backup is to copy the /var/lib/rpm/ directory aside before the rebuild.

During and after some rebuilds (as on Red Hat Linux 9), an unsightly but harmless warning message is produced, which may safely be ignored. The message looks like this:

    [root@dhcp108 rpm]# rpm --rebuilddb
    error: db4 error(16) from dbenv->remove: Device or resource busy
    [root@dhcp108 rpm]#

This is scary (because the message uses the word 'error', looks like a 'hung' NFS mount error message, and does not describe itself as just a 'warning'). It breaks the *nix expectation that well-behaved processes run silently when they are free of errors. Ahhh, well ...

But sometimes, unfortunately, even more substantive database repair is needed, as described below the section with Jeff Johnson's email later in this discussion. Think of a two-way linked list (we ignore indices to simplify the presentation) for RPM Package description items thus:

What else might cause the problem? Warren Togami has reviewed the trouble ticket trackers for the Conectiva apt for rpm package, and others (the Ximian updater, Gerald Teschl's autoupdate), and helped locate some interaction issues which are being worked on and discussed on the RH rpm-list. There is also a reported, but not reproducible, issue involving corrupted signing keys; if you have such a report, and can write a test case to reliably, or even frequently, cause this corruption, please file a Bugzilla and attach the test case script.

More formally: see the Bugzilla master ticket on this issue, 73097.

The text marked with the

Date: Thu, 8 Aug 2002 09:10:12 -0400
From: Jeff Johnson <jbj@redhat.com>
Subject: Re: rpm database - how to repair it?

On Thu, Aug 08, 2002 at 01:37:52PM +0200, Robert Vojta wrote:
<snip>
> What is the safest way how to repair rpm database
> (if it's corrupted)? I was doing this job a long time
> ago and I forgot the whole process :(

Hmmm, "hangs" in select are usually stale locks.
Fix by doing

All that needs repairing is /var/lib/rpm/Packages, the indices can/will be rebuilt with rpm --rebuilddb later. Save a copy just in case:

If there are any errors, repair by doing

Read all the headers in Packages by doing

Rebuild the indices

HTH

73 de Jeff

--
Jeff Johnson ARS N3NPQ
jbj@redhat.com (jbj@jbj.org)
Chapel Hill, NC

Later:

On Thu, Aug 08, 2002 at 08:08:13AM -0400, Gene C. wrote:
>
> 2. There are circumstances that screw up the rpm "database" which
> are not recoverable ... you will need to re-install.

No, there aren't any cases the database is not recoverable. There are still times that a re-install is an easier recovery pathway, however.

Editor's Note added September 18 2002: In moving to RPM-4.1, the above repair and test process needs an update of the db3-3.x packages from the RPM ftp site, if an error of the following sort is encountered:

These earlier 'transition' packages are available here.

Last updated: Sat, 17 Jan 2004 16:55:31 -0500
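The repair sequence outlined in Jeff Johnson's message above (save a copy of Packages, repair it if there are errors, read all the headers, rebuild the indices) might be sketched as below. This is a reconstruction under assumptions, not the commands from the original message: the helper name repair_packages is invented, the explicit verify step is inferred from "If there are any errors", and the tool paths assume the Berkeley DB utilities that rpm 4.x installed under /usr/lib/rpm. Plain db_verify, db_dump and db_load can be substituted through the variables.

```shell
#!/bin/sh
# Sketch only. Tool paths are assumptions; override them in the environment
# if your system ships db_verify/db_dump/db_load under other names.
DB_VERIFY="${DB_VERIFY:-/usr/lib/rpm/rpmdb_verify}"
DB_DUMP="${DB_DUMP:-/usr/lib/rpm/rpmdb_dump}"
DB_LOAD="${DB_LOAD:-/usr/lib/rpm/rpmdb_load}"

# repair_packages DIR (hypothetical helper): save a copy of Packages,
# verify it, and only if verification fails, dump the damaged file and
# reload it into a fresh Packages. Runs in a subshell so the caller's
# working directory is left alone.
repair_packages() (
    cd "${1:-/var/lib/rpm}" || exit 1
    cp -p Packages Packages.save || exit 1
    if "$DB_VERIFY" Packages; then
        echo "Packages verified, no reload needed"
        exit 0
    fi
    mv Packages Packages.damaged || exit 1
    "$DB_DUMP" Packages.damaged | "$DB_LOAD" Packages
)

# Afterwards, as root:
#   rpm -qa > /dev/null     # read every header in Packages
#   rpm --rebuilddb         # rebuild the indices
```

Keeping both Packages.save and Packages.damaged around until rpm -qa and rpm --rebuilddb have run cleanly leaves a way back if the reload makes things worse.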