Saturday, September 11, 2010

package alternatives - dependency spaghetti

GNU/Linux systems and similar have several package management systems, most of which do a good job.

deb and rpm packages work well, provided dependencies are crafted carefully to avoid dependency spaghetti (or dependency hell).

I have run Debian production systems for 7 years or so, and I rarely see aptitude struggle to find a good solution.

Right now I am testing out the latest Debian (Squeeze), and it provides an example that I have worked through on one system and want to document here.

During testing, I noticed that my file manager Thunar was not automatically picking up changes to the underlying directory.

File alteration monitors (fam or gamin) generally look after this functionality.

Currently the system is using fam and here is a summary from dpkg:


The summary says that fam, rather than gamin, is providing the alteration monitor for my system.
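A rough sketch of how a summary like this could be produced from the command line. It assumes dpkg-query is available; the helper name installed_only and the choice of four package names are illustrative assumptions, not taken from the original screenshot.

```shell
#!/bin/sh
# Keep only packages whose dpkg status ends in "installed".
# Input lines look like "<package> <want> <error> <status>",
# for example "fam install ok installed".
installed_only() {
    awk '$NF == "installed" { print $1 }'
}

# Assumption: these four package names are the ones of interest here.
# Errors about unknown packages are discarded.
if command -v dpkg-query >/dev/null 2>&1; then
    dpkg-query -W -f='${Package} ${Status}\n' fam libfam0 gamin libgamin0 2>/dev/null | installed_only
fi
```

On a system set up as described, fam and libfam0 would be expected in the list and the gamin packages would not.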

The logical thing to do, I thought, would be to switch to gamin and see if that fixed my issue (Thunar not picking up directory changes, described above).

Here aptitude tells me that Thunar recommends gamin:



Here is a screenshot of what aptitude thinks of my attempt to switch away from using fam:




Now removing libfam0 seems like the logical thing to do...

...but a quick look at the dependencies for libfam0 caused me a moment of pause:


And the way to reassure yourself is to preview the changes after choosing (!: Apply):



The output from aptitude includes the phrase 'removing anyway as you requested', which had me wondering.



Now a dpkg summary shows my system having gamin installed rather than fam:




The key to understanding how and why this works is to realise that, in Debian Squeeze, libgamin0 has been marked as a drop in replacement for libfam0 as shown here:




This means that you can ignore any 'removing anyway as you requested' messages, and know that your system has not been broken by your actions.
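The replacement relationship can also be checked from a terminal. A minimal sketch, assuming apt-cache is installed; the helper name package_relations and the script name in the comment are hypothetical, and which fields appear depends on the package version in your archive.

```shell
#!/bin/sh
# Print only the control fields that describe replacement relationships.
package_relations() {
    grep -E '^(Provides|Conflicts|Replaces):'
}

# Query a real package only when a name is given on the command line, e.g.
#   sh relations.sh libgamin0      (relations.sh is a hypothetical file name)
if [ "$#" -gt 0 ] && command -v apt-cache >/dev/null 2>&1; then
    apt-cache show "$1" | package_relations
fi
```

A package that lists another in its Provides field can stand in for it when dependencies are resolved, which is the situation described for libgamin0 and libfam0.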

My system has no Nautilus file manager but instead uses Thunar:




...and as Thunar recommends gamin, it is a shame that gamin, rather than fam, was not installed on my system automatically.

Now to see if my file manager is picking up changes to the underlying files in realtime :)

Sunday, July 25, 2010

Debian Squeeze - packages 'held back'

During the 'testing' phase of a Debian distribution, it is expected that you will encounter packages being 'held back' a little more frequently.

Running a stable distribution you should mostly only see 'held back' for packages which you have deliberately put on hold.

Right now Debian Squeeze is a 'testing' distribution and this has provided my example for this article.

ntpdate on Debian (which I have installed):

There are several tools for synchronising the clock of your server with external time sources.
ntpdate is one such tool; openntpd and chrony are some alternatives.

ntpdate on my system shows as being 'held back' and I explain why below.


and running dist-upgrade gives similar results:


Now at this point you can check if perhaps you deliberately placed this package on hold by typing:

aptitude search ~ahold

Spelled out, the search pattern is a tilde (~), then the letter a, then hold
  ( hold is the action flag you are searching for )

Which on my system gave no results ( meaning I have not placed ntpdate on hold )
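As a complementary check, the selection states that dpkg records can be filtered for holds. This is a sketch rather than a replacement for the aptitude search: aptitude and dpkg do not necessarily share the same hold record, and the helper name held_packages is made up for illustration.

```shell
#!/bin/sh
# Print package names whose selection state is "hold".
# Reads "package<TAB>state" lines, the format printed by dpkg --get-selections.
held_packages() {
    awk '$2 == "hold" { print $1 }'
}

# Only query dpkg where it exists.
if command -v dpkg >/dev/null 2>&1; then
    dpkg --get-selections | held_packages
fi
```

No output means dpkg has no package marked as held.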




So what is causing my system to say ntpdate is 'held back'?


Debian package tools are fairly well behaved and will not remove packages without some intervention from yourself.

You have to instruct Debian to remove dhcp3-client and until you do then ntpdate will be 'held back'.

Knowing this server as I do, I am fairly sure that dhcp3-client is something I do not need to keep; however, there is no harm in checking if anything else will be affected.


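Here there is a lot of information. You can treat as less important the lines beginning with a pipe (|): in apt-cache output the pipe marks a relationship in which the package is only one alternative in an 'or' group.

   ( The non piped lines are a mixture of relationship types, including 'depends' and 'recommends' )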

Here is a graphical representation (feel free to skip the graphics if you like)


Using the --installed flag to apt-cache rdepends gives you the most important information specific to your system.



What the above tells me is that ntpdate depends on dhcp3-client and ifupdown 'suggests' dhcp3-client. My final check was that I did in fact have ifupdown installed - yes it is installed on my system.
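The same check can be narrowed further in a script. A small sketch, assuming apt-cache is available; the helper name unpiped_rdepends is hypothetical. It keeps only the indented entries that are not prefixed with a pipe.

```shell
#!/bin/sh
# From "apt-cache rdepends" output, print the reverse dependencies that are
# not prefixed with a pipe. Header lines (which are not indented) are skipped.
unpiped_rdepends() {
    awk '/^[ \t]/ && substr($1, 1, 1) != "|" { print $1 }'
}

# Query a real package only when a name is given on the command line.
if [ "$#" -gt 0 ] && command -v apt-cache >/dev/null 2>&1; then
    apt-cache rdepends --installed "$1" | unpiped_rdepends
fi
```

Run with dhcp3-client as the argument, this would list the installed packages whose relationship to it is not pipe-prefixed.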

And the nugget of this article is really how to get apt past this 'held back' situation as shown here:


So if you have a 'held back' package, and you are certain that you are okay with what apt is going to remove, then go ahead and execute...

apt-get install packagename

( where packagename is replaced by the name of your particular package )
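Before committing, it may help to preview what apt intends to remove. The sketch below uses the simulate option of apt-get (-s), which makes no changes; the helper name planned_removals is hypothetical, and the filter assumes simulated removals are reported on lines beginning with Remv.

```shell
#!/bin/sh
# From "apt-get -s" output, print the names of packages that would be removed.
planned_removals() {
    awk '$1 == "Remv" { print $2 }'
}

# Simulate only when a package name is given on the command line.
if [ "$#" -gt 0 ] && command -v apt-get >/dev/null 2>&1; then
    apt-get -s install "$1" | planned_removals
fi
```

If the list contains anything you are not prepared to lose, stop there and consider the options in the note that follows.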

Note: Had my system really had a need for dhcp3-client ( certainly some desktop systems make use of this package ), then I would not have proceeded, and would have considered the following options:
  1. Find an alternative to ntpdate
  2. Find an alternative to dhcp3-client
  3. Wait a while to see if the situation fixes itself*
*As explained at the outset of this article, I am running Debian Squeeze during its testing phase, and packages and interdependencies do change fairly regularly.

For those of you who are interested in alternatives to ntpdate, and where ntp/ntpdate fit into the grand scheme of things, here are some extra (optional) graphics:








     ...and more generally chrony, openntpd, ntpdate (below) ...



Links and further reading:
  • The package aptitude-doc-en has great html documentation describing things like 'action flags' and their meaning.
  • ntpdate and ntp are different packages. Ntp is for those who want a running daemon rather than just a cli tool for crontab entry happiness.
  • This mailing list entry is several years old but provides some very relevant information.

Thursday, July 8, 2010

Standard exit codes - shell scripts versus binaries

This short article is prompted by the question "What return codes should I use in my shell script?"

Some Answers:
  • A non-zero value
  • A non-zero value in the ranges 1 through 127 and 138 through 255
  • A non-zero value in the ranges 1 through 63, 79 through 127, and 138 through 255
The first answer is certainly correct.

The remaining answers are really a matter of personal preference.


Exit codes 128 through 137:

If a process is terminated by a signal then the standard behaviour is to take the numeric value of the signal, add 128, and use that value as exit code.

128+SIGNAL

So kill -9 someprocess should in theory see the process exit with code 137
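The 128+SIGNAL rule is easy to try out. A minimal sketch that starts a throwaway sleep process, kills it with signal 9, and reads the status back through wait:

```shell
#!/bin/sh
# Start a throwaway background process, kill it with signal 9 (SIGKILL),
# then collect its exit status with wait.
sleep 60 &
pid=$!
kill -9 "$pid"
wait "$pid"
status=$?
echo "exit status: $status"

# A status above 128 suggests termination by a signal (status minus 128).
if [ "$status" -gt 128 ]; then
    echo "signal number: $((status - 128))"
fi
```

This should report an exit status of 137 and a signal number of 9. Some shells also print a 'Killed' notice on standard error when the process is reaped.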

There are more than 9 signal codes so you could, if you wish, avoid some of the codes 138 onwards to be absolutely sure.

( Avoiding 128 through 159 might be your preference )

The signal codes for Linux are described here.


Exit codes 64 through 78:

From the Linux exit manpage at kernel.org:
BSD has attempted to standardize exit codes; see the file <sysexits.h>
The actual meanings of the codes are given in this OpenBSD page, but I reproduce the first (64) and last (78) directly to give you a flavour:

  • EX_USAGE (64) The command was used incorrectly, e.g., with the wrong number of arguments, a bad flag, a bad syntax in a parameter, or whatever.
  • EX_CONFIG (78) Something was found in an unconfigured or misconfigured state.

Shell script return codes - my personal suggestion:

Have a quick look through the OpenBSD range 64 to 78 and find something suitable. Then add 100 to that code.

First Example (code 175):

75 in OpenBSD says:
Temporary failure, indicating something that is not really an error. In sendmail, this means that a mailer (e.g.) could not create a connection, and the request should be reattempted later.

( Now adding 100 gives 175 which I use )


Second Example (code 178):

78 in OpenBSD says:
Something was found in an unconfigured or misconfigured state.
( Now adding 100 gives 178 which I use )
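The 'add 100' convention could look like this in a script. This is only a sketch of the personal suggestion above: the names OFFSET, script_code and check_config, and the example path, are hypothetical.

```shell
#!/bin/sh
# Values copied from sysexits.h.
EX_TEMPFAIL=75
EX_CONFIG=78
# Hypothetical offset, following the "add 100" suggestion.
OFFSET=100

# Print the shell-script code that corresponds to a sysexits value.
script_code() {
    echo $(( $1 + OFFSET ))
}

# Hypothetical check: return 178 when a configuration file is not readable.
check_config() {
    [ -r "$1" ] || return "$(script_code "$EX_CONFIG")"
}

check_config /nonexistent/example.conf
echo "check_config returned $?"
echo "temporary failure would be reported as $(script_code "$EX_TEMPFAIL")"
```

Keeping the sysexits names in the script means a reader can map 175 or 178 back to the meaning in sysexits.h by subtracting the offset.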


The bash scripting guide notes (tldp.org):

Appendix D gives some guidance about exit codes.


Linux documentation of sysexits.h (permission of BSD):

#define EX__BASE 64 /* base value for error messages */
#define EX_USAGE 64 /* command line usage error */
#define EX_DATAERR 65 /* data format error */
#define EX_NOINPUT 66 /* cannot open input */
#define EX_NOUSER 67 /* addressee unknown */
#define EX_NOHOST 68 /* host name unknown */
#define EX_UNAVAILABLE 69 /* service unavailable */
#define EX_SOFTWARE 70 /* internal software error */
#define EX_OSERR 71 /* system error (e.g., can't fork) */
#define EX_OSFILE 72 /* critical OS file missing */
#define EX_CANTCREAT 73 /* can't create (user) output file */
#define EX_IOERR 74 /* input/output error */
#define EX_TEMPFAIL 75 /* temp failure; user is invited to retry */
#define EX_PROTOCOL 76 /* remote error in protocol */
#define EX_NOPERM 77 /* permission denied */
#define EX_CONFIG 78 /* configuration error */
#define EX__MAX 78 /* maximum listed value */ 
 
 



If you have the C library development headers installed (the libc6-dev package on Debian), then the file /usr/include/sysexits.h contains the text pasted above.

Tuesday, June 29, 2010

Upgrading a VPS to a 2010 version of Linux - signalfd() test

The successful running of VPS servers under Xen and OpenVZ relies on compatibility between the underlying host Kernel and the requirements of Linux as a Guest.

Modern Linux versions (2010) may have difficulty running atop of some of the aging host containers employed by VPS companies.

In particular, many VPS host containers provide Kernel facilities first implemented in 2007, with more modern features missing.

Testing if your Kernel supports signalfd():

If you are fairly certain that your VPS supports modern Linux (provides Kernel 2.6.26 to your container), then the Linux Test Project (described at end of article) will be enough to confirm things.

It is perhaps more likely that you do not know if you have signalfd() support, and want to do a test function call.

The signalfd() manpage provides a good summary and some test code:
signalfd() is available on Linux since kernel 2.6.22. Working support is provided in glibc since version 2.8. The signalfd4() system call (see NOTES) is available on Linux since kernel 2.6.27.

Extract from the test code:


for (;;) {
    s = read(sfd, &fdsi, sizeof(struct signalfd_siginfo));
    if (s != sizeof(struct signalfd_siginfo))
        handle_error("read");

    if (fdsi.ssi_signo == SIGINT) {
        printf("Got SIGINT\n");
    } else if (fdsi.ssi_signo == SIGQUIT) {
        printf("Got SIGQUIT\n");
        exit(EXIT_SUCCESS);
    } else {
        printf("Read unexpected signal\n");
    }
}

The full code is available in the manpage on kernel.org, and for convenience there is also a copy in this directory.

In the above loop, pay particular attention to the item fdsi.ssi_signo, as some outdated manpages may have an old reference.

( I describe this in detail in README.txt )

Examples of running signalfd_demo32bit on VPS:


The example above shows a failure message, as this VPS does not have access to a host kernel which implements signalfd().


Another failure giving the same message 'function not implemented'

Now here I show a working example on a local Debian Squeeze install:


If signalfd() is supported by the running Kernel, then running signalfd_demo should make the program wait for input.

Pressing Ctrl+C should say 'Got SIGINT'
and Pressing Ctrl+\ should say 'Got SIGQUIT'

Warning: If you have redefined Ctrl+\ to be intercepted by screen or some graphical tool, then you are going to have difficulty getting out of the test!

Summary of expected responses:
  • Your Kernel is 2.6.18 and/or does not support signalfd()
    'Function not implemented'

  • Your Kernel supports signalfd()
    Your system should enter a 'wait' state until
    you press Ctrl+C or Ctrl+\ after which
    it should respond with 'Got SIGINT' or 'Got SIGQUIT' as appropriate

Examples of running signalfd_demo64bit on 64 bit VPS:


'Function not implemented' and again below another 'signalfd: Function not implemented':


and now a success for OpenVZ running a patched Kernel 2.6.18:


Ignore the echo statement which is just a way of me highlighting this surprising result.


Patching 2.6.18 to give signalfd() support - pros and cons:

Arguing the thing both ways...

Pros:
  • Hundreds of thousands of VPS containers may be able to successfully deploy Debian Squeeze and the latest Ubuntu (if signalfd() support is patched in)

Cons:
  • Confusion. Wholesale backporting of features to a Kernel tree that is over 3 years old seems like a mistake to me. ( Upgrading the underlying container software to RedHat 6 or CentOS 6 really feels to me to be a better solution )

  • Lack of easily scriptable tests. Already there are scripts out there that check for Kernel 2.6.26 or newer (see ltp below for example). If patching 2.6.18 becomes widespread, then scripts that test the Kernel version number to determine signalfd() support will become less useful.
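For illustration, a version-number test of the kind mentioned in the last point might look like the sketch below. The function name version_ge is hypothetical, and as noted above, a result based on the version number alone says nothing about features patched into an older kernel.

```shell
#!/bin/sh
# version_ge HAVE WANT: succeed when kernel version HAVE is at least WANT.
# Only the leading numeric part is compared, so "2.6.32-5-686" is read as 2.6.32.
version_ge() {
    have=${1%%[!0-9.]*}
    want=$2
    old_ifs=$IFS
    IFS=.
    set -- $have
    h1=${1:-0} h2=${2:-0} h3=${3:-0}
    set -- $want
    w1=${1:-0} w2=${2:-0} w3=${3:-0}
    IFS=$old_ifs
    # Compare major, then minor, then patch level.
    [ "$h1" -ne "$w1" ] && { [ "$h1" -gt "$w1" ]; return; }
    [ "$h2" -ne "$w2" ] && { [ "$h2" -gt "$w2" ]; return; }
    [ "$h3" -ge "$w3" ]
}

running=$(uname -r)
if version_ge "$running" 2.6.26; then
    echo "kernel $running reports 2.6.26 or newer"
else
    echo "kernel $running reports older than 2.6.26"
fi
```

A patched 2.6.18 kernel would fail this test even though signalfd() works on it, which is exactly the confusion described above.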

Linux Test Project and signalfd():

When I was searching around for a way of testing for signalfd() support, I read about the Linux Test Project, and installed some software:

apt-get --no-install-recommends install ltp-kernel-test

You can leave out the --no-install-recommends but be warned that you will pull down a lot of packages if you decide to go that way.

Here are ltp tests running on a local 32 bit desktop install of Debian Squeeze:



and this result is less successful:



signalfd4() versus signalfd():

Newer Kernels (of the 2.6.2x and 2.6.3x series) implement signalfd4().

I quote directly from the manpage to explain their relationship:

Starting with glibc 2.9, the signalfd() wrapper function will use signalfd4() where it is available.

If you feel that there is merit in adapting the signalfd_demo.c code, to use signalfd4() instead, then there is perhaps some work to do there - feel free to take this on as an exercise in C.

Further reading and links:

Tuesday, June 15, 2010

Upgrading VPS to Debian Squeeze

In any graphical VPS control panel, you are unlikely to see Debian Squeeze appear in the drop down for 'Reinstall OS' just yet*.

( *Debian Squeeze is perhaps to be released late 2010 )

But in order to test out the forthcoming release, perhaps you might go ahead and manually upgrade.


Kernel 2.6.3x rather than 2.6.18:

Ideally your VPS will be running in a hosting container that is running kernel 2.6.26 or newer.

( Such newer kernels are provided in Debian Squeeze, Red Hat Enterprise 6, and Ubuntu Lucid. It really depends on your hosting company, when they installed the container, how often they update container kernels, and their technology. )

If your container is unsuitable then you might see something like this:


Here Debian Squeeze is helping you, by pointing out that your kernel does not support everything which Debian Squeeze might want.

Seeing a "udev requires a kernel >=" warning on your VPS should make you pause.

One of the things that might be an issue is signalfd(), or the lack of it.

If you decide to ignore the advice and try and run Debian Squeeze anyway, then you will probably be looking at how to get permissions fixed on some /dev/ files.

( It would be unwise for me to recommend this kind of action by elaborating.
It certainly makes reporting bugs difficult, if you deliberately run a production system using a kernel that is known to be outdated. )



/var/log/auth.log - is logging active?:

From a security point of view auth.log is an important file.
On a properly configured system, auth.log will give a concise record of important authorisations (including ssh logins).

After your upgrade it does no harm to check /var/log/auth.log is being written to.
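One rough way to script that check is to ask whether the file has been modified recently. A sketch only: the helper name written_recently and the 60 minute window are assumptions, and the -mmin test is an extension found in GNU and BSD find rather than a POSIX feature.

```shell
#!/bin/sh
# written_recently FILE MINUTES: succeed if FILE was modified within the
# last MINUTES minutes. Relies on the -mmin test of find.
written_recently() {
    [ -n "$(find "$1" -mmin "-$2" 2>/dev/null)" ]
}

# Assumption: 60 minutes is a reasonable window; a quiet server may log less often.
if written_recently /var/log/auth.log 60; then
    echo "auth.log has been written to in the last hour"
else
    echo "auth.log is missing, unreadable, or has not changed in the last hour"
fi
```

Logging in over ssh just before running the check gives the system something to record, so a stale file is then a stronger hint that logging is not working.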


auth.log is sometimes used by logwatch and similar security tools.

If you upgrade (particularly to a beta or testing distribution) then check that your system logging is working as expected.

Check again when switching your system from rsyslog to dsyslog, syslog-ng, or another system logging alternative.

Here is a diff to overcome an issue with the testing version of dsyslog in squeeze:


( The line marked with the green cursor is the important addition, and the sections related to cron are my convenience changes, rather than to fix any issue )

Today dsyslog.conf needed to be altered as condition pattern did not seem to be working as intended:

condition pattern { facility "auth*"; }; 

The above should have dsyslog writing to auth.log , but it was not happening on my system. It will likely be fixed before squeeze is officially released later this year.

To overcome the issue, I instead use condition literal as illustrated in the screenshot [ above ].


Packages you may want to purge once upgrade to squeeze is complete:
  • gcc-4.2-base
  • gcc-4.3-base

Locales cleardown and removing cruft from server accounts running X:

Try localepurge for reclaiming some redundant server space.

Another tool, bleachbit, might be useful for those running X on a remote server.

Bleachbit helps clear down user files such as those left behind when running web browsers, OpenOffice, etc.

( I have not used it in a server environment and would have to do some further reading, before I would feel comfortable running a shred like tool on a production server )


I tag this article 'nonKVM' and 'oldkernel' as Xen and OpenVZ, being more widespread in 2008 and 2009, tend to run with older kernel 2.6.18 rather than the much newer 2.6.3x series such as that included in RHEL6. Dedicated KVM resource offerings will become more widespread in late 2010 and 2011, and benefit from kernel 2.6.3x by simple fact of coming later to the party.