Technical notes, my online memory: August 2010

Wednesday, August 25, 2010

Where there is awk, there is sed

I couldn't do a post on awk without also quickly covering sed. Bruce Barnett has written a great tutorial for sed that is worth reading.

Sed is your friend for applying regexes to files (well, streams really, since it is the 'stream editor'). The regex syntax is much the same as vim's, and since vim is my primary editor I only tend to use sed where the files are large and would take ages to load into vim.

Some quick examples to illustrate what I'm talking about:

sed 's/user root/user nothingtoseehere/g' < /var/log/auth.log

sed 's!session closed for user \([^ ]*\)!\1 closed a session!g' < \
/var/log/auth.log

sed 's!session closed for user \([^ ]*\)!&, allegedly!g' < \
/var/log/auth.log

Escaped parentheses \(\) capture a value (referenced as \1 in the replacement), and & refers to the whole match.

You can also use sed like grep. By default it prints every line, but that can be disabled with "-n", and you can cause matching lines to be printed by appending a "p" flag:

sed -n 's/pattern/&/p' < file.txt

OR, if we aren't making a substitution

sed -n '/pattern/p' < file.txt

OR, you can just use grep -e, which is less confusing:

grep -e "pattern" file.txt
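As a cross-check on the capture-group examples earlier in this post, here is a rough Python sketch of the same two substitutions. The sample log line is made up for illustration. sed's \1 stays \1 in Python's re.sub, while sed's & (the whole match) is written \g<0>:

```python
import re

# Hypothetical auth.log-style line, invented for this example.
line = "sshd[123]: pam_unix(sshd:session): session closed for user alice"

# Same pattern as the sed examples; Python groups use unescaped parentheses.
pattern = r"session closed for user ([^ ]*)"

# Equivalent of \1 in the sed replacement: the captured username.
renamed = re.sub(pattern, r"\1 closed a session", line)

# Equivalent of & in the sed replacement: the whole matched text.
hedged = re.sub(pattern, r"\g<0>, allegedly", line)

print(renamed)
print(hedged)
```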

Rewriting a file in-place, making a backup with the .bak extension:

sudo sed -i.bak 's!XKBOPTIONS=""!XKBOPTIONS="ctrl:nocaps"!' /etc/default/keyboard

AWK - selecting columns for output

AWK is a super-handy old-skool UNIX tool. There are plenty of good tutorials out there for it, but I'm jotting down some basic uses here for my own benefit.

I have used AWK mainly to select and print certain columns from input, like this, which will print the 1st and 7th columns:

awk '{print $1,$7}' < /var/log/syslog

Column break-up is determined by the value of the FS (input field separator) variable, which is a single space by default. Awk treats that default specially: fields are split on runs of whitespace (in POSIX mode this actually means space and tab but not newline). You can change this with:

awk 'BEGIN {FS=":"}{print $1,$7}' < /var/log/syslog

OR

awk -F: '{print $1,$7}' < /var/log/syslog

The output from awk is separated by the OFS (output field separator) variable, also a space by default. To write out CSV you might use:

cat /var/log/syslog | awk -F: 'BEGIN{OFS=","}{print $1,$3}'
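To make the FS/OFS behaviour concrete, here is a small Python sketch of what the -F: and OFS="," command does to one line. The sample line is hypothetical, and a plain join like this (just like awk's OFS) does no CSV quoting, so it is only safe when the fields contain no commas:

```python
# Hypothetical syslog-style line, made up for illustration.
line = "Aug 25 10:15:01 myhost CRON[1234]: session opened"

# Like awk -F: -- split the line on every colon.
fields = line.split(":")

# awk fields are 1-based, so $1 and $3 are fields[0] and fields[2].
# Joining with "," plays the role of OFS=",".
csv_line = ",".join([fields[0], fields[2]])

print(csv_line)
```

Note how splitting a timestamped log line on ":" cuts the time itself into pieces, which is worth keeping in mind when picking a separator.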

There is plenty more you can do with awk, including simple programming tasks such as counting and summing. cut is a simple alternative if all you want to do is cut fields from an input stream. It doesn't take much to hit its limitations, however. Consider the output of last, the first two columns of which look like this:

user     pts/0
user     pts/1
reboot   system boot

This awk command will print the first two columns correctly:

last | awk '{print $1,$2}'

Whereas this cut command:

last | cut -d" " -f1-5

won't produce the first two columns cleanly, and we need to specify 5 fields to try to skip the empty ones. The problem is that there is a variable number of spaces between the username and the tty, and cut treats every single space as a delimiter.
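The difference between the two tools can be sketched in Python, where str.split() with no argument behaves like awk's default field splitting and str.split(" ") behaves like cut -d" ". The sample line (a username, five spaces, then a tty) is an assumption for illustration, not real output from last:

```python
# Hypothetical line: "user", five spaces, then the tty name.
line = "user" + " " * 5 + "pts/0"

# awk-like: runs of whitespace count as one separator.
awk_like = line.split()

# cut-like: every single space is a separator, so the run of
# spaces produces empty fields between the two real values.
cut_like = line.split(" ")

print(awk_like)
print(cut_like)
```

With cut-style splitting the position of the tty depends on how many spaces happen to pad the username, which is why a fixed field range is unreliable.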

Monday, August 23, 2010

Tips for hardening apache on ubuntu for django deployment

There is good doco for deploying django on apache with mod_python or wsgi. Here are a couple of extra tips for Ubuntu. First, edit

/etc/apache2/conf.d/security

and enable:

ServerTokens Prod
ServerSignature Off
TraceEnable Off

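And in the apache config, in your "Location /" section with the other django stuff (note that Apache 2.4 rejects an Options line that mixes +/- prefixed options with unprefixed ones, so every option here carries a prefix):

Options -Indexes -Includes -MultiViews +SymLinksIfOwnerMatch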

Take a look at Apache's security tips. It is also worth understanding how the Apache configuration sections (Directory, Location, etc.) work.

Friday, August 20, 2010

SSH client config

For Internet-connected hosts, running SSH on a different port is a really good idea since it cuts down the noise of authentication attempts from bots looking for weak passwords. Running on a different port is not a substitute for a secure configuration (i.e. no root login, key-only auth) - it is purely useful in cutting down log noise.

Unfortunately you have to remember which port you chose :) To minimise the hassle you should add entries to your client's /etc/ssh/ssh_config (system-wide) or ~/.ssh/config (per-user):

Host nickname
    Port 43210
    HostName mysshserver
    User myuser

Now you can use "ssh nickname" and ssh will translate that to:

ssh -p 43210 myuser@mysshserver

Monday, August 16, 2010

Installing Windows 7 onto a netbook using USB

I wanted to install Windows 7 onto a netbook to replace an aging desktop as my only windows-on-metal box. This is a breeze with modern linux distros, but is of course far harder than it needs to be for windows.

I had a crack at using unetbootin on Linux with the Windows 7 ISO, despite a suspicious lack of mention of support for windows, and sure enough it didn't boot (unetbootin didn't give me any errors, it just didn't boot).

More googling turned up the Windows 7 USB/DVD Download Tool, which converts a Windows install ISO into a bootable USB installer - exactly what I wanted. After half an hour of downloading and installing dependencies (due to the lack of windows package management and a bizarre need to do the Genuine Windows check), I had the tool installed and it happily created a bootable USB for me.

This one worked perfectly, and dropped Windows 7 starter onto the netbook.

Sunday, August 15, 2010

HOWTO call a python superclass method

Use the python 'super' built-in to call a superclass __init__:

class Foo(Bar):
    def __init__(self):
        super(Foo, self).__init__()
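A slightly fuller, runnable sketch of the same idea follows. The Bar base class and the initialised_by attribute are made up for illustration. Bar inherits from object so that super() also works on Python 2, where it requires a new-style class; on Python 3 the call can be shortened to super().__init__():

```python
class Bar(object):
    def __init__(self):
        # Record which initialisers have run, purely for demonstration.
        self.initialised_by = ["Bar"]


class Foo(Bar):
    def __init__(self):
        # Run Bar.__init__ first so the attribute exists...
        super(Foo, self).__init__()
        # ...then do the subclass-specific setup.
        self.initialised_by.append("Foo")


foo = Foo()
print(foo.initialised_by)
```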

Saturday, August 14, 2010

Working with Amazon S3 storage

I was using duplicity on Amazon S3 storage for backup, but gave it up because it was waaaay too slow (I believe the slowness was mainly duplicity, rather than network traffic or S3). So, time to delete the data from S3. I logged onto the Amazon S3 web interface, but found it pretty useless: I had hundreds of files to delete and there was no way to 'select all', or even delete a whole bucket at once. In fact, I couldn't even get the web interface to delete a single file for me. Seems like it is in Amazon's interest to make deleting data hard...

So I installed the 's3cmd' package on Ubuntu, which worked a treat. Set up with:

s3cmd --configure

Then to delete all the data in a bucket:

s3cmd del s3://data.bucket.name/*
s3cmd rb s3://data.bucket.name

Thursday, August 12, 2010

Python named tuples

Python named tuples are a good way to make your code more readable when using tuples. Instead of using numerical dereferences like:

In [49]: c=('abc','adefa','aaaa')

In [50]: c[0]
Out[50]: 'abc'

You can create a namedtuple class:

In [51]: from collections import namedtuple 

In [53]: MyTup = namedtuple('MyTup','first second other')

In [54]: t = MyTup("aa","bb",other="cc")

In [55]: t.first 
Out[55]: 'aa'
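As a script rather than an IPython session, a short sketch of a few other things the same MyTup class gives you: it still behaves like a plain tuple (indexing, unpacking), and the _replace and _asdict helpers return a modified copy and a dict view respectively. The values are just placeholders:

```python
from collections import namedtuple

MyTup = namedtuple('MyTup', 'first second other')
t = MyTup("aa", "bb", other="cc")

# Still a tuple: indexing and unpacking work as usual.
first, second, other = t

# Named tuples are immutable; _replace returns a new instance.
updated = t._replace(second="zz")

# _asdict gives a mapping of field name to value.
as_dict = dict(t._asdict())

print(t.first, t[0], updated, as_dict)
```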

Postfix internal network information in 'Received' header

With the default Postfix configuration, a "Received" header line is added for every hop, which is fine, but I was surprised to learn a line is also added for mail sent to the local Postfix instance, i.e. 127.0.0.1. It looks something like this:

from mybox.internal.lan (localhost [127.0.0.1])

Assuming this is your last hop before the Internet you are best off just adding your public dns name as the first entry in /etc/hosts (it also gets appended to the Message-ID header value).

However, if you have more internal mail hops you don't want the world knowing about, you will need to create a header_checks rule that removes them (bear in mind this will make diagnosing problems harder...). Put a line like this in /etc/postfix/main.cf:

header_checks = regexp:/etc/postfix/header_checks

And put your regexes in /etc/postfix/header_checks:

/^Received:\sfrom\smybox\.internal\.lan/ IGNORE
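Before reloading Postfix it can help to sanity-check the pattern against some sample headers. The sketch below does that with Python's re module, with the dots escaped so they match literally. Postfix uses its own regex engine rather than Python's, so treat this only as a rough check; the second and third header lines are invented for the example:

```python
import re

# Same pattern as the header_checks rule, dots escaped.
rule = re.compile(r"^Received:\sfrom\smybox\.internal\.lan")

headers = [
    "Received: from mybox.internal.lan (localhost [127.0.0.1])",
    "Received: from mail.example.org (mail.example.org [192.0.2.10])",
    "Subject: hello",
]

# IGNORE drops matching header lines; everything else is kept.
kept = [h for h in headers if not rule.search(h)]

for h in kept:
    print(h)
```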

Wednesday, August 11, 2010

Adding a defeat for a DNAT rule to allow SSH packets to hit the local box

I've been using SSH to pump packets down a VPN like this:

iptables -A PREROUTING -t nat -d $external_ip -j DNAT --to-destination $tun
iptables -A POSTROUTING -t nat -s $tun -o eth0 -j SNAT --to-source $external_ip

The problem is I need SSH packets to hit the local interface (i.e. not go down the VPN). Solution: add a REDIRECT rule before the DNAT in the PREROUTING chain:

iptables -I PREROUTING 1 -t nat -d $external_ip -p tcp --dport 22 -j REDIRECT

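The REDIRECT target rewrites the destination to the primary address of the interface the packet arrived on, so the packet is delivered to the local box (much like DNAT with --to-destination set to the box's own address; only locally generated packets are mapped to 127.0.0.1).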

HOWTO change the preferred application for PDF on Ubuntu

I recently installed adobe reader, and it stole "preferred application" status for PDFs away from evince.

To check the default for PDF use:

xdg-mime query default application/pdf

Which, in my case, was "AdobeReader.desktop". To change it:

xdg-mime default evince.desktop application/pdf

HOWTO Setup OpenVPN on Ubuntu

The Ubuntu community doco has a decent HOWTO that I won't reproduce, and the O'Reilly article has a good summary of the openssl commands you need to generate the certs (or you could read my openssl posts). Just a few extra notes.

If you want to tie a client to a particular VPN ip address, create a file in:

/etc/openvpn/ccd/clientname

where "clientname" is the Common Name from the certificate your client uses.

In this file put:

ifconfig-push 192.168.1.8 192.168.1.5

This will tie the "clientname" box to 192.168.1.8. There appears to be a lot of confusion on the web and in forums as to what should be in the second parameter. The doco calls it remote-netmask because its meaning depends on the topology: with the default point-to-point (net30) topology it is the address of the other end of the link, and with "topology subnet" it is a netmask such as "255.255.255.0". In this case "192.168.1.5" is the other end of the point-to-point link, which works. As an aside, the address allocation is in successive /30 subnets (so the usable last octets are 1,2 then 5,6 then 9,10 and so on) to be compatible with Windows.

If you also want all traffic from the client to exit via the VPN (i.e. have the VPN as the default route) add this special sauce after the ifconfig-push line:
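The /30 allocation pattern is easy to enumerate with Python's ipaddress module. This sketch assumes the VPN network is 192.168.1.0/24 (inferred from the example addresses, not stated anywhere in the config) and lists the two usable addresses in each of the first three /30 blocks:

```python
import ipaddress

# Assumed VPN network, chosen to match the example addresses.
vpn_net = ipaddress.ip_network("192.168.1.0/24")

pairs = []
# Each /30 has four addresses: network, two usable hosts, broadcast.
for subnet in list(vpn_net.subnets(new_prefix=30))[:3]:
    first, second = subnet.hosts()
    pairs.append((str(first), str(second)))

for pair in pairs:
    print(pair)
```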

push "redirect-gateway def1 bypass-dhcp"

This tells openvpn that you want to use the VPN as the default gateway but still use local DHCP.

Monday, August 9, 2010

HOWTO create an LVM-based virtual machine with KVM/libvirt

A quick google didn't turn up any well-researched benchmarks for the performance of VM image files vs. LVM-based VMs (see here for an attempt), but it makes sense to me to eliminate the file loopback overhead by using LVM-based VMs.

There are a few good HOWTOs out there, which I have boiled down into basics:

First, build your VM using JeOS:

sudo vmbuilder kvm ubuntu --dest=/data/kvm/temp-ubuntu-amd64 --bridge=br0 --mem=2048 -c mybox.vmbuild.cfg

I tried the vmbuilder '--raw' option and pointed it at my LVM, but vmbuilder seemed to silently ignore it. So we will have to convert the image file instead.

The "raw" output option for qemu-img should do the trick, but I believe I hit a known bug, because I got:

qemu-img: Error while formatting '/dev/vg01/mybox'

Using "host_device" worked (you could also just convert to raw then dd):

qemu-img convert -O host_device disk0.qcow2 /dev/vg01/mybox

You then need to update your KVM definition file to point the hard disk at the logical volume.

Thursday, August 5, 2010

Linux runlevel configuration - start services on boot for Red Hat and Ubuntu

To configure runlevels on Ubuntu use the 'update-rc.d' tool. For example to ensure the '/etc/init.d/blah' script gets started and stopped in the normal runlevels, use:

sudo update-rc.d blah defaults

The equivalent tool on Red Hat is 'chkconfig'. Use:

sudo chkconfig blah on