Monday, October 6, 2014

Python: add an element to an array only if a filter function returns True

This post was going to be about a fairly obscure feature of Python I found, but is now a minor cautionary tale about trying to be too clever :)

I was looking for an elegant solution to the problem of appending elements to a list only if a filter function returns true. The long (and in retrospect much more readable) way to write this is something like:
filter_result = self.FilterFunction(response)
if filter_result:
There is in fact a one-liner that can do this for you, but since its fairly obscure it makes the code much harder to understand.
processed_responses += filter(None, [self.FilterFunction(response)])
This works because when the first argument to filter is None, the effect is to remove all items from the sequence that evaluate to False. In this case that means if self.FilterFunction is False you get an empty array, and appending the empty array has no effect on processed_responses. If it's True, you append a single element.

Obvious huh?

Mocking out python OS specific imports

Testing python code that needs to run on a different OS is painful. A major part of the difficulty is that even though you can (somewhat) easily mock out the API calls used, you can't import the code successfully because the modules only exist on the target OS. Lets take an example of code that imports and calls functions from the win32api module. How do you mock out the import so you can test it on linux? I know of two main approaches.

One is the proxy module. Basically you define a module to hide all of the OS-specific imports behind, and do a conditional import in that module. So instead of having code like this:
import win32api
you do
import windows_imports
and then in windows_imports/
import platform

if platform.system() == "Windows":
  import win32api
  import winerror
  import wmi
Then inside your tests you need to create stubs to replace your API calls, e.g. for windows_imports.win32api.GetLogicalDriveStrings. Theoretically this should be fairly straightforward, but when I started down this path it got fairly complicated and I struggled to make it work. In the end I gave up and settled on the second approach, as below.

The second approach, described here, is to delay the import of the OS specific code in your tests until after you modify sys.modules to stub out all the OS-specific modules. This has the distinct advantage of leaving your production code untouched, and having all the complexity in your test code. Using the mock library makes this much easier. Below is an example of mocking out a WMI call made from python.
import mock
import test_fixture
import unittest

class WindowsTests(unittest.TestCase):

  def setUp(self):
    self.wmimock = mock.MagicMock()
    self.win32com = mock.MagicMock()
    self.win32com.client = mock.MagicMock()
    modules = {
        "_winreg": mock.MagicMock(),
        "pythoncom": mock.MagicMock(),
        "pywintypes": mock.MagicMock(),
        "win32api": mock.MagicMock(),
        "win32com": self.win32com,
        "win32com.client": self.win32com.client,
        "win32file": mock.MagicMock(),
        "win32service": mock.MagicMock(),
        "win32serviceutil": mock.MagicMock(),
        "winerror": mock.MagicMock(),
        "wmi": self.wmimock

    self.module_patcher = mock.patch.dict("sys.modules", modules)

    # Now we're ready to do the import
    from myrepo.actions import windows = windows

  def tearDown(self):

  def testEnumerateInterfaces(self):

    # Stub out wmi.WMI().Win32_NetworkAdapterConfiguration(IPEnabled=1)
    wmi_object = self.wmimock.WMI.return_value
    wmi_object.Win32_NetworkAdapterConfiguration.return_value = [

    enumif =
    interface_dict_list = list(enumif.RunWMIQuery())

HOWTO change the extension of lots of files while keeping the original filename

With rename you get all the power of perl regex substitution in a simple interface. e.g. to backup all the .yaml files in a directory you could use:
rename --no-act 's/(.*)\.yaml$/$1.yaml.bak/' *.yaml
Remove the --no-act to actually make the changes.

The next logical progression is to want to do this more than once, so adding a timestamp to the backup is desirable, but I couldn't think of a way to make rename do this (you can't put backticks in the regex for instance). So here's a workaround
find . -name *.yaml -exec mv {} {}.bak.`date +%Y-%m-%dT%H:%M:%S%z` \;

Friday, August 22, 2014

Powershell and WMI cheatsheet

I finally bit the bullet and spent a bit of time learning some powershell and WMI querying basics. Unfortunately lots of the tutorials and intros out there are missing key pieces that make it usable, like aliases. So here's my cheatsheet as an attempt to remedy that problem.

Listing Classes and Namespaces

First of all, figuring out what information is available is not that easy. It's in classes inside nested namespaces. There's good online documentation, but I'd prefer to get it from the tool itself where I can see the values on a live system. So, lets start with listing all the top-level WMI Namespaces.
PS C:\> Get-WmiObject -class __Namespace -namespace root | Format-Table name


Note that Namespaces can be nested, the following ones sit below root\cimv2 which is the default namespace for Get-WmiObject if you don't specify one. If you really want to see everything you're going to need to list them recursively.
PS C:\> Get-WmiObject -class __Namespace -namespace root\cimv2 | Format-Table name

To list all classes within a namespace (see here for a high-level description of the naming scheme for classes):
PS C:\> Get-WmiObject -List -namespace root\cimv2 | Select name
All classes with 'Net' in the name:
PS C:\> Get-WmiObject -List | Where-Object { $ -match 'net'} | Select Name

Retrieving Object Values

Retrieve the class values:
PS C:\> Get-WmiObject -class Win32_NetworkAdapterConfiguration

DHCPEnabled      : True
IPAddress        : {, abcd::eff:fff:aaaa:abdc}
DefaultIPGateway : {, abce::eff:feff:aaaa:265}
DNSDomain        :
ServiceName      : 
Description      : Intel(R) Gigabit Network Connection
Index            : 7

Aliases and Functions

Sick of typing 'Get-WmiObject' yet? Thankfully 'gwmi' is an alias for Get-WmiObject:
PS C:\> gwmi -class Win32_NetworkAdapterConfiguration
The list of built-in aliases can be retrieved with:
PS C:\> get-alias

CommandType     Name
-----------     ----
Alias           % -> ForEach-Object
Alias           ? -> Where-Object
Alias           ac -> Add-Content
Alias           asnp -> Add-PSSnapin
Alias           cat -> Get-Content
Alias           cd -> Set-Location
Microsoft has some good doco on aliases here. You can define your own aliases for cmdlets. If you want to do something more complicated you'll need a function:
PS C:\> Function NetIFs {gwmi -class Win32_NetworkAdapter}
PS C:\> Set-Alias nif NetIFs

SQL-like querying, where "select *" doesn't mean everything

To query inside the class you can do this:
gwmi -query "Select * from Win32_NetworkAdapterConfiguration where IPEnabled=1"
There is a gotcha here however. The WMI query will return all the properties of the object, but powershell won't display them all. So you will be missing most of the properties and values listed in the documentation. To see all object properties and values, you can use the Select-Object cmdlet (alias select):
gwmi -query "Select * from Win32_NetworkAdapterConfiguration where IPEnabled=1" | select *
For advanced powershell examples, there's lots of good stuff on

Friday, May 30, 2014

Cross platform python method for calculating disk free space

I wanted a cross platform (Mac/Win/Linux) method for calculating disk free space. For my application ideally I'd like to: read it out of a file (or registry key) somewhere, failing that have a python library do it for me, and failing that shell out to something and parse the output. I couldn't find the free space in any files (including /proc), registry keys, or anywhere else file-like. So like this stackoverflow thread, I came to the conclusion that WMI for windows and os.statvfs for Mac+Linux were the best options.

First Windows, it's fairly straightforward. The MS doco for this WMI call is here, and it also explains the DriveType codes.
PS > $query = "select * from win32_logicaldisk"

PS > Get-WmiObject -Query $query

DeviceID     : C:
DriveType    : 3
ProviderName : 
FreeSpace    : 190249115648
Size         : 249690058752
VolumeName   : 

DeviceID     : Z:
DriveType    : 4
ProviderName : \\share\homedir\username
FreeSpace    : 15784280064
Size         : 26843545600
VolumeName   : nethomes$
Those numbers are bytes, so windows is pretty easy. On to the mess that is statvfs. First, the statfs man page gives a little history:
The original Linux statfs() and fstatfs() system calls were not designed with extremely large file sizes in mind. Subsequently, Linux 2.6 added new statfs64() and fstatfs64() system calls that employ a new structure, statfs64. The new structure contains the same fields as the original statfs structure, but the sizes of various fields are increased, to accommodate large file sizes. The glibc statfs() and fstatfs() wrapper functions transparently deal with the kernel differences. Some systems only have , other systems also have , where the former includes the latter. So it seems including the former is the best choice. LSB has deprecated the library calls statfs() and fstatfs() and tells us to use statvfs(2) and fstatvfs(2) instead.
Sounds like we should use statvfs and python has a os.statvfs so we should be good. Don't get fooled by this nasty deprecation notice, it's referring the the statvfs module which just defined a few constants. That's deprecated, but the os.statvfs function is alive and well in recent Python versions.

But wait, there's chatter about statvfs being dangerous on glibc systems and the df code said not to use it at some stage. Basically if you have a network filesystem listed in /proc/mounts and it is unreachable (e.g. because there is no network), statvfs will hang on stat'ing the network directory, even if you called statvfs on a completely different directory. df works around this by continuing to use statfs on glibc systems. I tested this with strace and it's true on my Ubuntu linux machine:
$ strace df
statfs("/usr/local/home/user", {f_type=0x65735546, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=1024, f_frsize=4096}) = 0
statfs("/nethome/user", {f_type="NFS_SUPER_MAGIC", f_bsize=8192, f_blocks=367001600, f_bfree=159547821, f_bavail=159547821, f_files=31876689, f_ffree=12707362, f_fsid={0, 0}, f_namelen=255, f_frsize=8192}) = 0
We can see that python os.statvfs is doing the same (and so is "stat -f"). So we should be safe using python's os.statvfs.
$ strace python -c "import os;os.statvfs('/')" 2>&1 | grep statfs
statfs("/", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=9743394, f_bfree=5606442, f_bavail=5118199, f_files=2441216, f_ffree=2066663, f_fsid={1820746783, 1207614867}, f_namelen=255, f_frsize=4096}) = 0

# No statvfs calls
$ strace python -c "import os;os.statvfs('/')" 2>&1 | grep statvfs
execve("/usr/bin/python", ["python", "-c", "import os;os.statvfs('/')"], [/* 56 vars */]) = 0

# stat -f does the same
$ strace stat -f / 2>&1 | grep statfs
statfs("/", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=9743394, f_bfree=5606442, f_bavail=5118199, f_files=2441216, f_ffree=2066663, f_fsid={1820746783, 1207614867}, f_namelen=255, f_frsize=4096}) = 0
The next question is, how do you actually calculate the free space in bytes? Starting with: what is the block size? f_bsize is the "Preferred file system block size" and f_frsize is the "Fundamental file system block size" according to the python doco, and if you read the statfs man page it says "optimal transfer block size" and "fragment size (since Linux 2.6)" respectively. Confusing much?

On my linux machine they are the same:
In [6]: import os

In [7]: st = os.statvfs("/")

In [8]: st.f_bsize
Out[8]: 4096

In [9]: st.f_frsize
Out[9]: 4096

In [10]: !stat -f -c "Block size (for faster transfers): %s, Fundamental block size (for block counts): %S" /
Block size (for faster transfers): 4096, Fundamental block size (for block counts): 4096
On OS X they are not:
In [1]: import os

In [2]: st = os.statvfs("/")

In [3]: st.f_bsize
Out[3]: 1048576

In [4]: st.f_frsize
Out[4]: 4096
So on OS X f_bsize is 1MB, but that isn't actually the block size used by the filesystem, so using f_frsize looks like the best option for both platforms. The remaining sticking point is that pre-2.6-kernel linux machines don't have f_frsize, so we should check if it is zero and use f_bsize instead in that case.

OK so we have a blocksize, but what free size should we use? f_bfree is "free blocks in fs" and f_bavail is "free blocks available to unprivileged user". These can actually be quite different, e.g. mkfs.ext3 reserves 5% of the filesystem blocks for the super-user by default. Which one you care about probably depends on why you are measuring free disk space. In my case I chose f_bavail, (which is also what df reports).

The final product:
In [16]: def PrintFree(path):
   ....:     st = os.statvfs(path)
   ....:     if st.f_frsize:
   ....:         print "Free bytes: %s" % (st.f_frsize * st.f_bavail) 
   ....:     else:
   ....:         print "Free bytes: %s" % (st.f_bsize * st.f_bavail)

In [17]: PrintFree("/")
Free bytes: 127470809088

In [18]: !df -B 1
Filesystem                1B-blocks        Used    Available Use% Mounted on
/dev/sda1              153117560832 17845137408 127470809088  13% /

Wednesday, May 28, 2014

Mach-O filetype identification

I wanted to write a quick and dirty file-type identifier for Mach-O, turns out this is more tricky than I expected. From /usr/share/file/magic/mach:
# $File: mach,v 1.9 2009/09/19 16:28:10 christos Exp $
# Mach has two magic numbers, 0xcafebabe and 0xfeedface.
# Unfortunately the first, cafebabe, is shared with
# Java ByteCode, so they are both handled in the file "cafebabe".
# The "feedface" ones are handled herein.
and from /usr/share/file/magic/cafebabe:
# Since Java bytecode and Mach-O universal binaries have the same magic number, the test
# must be performed in the same "magic" sequence to get both right.  The long
# at offset 4 in a mach-O universal binary tells the number of architectures; the short at
# offset 4 in a Java bytecode file is the JVM minor version and the
# short at offset 6 is the JVM major version.  Since there are only 
# only 18 labeled Mach-O architectures at current, and the first released 
# Java class format was version 43.0, we can safely choose any number
# between 18 and 39 to test the number of architectures against
# (and use as a hack). Let's not use 18, because the Mach-O people
# might add another one or two as time goes by...
GAAAH! Unsurprisingly more than one engineer wanted to use the cutesy "cafebabe" for their magic string. I ended up using this regex, which will also match Java bytecode, but was good enough for my purpose:
The full Mach-O filetype doco is here. The various magic byte strings are as follows:
  • cefaedfe: Mach-O Little Endian (32-bit)
  • cffaedfe: Mach-O Little Endian (64-bit)
  • feedface: Mach-O Big Endian (32-bit)
  • feedfacf: Mach-O Big Endian (64-bit)
  • cafebabe: Universal Binary Big Endian. These fat binaries are archives that can include binaries for multiple architectures, but typically contain PowerPC and Intel x86.

Bash man page colours

There are many pages out there describing how to get coloured bash man pages. Tuxarena has one of the better ones that tries to explain what's going on, but unfortunately it is somewhat of a black art due to the obscure colour codes used. Here's the snippet from my bash_aliases that I use:
man() {
    env LESS_TERMCAP_mb=$'\E[01;31m' \
    LESS_TERMCAP_md=$'\E[01;38;5;74m' \
    LESS_TERMCAP_me=$'\E[0m' \
    LESS_TERMCAP_se=$'\E[0m' \
    LESS_TERMCAP_so=$'\E[01;41;33m' \
    LESS_TERMCAP_ue=$'\E[0m' \
    LESS_TERMCAP_us=$'\E[04;38;5;146m' \
    man "$@"
I only found one site that actually documented the color options available, and he basically had to reverse engineer it. I'll include the color codes below, since everyone likely has their own personal preference and wants to tweak things slightly.
0   = default colour
1   = bold
4   = underlined
5   = flashing text
7   = reverse field
31  = red
32  = green
33  = orange
34  = blue
35  = purple
36  = cyan
37  = grey
40  = black background
41  = red background
42  = green background
43  = orange background
44  = blue background
45  = purple background
46  = cyan background
47  = grey background
90  = dark grey
91  = light red
92  = light green
93  = yellow
94  = light blue
95  = light purple
96  = turquoise
100 = dark grey background
101 = light red background
102 = light green background
103 = yellow background
104 = light blue background
105 = light purple background
106 = turquoise background