Monday, October 27, 2014

Authenticode signing windows executables on linux

We had a particularly painful build and sign workflow that required multiple trips between linux and windows.  I looked around and found the following options for signing windows binaries on linux:
  • jsign, a java implementation
  • signcode from the Mono project, as suggested by Mozilla.  It's in the mono-devel ubuntu package.
  • osslsigncode, an Openssl-based implementation of authenticode signing that uses curl to make the timestamp requests.
The Mozilla instructions are good for getting your keys and certs into a format that will work with these tools. Some minor additions to those below:

openssl pkcs12 -in authenticode.pfx -nocerts -nodes -out key.pem
openssl rsa -in key.pem -outform PVK -pvk-strong -out authenticode.pvk
openssl pkcs12 -in authenticode.pfx -nokeys -nodes -out cert.pem
cat Thawte_Primary_Root_CA_Cross.cer >> cert.pem
openssl crl2pkcs7 -nocrl -certfile cert.pem -outform DER -out authenticode.spc
shred -u key.pem
Once you're done here you have authenticode.pvk with your encrypted private key, and authenticode.spc with your public certs. Appending the cross cert is necessary to make signature validation work with some tools. The windows GUI "Properties|Digital Signatures|Details" dialog will tell you "This digital signature is OK" but if you check with signtool verify on Windows, you'll find it isn't:
>"C:\Program Files\Microsoft SDKs\Windows\v7.1\Bin\signtool.exe" verify /v /kp my.exe

Verifying: my.exe


SignTool Error: Signing Cert does not chain to a Microsoft Root Cert.

Number of files successfully Verified: 0
Number of warnings: 0
Number of errors: 1
I suspect the GUI uses the local cert store and/or APIs that automatically fetch the required cross cert, but signtool and 3rd-party signature verifiers do not. With the cross cert added to the spc as above it can be correctly verified and mentions the MS cross cert:
Z:\signing\windows>"C:\Program Files\Microsoft SDKs\Windows\v7.1\Bin\signtool.exe" verify /v /kp my.exe

Verifying: my.exe


Cross Certificate Chain:
    Issued to: Microsoft Code Verification Root


Successfully verified: my.exe
If you use Bit9 it's also worth checking that it will verify your binary using the dascli.exe tool:
>"C:\Program Files (x86)\Bit9\Parity Agent\DasCLI.exe" certinfo my.exe
CertValidated[Y] Detached[N] Publisher[My Inc]

So, back to signing on Linux. At first I tried installing mono and using "signcode". It claims to succeed:
$ signcode sign -spc authenticode.spc -v authenticode.pvk -a sha1 -$ commercial -n MyApp -t -tr 5 my.exe
Mono SignCode - version
Sign assemblies and PE files using Authenticode(tm).
Copyright 2002, 2003 Motus Technologies. Copyright 2004-2008 Novell. BSD licensed.

Enter password for authenticode.pvk: 
And in the process echoes your password in cleartext!?! This is something I was prepared to fix with a "read -s -p 'Password'" wrapper script like this guy, but the signature was no good. I could see it appended in a hexeditor but Windows didn't give me a Digital Signature tab in the GUI and signtool couldn't find it either:
>"C:\Program Files\Microsoft SDKs\Windows\v7.1\Bin\signtool.exe" verify /v /kp my.exe

Verifying: my.exe
SignTool Error: No signature found.

Number of files successfully Verified: 0
Number of warnings: 0
Number of errors: 1
It's possible that there's something weird about our exe that caused this to fail. Someone else reported a similar problem but then later claimed it was due to a corrupted exe. In any case, not being particularly wedded to, or happy with, mono and signcode at this point I tried osslsigncode, which worked fine and produced a valid signature.
sudo apt-get install libcurl4-openssl-dev
sudo make install
osslsigncode sign -certs authenticode.spc -key authenticode.pvk -n "MyApp" -t -in my.exe -out my_signed.exe
Update: After coming across this mozilla post, I suspect my problem with mono's signcode was that signcode may not support 64 bit, but I didn't go back to check.

Monday, October 6, 2014

Python: add an element to an array only if a filter function returns True

This post was going to be about a fairly obscure feature of Python I found, but is now a minor cautionary tale about trying to be too clever :)

I was looking for an elegant solution to the problem of appending elements to a list only if a filter function returns true. The long (and in retrospect much more readable) way to write this is something like:
filter_result = self.FilterFunction(response)
if filter_result:
There is in fact a one-liner that can do this for you, but since its fairly obscure it makes the code much harder to understand.
processed_responses += filter(None, [self.FilterFunction(response)])
This works because when the first argument to filter is None, the effect is to remove all items from the sequence that evaluate to False. In this case that means if self.FilterFunction is False you get an empty array, and appending the empty array has no effect on processed_responses. If it's True, you append a single element.

Obvious huh?

Mocking out python OS specific imports

Testing python code that needs to run on a different OS is painful. A major part of the difficulty is that even though you can (somewhat) easily mock out the API calls used, you can't import the code successfully because the modules only exist on the target OS. Lets take an example of code that imports and calls functions from the win32api module. How do you mock out the import so you can test it on linux? I know of two main approaches.

One is the proxy module. Basically you define a module to hide all of the OS-specific imports behind, and do a conditional import in that module. So instead of having code like this:
import win32api
you do
import windows_imports
and then in windows_imports/
import platform

if platform.system() == "Windows":
  import win32api
  import winerror
  import wmi
Then inside your tests you need to create stubs to replace your API calls, e.g. for windows_imports.win32api.GetLogicalDriveStrings. Theoretically this should be fairly straightforward, but when I started down this path it got fairly complicated and I struggled to make it work. In the end I gave up and settled on the second approach, as below.

The second approach, described here, is to delay the import of the OS specific code in your tests until after you modify sys.modules to stub out all the OS-specific modules. This has the distinct advantage of leaving your production code untouched, and having all the complexity in your test code. Using the mock library makes this much easier. Below is an example of mocking out a WMI call made from python.
import mock
import test_fixture
import unittest

class WindowsTests(unittest.TestCase):

  def setUp(self):
    self.wmimock = mock.MagicMock()
    self.win32com = mock.MagicMock()
    self.win32com.client = mock.MagicMock()
    modules = {
        "_winreg": mock.MagicMock(),
        "pythoncom": mock.MagicMock(),
        "pywintypes": mock.MagicMock(),
        "win32api": mock.MagicMock(),
        "win32com": self.win32com,
        "win32com.client": self.win32com.client,
        "win32file": mock.MagicMock(),
        "win32service": mock.MagicMock(),
        "win32serviceutil": mock.MagicMock(),
        "winerror": mock.MagicMock(),
        "wmi": self.wmimock

    self.module_patcher = mock.patch.dict("sys.modules", modules)

    # Now we're ready to do the import
    from myrepo.actions import windows = windows

  def tearDown(self):

  def testEnumerateInterfaces(self):

    # Stub out wmi.WMI().Win32_NetworkAdapterConfiguration(IPEnabled=1)
    wmi_object = self.wmimock.WMI.return_value
    wmi_object.Win32_NetworkAdapterConfiguration.return_value = [

    enumif =
    interface_dict_list = list(enumif.RunWMIQuery())

HOWTO change the extension of lots of files while keeping the original filename

With rename you get all the power of perl regex substitution in a simple interface. e.g. to backup all the .yaml files in a directory you could use:
rename --no-act 's/(.*)\.yaml$/$1.yaml.bak/' *.yaml
Remove the --no-act to actually make the changes.

The next logical progression is to want to do this more than once, so adding a timestamp to the backup is desirable, but I couldn't think of a way to make rename do this (you can't put backticks in the regex for instance). So here's a workaround
find . -name *.yaml -exec mv {} {}.bak.`date +%Y-%m-%dT%H:%M:%S%z` \;

Friday, August 22, 2014

Powershell and WMI cheatsheet

I finally bit the bullet and spent a bit of time learning some powershell and WMI querying basics. Unfortunately lots of the tutorials and intros out there are missing key pieces that make it usable, like aliases. So here's my cheatsheet as an attempt to remedy that problem.

Listing Classes and Namespaces

First of all, figuring out what information is available is not that easy. It's in classes inside nested namespaces. There's good online documentation, but I'd prefer to get it from the tool itself where I can see the values on a live system. So, lets start with listing all the top-level WMI Namespaces.
PS C:\> Get-WmiObject -class __Namespace -namespace root | Format-Table name


Note that Namespaces can be nested, the following ones sit below root\cimv2 which is the default namespace for Get-WmiObject if you don't specify one. If you really want to see everything you're going to need to list them recursively.
PS C:\> Get-WmiObject -class __Namespace -namespace root\cimv2 | Format-Table name

To list all classes within a namespace (see here for a high-level description of the naming scheme for classes):
PS C:\> Get-WmiObject -List -namespace root\cimv2 | Select name
All classes with 'Net' in the name:
PS C:\> Get-WmiObject -List | Where-Object { $ -match 'net'} | Select Name

Retrieving Object Values

Retrieve the class values:
PS C:\> Get-WmiObject -class Win32_NetworkAdapterConfiguration

DHCPEnabled      : True
IPAddress        : {, abcd::eff:fff:aaaa:abdc}
DefaultIPGateway : {, abce::eff:feff:aaaa:265}
DNSDomain        :
ServiceName      : 
Description      : Intel(R) Gigabit Network Connection
Index            : 7

Aliases and Functions

Sick of typing 'Get-WmiObject' yet? Thankfully 'gwmi' is an alias for Get-WmiObject:
PS C:\> gwmi -class Win32_NetworkAdapterConfiguration
The list of built-in aliases can be retrieved with:
PS C:\> get-alias

CommandType     Name
-----------     ----
Alias           % -> ForEach-Object
Alias           ? -> Where-Object
Alias           ac -> Add-Content
Alias           asnp -> Add-PSSnapin
Alias           cat -> Get-Content
Alias           cd -> Set-Location
Microsoft has some good doco on aliases here. You can define your own aliases for cmdlets. If you want to do something more complicated you'll need a function:
PS C:\> Function NetIFs {gwmi -class Win32_NetworkAdapter}
PS C:\> Set-Alias nif NetIFs

SQL-like querying, where "select *" doesn't mean everything

To query inside the class you can do this:
gwmi -query "Select * from Win32_NetworkAdapterConfiguration where IPEnabled=1"
There is a gotcha here however. The WMI query will return all the properties of the object, but powershell won't display them all. So you will be missing most of the properties and values listed in the documentation. To see all object properties and values, you can use the Select-Object cmdlet (alias select):
gwmi -query "Select * from Win32_NetworkAdapterConfiguration where IPEnabled=1" | select *
For advanced powershell examples, there's lots of good stuff on

Friday, May 30, 2014

Cross platform python method for calculating disk free space

I wanted a cross platform (Mac/Win/Linux) method for calculating disk free space. For my application ideally I'd like to: read it out of a file (or registry key) somewhere, failing that have a python library do it for me, and failing that shell out to something and parse the output. I couldn't find the free space in any files (including /proc), registry keys, or anywhere else file-like. So like this stackoverflow thread, I came to the conclusion that WMI for windows and os.statvfs for Mac+Linux were the best options.

First Windows, it's fairly straightforward. The MS doco for this WMI call is here, and it also explains the DriveType codes.
PS > $query = "select * from win32_logicaldisk"

PS > Get-WmiObject -Query $query

DeviceID     : C:
DriveType    : 3
ProviderName : 
FreeSpace    : 190249115648
Size         : 249690058752
VolumeName   : 

DeviceID     : Z:
DriveType    : 4
ProviderName : \\share\homedir\username
FreeSpace    : 15784280064
Size         : 26843545600
VolumeName   : nethomes$
Those numbers are bytes, so windows is pretty easy. On to the mess that is statvfs. First, the statfs man page gives a little history:
The original Linux statfs() and fstatfs() system calls were not designed with extremely large file sizes in mind. Subsequently, Linux 2.6 added new statfs64() and fstatfs64() system calls that employ a new structure, statfs64. The new structure contains the same fields as the original statfs structure, but the sizes of various fields are increased, to accommodate large file sizes. The glibc statfs() and fstatfs() wrapper functions transparently deal with the kernel differences. Some systems only have , other systems also have , where the former includes the latter. So it seems including the former is the best choice. LSB has deprecated the library calls statfs() and fstatfs() and tells us to use statvfs(2) and fstatvfs(2) instead.
Sounds like we should use statvfs and python has a os.statvfs so we should be good. Don't get fooled by this nasty deprecation notice, it's referring the the statvfs module which just defined a few constants. That's deprecated, but the os.statvfs function is alive and well in recent Python versions.

But wait, there's chatter about statvfs being dangerous on glibc systems and the df code said not to use it at some stage. Basically if you have a network filesystem listed in /proc/mounts and it is unreachable (e.g. because there is no network), statvfs will hang on stat'ing the network directory, even if you called statvfs on a completely different directory. df works around this by continuing to use statfs on glibc systems. I tested this with strace and it's true on my Ubuntu linux machine:
$ strace df
statfs("/usr/local/home/user", {f_type=0x65735546, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=1024, f_frsize=4096}) = 0
statfs("/nethome/user", {f_type="NFS_SUPER_MAGIC", f_bsize=8192, f_blocks=367001600, f_bfree=159547821, f_bavail=159547821, f_files=31876689, f_ffree=12707362, f_fsid={0, 0}, f_namelen=255, f_frsize=8192}) = 0
We can see that python os.statvfs is doing the same (and so is "stat -f"). So we should be safe using python's os.statvfs.
$ strace python -c "import os;os.statvfs('/')" 2>&1 | grep statfs
statfs("/", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=9743394, f_bfree=5606442, f_bavail=5118199, f_files=2441216, f_ffree=2066663, f_fsid={1820746783, 1207614867}, f_namelen=255, f_frsize=4096}) = 0

# No statvfs calls
$ strace python -c "import os;os.statvfs('/')" 2>&1 | grep statvfs
execve("/usr/bin/python", ["python", "-c", "import os;os.statvfs('/')"], [/* 56 vars */]) = 0

# stat -f does the same
$ strace stat -f / 2>&1 | grep statfs
statfs("/", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=9743394, f_bfree=5606442, f_bavail=5118199, f_files=2441216, f_ffree=2066663, f_fsid={1820746783, 1207614867}, f_namelen=255, f_frsize=4096}) = 0
The next question is, how do you actually calculate the free space in bytes? Starting with: what is the block size? f_bsize is the "Preferred file system block size" and f_frsize is the "Fundamental file system block size" according to the python doco, and if you read the statfs man page it says "optimal transfer block size" and "fragment size (since Linux 2.6)" respectively. Confusing much?

On my linux machine they are the same:
In [6]: import os

In [7]: st = os.statvfs("/")

In [8]: st.f_bsize
Out[8]: 4096

In [9]: st.f_frsize
Out[9]: 4096

In [10]: !stat -f -c "Block size (for faster transfers): %s, Fundamental block size (for block counts): %S" /
Block size (for faster transfers): 4096, Fundamental block size (for block counts): 4096
On OS X they are not:
In [1]: import os

In [2]: st = os.statvfs("/")

In [3]: st.f_bsize
Out[3]: 1048576

In [4]: st.f_frsize
Out[4]: 4096
So on OS X f_bsize is 1MB, but that isn't actually the block size used by the filesystem, so using f_frsize looks like the best option for both platforms. The remaining sticking point is that pre-2.6-kernel linux machines don't have f_frsize, so we should check if it is zero and use f_bsize instead in that case.

OK so we have a blocksize, but what free size should we use? f_bfree is "free blocks in fs" and f_bavail is "free blocks available to unprivileged user". These can actually be quite different, e.g. mkfs.ext3 reserves 5% of the filesystem blocks for the super-user by default. Which one you care about probably depends on why you are measuring free disk space. In my case I chose f_bavail, (which is also what df reports).

The final product:
In [16]: def PrintFree(path):
   ....:     st = os.statvfs(path)
   ....:     if st.f_frsize:
   ....:         print "Free bytes: %s" % (st.f_frsize * st.f_bavail) 
   ....:     else:
   ....:         print "Free bytes: %s" % (st.f_bsize * st.f_bavail)

In [17]: PrintFree("/")
Free bytes: 127470809088

In [18]: !df -B 1
Filesystem                1B-blocks        Used    Available Use% Mounted on
/dev/sda1              153117560832 17845137408 127470809088  13% /

Wednesday, May 28, 2014

Mach-O filetype identification

I wanted to write a quick and dirty file-type identifier for Mach-O, turns out this is more tricky than I expected. From /usr/share/file/magic/mach:
# $File: mach,v 1.9 2009/09/19 16:28:10 christos Exp $
# Mach has two magic numbers, 0xcafebabe and 0xfeedface.
# Unfortunately the first, cafebabe, is shared with
# Java ByteCode, so they are both handled in the file "cafebabe".
# The "feedface" ones are handled herein.
and from /usr/share/file/magic/cafebabe:
# Since Java bytecode and Mach-O universal binaries have the same magic number, the test
# must be performed in the same "magic" sequence to get both right.  The long
# at offset 4 in a mach-O universal binary tells the number of architectures; the short at
# offset 4 in a Java bytecode file is the JVM minor version and the
# short at offset 6 is the JVM major version.  Since there are only 
# only 18 labeled Mach-O architectures at current, and the first released 
# Java class format was version 43.0, we can safely choose any number
# between 18 and 39 to test the number of architectures against
# (and use as a hack). Let's not use 18, because the Mach-O people
# might add another one or two as time goes by...
GAAAH! Unsurprisingly more than one engineer wanted to use the cutesy "cafebabe" for their magic string. I ended up using this regex, which will also match Java bytecode, but was good enough for my purpose:
The full Mach-O filetype doco is here. The various magic byte strings are as follows:
  • cefaedfe: Mach-O Little Endian (32-bit)
  • cffaedfe: Mach-O Little Endian (64-bit)
  • feedface: Mach-O Big Endian (32-bit)
  • feedfacf: Mach-O Big Endian (64-bit)
  • cafebabe: Universal Binary Big Endian. These fat binaries are archives that can include binaries for multiple architectures, but typically contain PowerPC and Intel x86.