Friday, May 30, 2014

Cross platform python method for calculating disk free space

I wanted a cross platform (Mac/Win/Linux) method for calculating disk free space. For my application ideally I'd like to: read it out of a file (or registry key) somewhere, failing that have a Python library do it for me, and failing that shell out to something and parse the output. I couldn't find the free space in any files (including /proc), registry keys, or anywhere else file-like. So like this Stack Overflow thread, I came to the conclusion that WMI for Windows and os.statvfs for Mac+Linux were the best options.

First, Windows: it's fairly straightforward. The MS doco for this WMI call is here, and it also explains the DriveType codes.
PS > $query = "select * from win32_logicaldisk"

PS > Get-WmiObject -Query $query


DeviceID     : C:
DriveType    : 3
ProviderName : 
FreeSpace    : 190249115648
Size         : 249690058752
VolumeName   : 

DeviceID     : Z:
DriveType    : 4
ProviderName : \\share\homedir\username
FreeSpace    : 15784280064
Size         : 26843545600
VolumeName   : nethomes$
Those numbers are bytes, so Windows is pretty easy. On to the mess that is statvfs. First, the statfs man page gives a little history:
The original Linux statfs() and fstatfs() system calls were not designed with extremely large file sizes in mind. Subsequently, Linux 2.6 added new statfs64() and fstatfs64() system calls that employ a new structure, statfs64. The new structure contains the same fields as the original statfs structure, but the sizes of various fields are increased, to accommodate large file sizes. The glibc statfs() and fstatfs() wrapper functions transparently deal with the kernel differences. Some systems only have <sys/vfs.h>, other systems also have <sys/statfs.h>, where the former includes the latter. So it seems including the former is the best choice. LSB has deprecated the library calls statfs() and fstatfs() and tells us to use statvfs(2) and fstatvfs(2) instead.
Sounds like we should use statvfs, and Python has an os.statvfs, so we should be good. Don't get fooled by this nasty deprecation notice: it's referring to the statvfs module, which just defined a few constants. That's deprecated, but the os.statvfs function is alive and well in recent Python versions.

But wait, there's chatter about statvfs being dangerous on glibc systems and the df code said not to use it at some stage. Basically if you have a network filesystem listed in /proc/mounts and it is unreachable (e.g. because there is no network), statvfs will hang on stat'ing the network directory, even if you called statvfs on a completely different directory. df works around this by continuing to use statfs on glibc systems. I tested this with strace and it's true on my Ubuntu linux machine:
$ strace df
[snip]
statfs("/usr/local/home/user", {f_type=0x65735546, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=1024, f_frsize=4096}) = 0
statfs("/nethome/user", {f_type="NFS_SUPER_MAGIC", f_bsize=8192, f_blocks=367001600, f_bfree=159547821, f_bavail=159547821, f_files=31876689, f_ffree=12707362, f_fsid={0, 0}, f_namelen=255, f_frsize=8192}) = 0
[snip]
We can see that python os.statvfs is doing the same (and so is "stat -f"). So we should be safe using python's os.statvfs.
$ strace python -c "import os;os.statvfs('/')" 2>&1 | grep statfs
statfs("/", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=9743394, f_bfree=5606442, f_bavail=5118199, f_files=2441216, f_ffree=2066663, f_fsid={1820746783, 1207614867}, f_namelen=255, f_frsize=4096}) = 0

# No statvfs calls
$ strace python -c "import os;os.statvfs('/')" 2>&1 | grep statvfs
execve("/usr/bin/python", ["python", "-c", "import os;os.statvfs('/')"], [/* 56 vars */]) = 0

# stat -f does the same
$ strace stat -f / 2>&1 | grep statfs
statfs("/", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=9743394, f_bfree=5606442, f_bavail=5118199, f_files=2441216, f_ffree=2066663, f_fsid={1820746783, 1207614867}, f_namelen=255, f_frsize=4096}) = 0
The next question is, how do you actually calculate the free space in bytes? Starting with: what is the block size? f_bsize is the "Preferred file system block size" and f_frsize is the "Fundamental file system block size" according to the python doco, and if you read the statfs man page it says "optimal transfer block size" and "fragment size (since Linux 2.6)" respectively. Confusing much?

On my linux machine they are the same:
In [6]: import os

In [7]: st = os.statvfs("/")

In [8]: st.f_bsize
Out[8]: 4096

In [9]: st.f_frsize
Out[9]: 4096

In [10]: !stat -f -c "Block size (for faster transfers): %s, Fundamental block size (for block counts): %S" /
Block size (for faster transfers): 4096, Fundamental block size (for block counts): 4096
On OS X they are not:
In [1]: import os

In [2]: st = os.statvfs("/")

In [3]: st.f_bsize
Out[3]: 1048576

In [4]: st.f_frsize
Out[4]: 4096
So on OS X f_bsize is 1MB, but that isn't actually the block size used by the filesystem, so using f_frsize looks like the best option for both platforms. The remaining sticking point is that pre-2.6-kernel linux machines don't have f_frsize, so we should check if it is zero and use f_bsize instead in that case.

OK, so we have a block size, but which free count should we use? f_bfree is "free blocks in fs" and f_bavail is "free blocks available to unprivileged user". These can actually be quite different, e.g. mkfs.ext3 reserves 5% of the filesystem blocks for the super-user by default. Which one you care about probably depends on why you are measuring free disk space. In my case I chose f_bavail (which is also what df reports).
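To get a feel for how far f_bfree and f_bavail can diverge, here is a rough Python 3 sketch. The helper name reserved_bytes is made up for illustration, and the sample numbers are copied from the strace output for "/" shown earlier. It applies the same "f_frsize, else f_bsize" choice discussed above.

```python
import os
from types import SimpleNamespace


def reserved_bytes(st):
    """Bytes that are free but not available to unprivileged users.

    `st` is anything exposing the os.statvfs() field names.
    """
    # Prefer the fundamental block size; fall back to f_bsize if it is zero.
    block_size = st.f_frsize or st.f_bsize
    return (st.f_bfree - st.f_bavail) * block_size


# Field values taken from the strace output for "/" earlier in the post.
sample = SimpleNamespace(f_bsize=4096, f_frsize=4096,
                         f_bfree=5606442, f_bavail=5118199)
print(reserved_bytes(sample))

# On a live Mac or Linux machine the same helper accepts a real result.
if hasattr(os, "statvfs"):  # os.statvfs does not exist on Windows
    print(reserved_bytes(os.statvfs("/")))
```

For the sample values this works out to a bit under 2 GB that only the super-user can use.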

The final product:
In [16]: def PrintFree(path):
   ....:     st = os.statvfs(path)
   ....:     if st.f_frsize:
   ....:         print "Free bytes: %s" % (st.f_frsize * st.f_bavail) 
   ....:     else:
   ....:         print "Free bytes: %s" % (st.f_bsize * st.f_bavail)
   ....:         

In [17]: PrintFree("/")
Free bytes: 127470809088

In [18]: !df -B 1
Filesystem                1B-blocks        Used    Available Use% Mounted on
/dev/sda1              153117560832 17845137408 127470809088  13% /
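
The session above is Python 2. A rough Python 3 equivalent might look like the sketch below. The function names are made up for illustration. The Windows branch uses shutil.disk_usage (in the standard library since Python 3.3) as a stand-in; that is a swapped-in technique, not the WMI query shown earlier.

```python
import os
import shutil


def free_bytes_from_statvfs(st):
    """Free bytes available to unprivileged users, from an os.statvfs() result."""
    # f_frsize can be zero on very old kernels; fall back to f_bsize then.
    block_size = st.f_frsize or st.f_bsize
    return block_size * st.f_bavail


def free_bytes(path):
    """Best-effort free space in bytes for the filesystem holding `path`."""
    if hasattr(os, "statvfs"):  # Mac and Linux
        return free_bytes_from_statvfs(os.statvfs(path))
    # No os.statvfs on Windows; shutil.disk_usage stands in for WMI here.
    return shutil.disk_usage(path).free


print("Free bytes: %d" % free_bytes(os.getcwd()))
```

Keeping the arithmetic in its own function makes it easy to check against known numbers without touching a real disk.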

Wednesday, May 28, 2014

Mach-O filetype identification

I wanted to write a quick and dirty file-type identifier for Mach-O. It turns out this is trickier than I expected. From /usr/share/file/magic/mach:
# $File: mach,v 1.9 2009/09/19 16:28:10 christos Exp $
# Mach has two magic numbers, 0xcafebabe and 0xfeedface.
# Unfortunately the first, cafebabe, is shared with
# Java ByteCode, so they are both handled in the file "cafebabe".
# The "feedface" ones are handled herein.
and from /usr/share/file/magic/cafebabe:
# Since Java bytecode and Mach-O universal binaries have the same magic number, the test
# must be performed in the same "magic" sequence to get both right.  The long
# at offset 4 in a mach-O universal binary tells the number of architectures; the short at
# offset 4 in a Java bytecode file is the JVM minor version and the
# short at offset 6 is the JVM major version.  Since there are only 
# only 18 labeled Mach-O architectures at current, and the first released 
# Java class format was version 43.0, we can safely choose any number
# between 18 and 39 to test the number of architectures against
# (and use as a hack). Let's not use 18, because the Mach-O people
# might add another one or two as time goes by...
GAAAH! Unsurprisingly more than one engineer wanted to use the cutesy "cafebabe" for their magic string. I ended up using this regex, which will also match Java bytecode, but was good enough for my purpose:
^(cffaedfe|cefaedfe|feedface|feedfacf|cafebabe)
The full Mach-O filetype doco is here. The various magic byte strings are as follows:
  • cefaedfe: Mach-O Little Endian (32-bit)
  • cffaedfe: Mach-O Little Endian (64-bit)
  • feedface: Mach-O Big Endian (32-bit)
  • feedfacf: Mach-O Big Endian (64-bit)
  • cafebabe: Universal Binary Big Endian. These fat binaries are archives that can include binaries for multiple architectures, but typically contain PowerPC and Intel x86.
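
As a rough illustration, the magic values above plus the architecture-count trick from the magic file can be combined in a few lines of Python. This is only a sketch: the identify function and the MAX_FAT_ARCHS cut-off of 30 are assumptions made for the example (the magic file only says to pick something between 18 and 39), and it looks at nothing beyond the first 8 bytes.

```python
import struct

# Magic values as they appear in the first four bytes of the file.
MACHO_MAGICS = {
    bytes.fromhex("cefaedfe"): "Mach-O little endian (32-bit)",
    bytes.fromhex("cffaedfe"): "Mach-O little endian (64-bit)",
    bytes.fromhex("feedface"): "Mach-O big endian (32-bit)",
    bytes.fromhex("feedfacf"): "Mach-O big endian (64-bit)",
}
FAT_MAGIC = bytes.fromhex("cafebabe")

# Assumed cut-off, picked from the 18..39 window the magic file describes.
MAX_FAT_ARCHS = 30


def identify(header):
    """Classify a file from its first 8 bytes; returns None if unrecognised."""
    magic = header[:4]
    if magic in MACHO_MAGICS:
        return MACHO_MAGICS[magic]
    if magic == FAT_MAGIC and len(header) >= 8:
        # Big-endian 32-bit value at offset 4: the architecture count for a
        # universal binary, or minor/major version for a Java class file.
        (count,) = struct.unpack(">I", header[4:8])
        if count <= MAX_FAT_ARCHS:
            return "Mach-O universal binary"
        return "Java class file"
    return None


print(identify(bytes.fromhex("cafebabe00000002")))
print(identify(bytes.fromhex("cafebabe00000034")))
```

For a real file, read the header with open(path, "rb").read(8) and pass it in.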

Bash man page colours

There are many pages out there describing how to get coloured bash man pages. Tuxarena has one of the better ones that tries to explain what's going on, but unfortunately it is somewhat of a black art due to the obscure colour codes used. Here's the snippet from my bash_aliases that I use:
man() {
    env LESS_TERMCAP_mb=$'\E[01;31m' \
    LESS_TERMCAP_md=$'\E[01;38;5;74m' \
    LESS_TERMCAP_me=$'\E[0m' \
    LESS_TERMCAP_se=$'\E[0m' \
    LESS_TERMCAP_so=$'\E[01;41;33m' \
    LESS_TERMCAP_ue=$'\E[0m' \
    LESS_TERMCAP_us=$'\E[04;38;5;146m' \
    man "$@"
}
I only found one site that actually documented the color options available, and its author basically had to reverse engineer them. I'll include the color codes below, since everyone likely has their own personal preference and wants to tweak things slightly.
0   = default colour
1   = bold
4   = underlined
5   = flashing text
7   = reverse field
31  = red
32  = green
33  = orange
34  = blue
35  = purple
36  = cyan
37  = grey
40  = black background
41  = red background
42  = green background
43  = orange background
44  = blue background
45  = purple background
46  = cyan background
47  = grey background
90  = dark grey
91  = light red
92  = light green
93  = yellow
94  = light blue
95  = light purple
96  = turquoise
100 = dark grey background
101 = light red background
102 = light green background
103 = yellow background
104 = light blue background
105 = light purple background
106 = turquoise background
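
To make the structure of those LESS_TERMCAP values less of a black art: each one is the escape character, a "[", a semicolon-separated list of the codes above, and a closing "m". The Python sketch below builds such strings; the sgr helper is a made-up name. The 38;5;N form used in the snippet is not in the table above: it is the xterm-style way to pick colour N from the 256-colour palette.

```python
ESC = "\x1b"  # the escape character that bash writes as \E inside $'...'


def sgr(*codes):
    """Build an escape sequence from numeric codes like those listed above."""
    return ESC + "[" + ";".join(str(c) for c in codes) + "m"


RESET = sgr(0)

bold_red = sgr(1, 31)                # bold + red, like LESS_TERMCAP_mb
underlined_146 = sgr(4, 38, 5, 146)  # underlined + palette colour 146

print(bold_red + "bold red" + RESET)
print(underlined_146 + "underlined" + RESET)
print(repr(bold_red))
```

The snippet writes the codes with leading zeros (01, 04); terminals should treat those the same as 1 and 4.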

Wednesday, May 14, 2014

Python Mix-ins

Python mix-ins are a handy way to augment functionality of a class.  LinuxJournal has a good detailed article about them.  To mix in a class dynamically you just need to modify the __bases__ class attribute:

class Base(object):pass
class BaseClass(Base):pass
class MixInClass(object):pass

BaseClass.__bases__ += (MixInClass,)

Note that you seem to only be able to do this if the base class doesn't inherit directly from object, hence the extra "Base" class above. This is what the failure looks like:
In [10]: class BaseClass(object):pass

In [11]: class MixInClass(object):pass

In [12]: BaseClass.__bases__ += (MixInClass,)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
 in ()
----> 1 BaseClass.__bases__ += (MixInClass,)

TypeError: Cannot create a consistent method resolution
order (MRO) for bases MixInClass, object
Also note that unless both classes call super in their __init__ you will have problems with initialization. It is also possible to mix in a class to an object instance, instead of a class, as mentioned on Stack Overflow like this:
obj = BaseClass()
obj.__class__ = type('MixInClass',(BaseClass,MixInClass),{})
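
Putting both approaches together, here is a small Python 3 sketch. GreeterMixIn, Other, and the attribute names are invented for the example. It also shows the initialization caveat: the mix-in's __init__ only runs for objects created after mixing, and only because every __init__ calls super().

```python
class Base:
    def __init__(self):
        super().__init__()  # cooperative: pass control along the MRO
        self.base_initialised = True


class BaseClass(Base):
    pass


class GreeterMixIn:
    def __init__(self):
        super().__init__()
        self.mixin_initialised = True

    def greet(self):
        return "hello from " + type(self).__name__


# Class-level mix-in: every BaseClass instance gains greet().
BaseClass.__bases__ += (GreeterMixIn,)
fresh = BaseClass()
print(fresh.greet(), fresh.mixin_initialised)


# Instance-level mix-in: only this one object changes class.
class Other(Base):
    pass


obj = Other()
obj.__class__ = type("OtherWithGreeter", (Other, GreeterMixIn), {})
print(obj.greet())
# GreeterMixIn.__init__ never ran for obj, so the attribute is missing.
print(hasattr(obj, "mixin_initialised"))
```

If the mix-in's methods depend on state set up in its __init__, an object converted this way needs that state set by hand.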

Monday, May 5, 2014

Don't use == for comparing secrets

TIL: You shouldn't use == to compare HMACs, or anything sensitive really. Doing so creates a timing side channel that can reveal the secret to an attacker. Instead you need to use a comparison function that takes a constant amount of time for all values, no matter how similar they are to the actual HMAC. The Python example given in the article is:

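def is_equal(a, b):
    # As written, a and b need to be bytes (Python 3) so iteration yields ints
    if len(a) != len(b):
        return False

    # OR together the XOR of every byte pair so the loop never exits early
    result = 0
    for x, y in zip(a, b):
        result |= x ^ y
    return result == 0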
This function is available in Python 3.3+ (and was backported to Python 2.7.7) as:
hmac.compare_digest(a, b)
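
In practice that call usually sits inside a verification helper. The sketch below is a minimal example of the idea; the sign and verify names, the key, and the messages are all made up for illustration, and HMAC-SHA256 is just one reasonable choice of digest.

```python
import hashlib
import hmac


def sign(key, message):
    """Return the HMAC-SHA256 tag of message under key (both bytes)."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key, message, tag):
    """Check a received tag without an early exit on the first mismatch."""
    expected = sign(key, message)
    return hmac.compare_digest(expected, tag)


key = b"example-key"  # hypothetical values for illustration
message = b"amount=100"
tag = sign(key, message)

print(verify(key, message, tag))
print(verify(key, b"amount=999", tag))
```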

Tuesday, February 11, 2014

Antennas for free-to-air broadcast TV in the US (i.e. how to watch the olympics for free)

Watching the winter and summer olympics is literally the only reason I have to watch TV these days, and since the content is so locked up in media deals, buying a cable TV package seems to be the only option to see it in the US.  Not so.

NBC (which carries the olympics) is broadcast free to air in HD, along with a few other major channels and a pile of crap that isn't fit for human consumption (like 24hr home shopping).  Getting at this free content is basically a case of picking the right antenna.

1. Check available channels

Put your zipcode into the station direction finder at antennaweb.org to find out what channels are available in your area and where the TV transmitters are.

2. Buy an antenna

For best results you should probably use a high-gain outdoors antenna.  But if you are renting, or want to do it on the cheap, you might want to check out Lifehacker's list of the best indoor antennas. In the SF bay area I can tune 82 channels, including NBC aka KNTV, with the un-amplified Mohu leaf which cost me $40.  The leaf also happens to be the top recommendation from Lifehacker.

I have it mounted indoors about 9ft up the wall and I'm receiving all the channels listed by antennaweb.  I even get KTLN, which antennaweb says is being broadcast from 57 miles away!  The antennaweb recommended antenna to receive at that kind of distance is a large directional antenna with pre-amp, but the Mohu leaf is killing it.  Amazing.

3. Tune channels

Follow your TV's instructions for tuning channels.  You may have to move your antenna around to find the best reception.

Wednesday, January 1, 2014

NSA's ANT exploitation catalog

The latest Snowden disclosure of the NSA's ANT exploitation catalog will be studied by every IT security professional in the world. It's a lot to take in, so I wrote a quick summary here. It includes descriptions of:
  • BIOS-based implants for common routers (Huawei, Cisco), firewalls (Huawei, Juniper, Cisco) and servers (HP and Dell)
  • iPhone implant
  • Room audio capture chip ("bug")
  • 802.11 injection hardware
  • SIM card implants
  • Phones with software-defined-radio for covert wireless survey and capture
  • A PCI hardware implant
  • Wireless chips for airgap jumping (HOWLERMONKEY)
  • Hard drive firmware implant
  • Software implants that route traffic to unused 802.11 interfaces (i.e. exfil even while wireless is "off")
  • Multi-OS BIOS/HPA implant
  • Hardware keylogger chip with RF exfil
  • Implanted GSM handsets
  • Thuraya sat phone handset hardware implant
  • Windows mobile implant
  • GSM basestations that can find targets based on handset IDs, collect and capture voice/data/SMS etc.
  • Software-defined radio direction finders for tracking targets based on a wide range of emissions
  • Modified USB cables with RF chips for airgap bridging (COTTONMOUTH-I,II,III)
  • Ethernet hardware RJ45 connector implant that can do traffic filtering and injection with comms over RF (FIREWALK)
  • VGA cable with hardware implant that collects video and exfils over RF (RAGEMASTER)