gsoc-tcpregression: Bailed on previous idea

Also of note, I bailed on the idea from last night (and before) of using raw IP sockets to do the sending. The idea was to avoid needing to ever touch the physical or IP layer, thus removing some amount of work (e.g., no setting of the IP length fields, IP checksum calculation).

However, using raw sockets has the caveat of not providing a receive buffer. If using raw IP sockets, one cannot override the 'protocol' field to be IPPROTO_TCP. Haven't put much more thought into using raw TCP sockets.
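
For a sense of the bookkeeping that raw IP sockets would have taken off my hands, here is a minimal sketch of filling in the IPv4 length and checksum fields by hand. The helper names (ip_checksum, build_ipv4_header) are made up for illustration and are not part of PCS. The checksum is the standard one's-complement sum over 16-bit words, and the code is written for Python 3.

```python
import struct

def ip_checksum(header: bytes) -> int:
    """One's-complement checksum over 16-bit big-endian words."""
    if len(header) % 2:
        header += b"\x00"                      # pad to a whole number of words
    words = struct.unpack("!%dH" % (len(header) // 2), header)
    total = sum(words)
    while total >> 16:                         # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_ipv4_header(src: bytes, dst: bytes, payload_len: int, proto: int = 6) -> bytes:
    """Build a 20-byte IPv4 header (no options); proto 6 is TCP."""
    total_len = 20 + payload_len               # the IP length field covers header + payload
    header = struct.pack("!BBHHHBBH4s4s",
                         0x45, 0, total_len,   # version/IHL, TOS, total length
                         0, 0,                 # identification, flags/fragment offset
                         64, proto, 0,         # TTL, protocol, checksum placeholder
                         src, dst)
    csum = ip_checksum(header)
    return header[:10] + struct.pack("!H", csum) + header[12:]

hdr = build_ipv4_header(bytes([127, 0, 0, 1]), bytes([127, 0, 0, 1]), payload_len=20)
```

A handy sanity check: running ip_checksum() over a header that already contains a correct checksum gives 0.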

Short Rant on pcs.Field
The class pcs.Field is used as a base class to represent any field inside of a packet. It provides bit-width flexibility, and the ability to encode/decode values of arbitrary bit-width. This is very useful when defining the various fields of a packet, as you can specify their exact size, and then with the pcs.Layout class, specify their exact order.
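
As a rough illustration of what "fields of exact bit-width in an exact order" boils down to, here is a small standalone sketch. It is not how PCS implements pcs.Field or pcs.Layout; pack_fields and unpack_fields are hypothetical helpers, written for Python 3.

```python
def pack_fields(fields):
    """Pack (value, bit_width) pairs, in wire order, most significant bit first."""
    acc = 0
    nbits = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError("%r does not fit in %d bits" % (value, width))
        acc = (acc << width) | value
        nbits += width
    if nbits % 8:
        raise ValueError("total width must be a whole number of bytes")
    return acc.to_bytes(nbits // 8, "big")

def unpack_fields(data, widths):
    """Inverse of pack_fields: split data into integers of the given bit widths."""
    acc = int.from_bytes(data, "big")
    remaining = len(data) * 8
    values = []
    for width in widths:
        remaining -= width
        values.append((acc >> remaining) & ((1 << width) - 1))
    return values

# Two 4-bit fields followed by a 16-bit field
packed = pack_fields([(4, 4), (5, 4), (80, 16)])   # b'E\x00P'
fields = unpack_fields(packed, [4, 4, 16])         # [4, 5, 80]
```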

However, in order to string these two things together, you get somewhat of a kluge. Consider the following code in tcp.py:

    def __init__(self, bytes = None):
        """initialize a TCP packet"""
        sport = pcs.Field("sport", 16)
        dport = pcs.Field("dport", 16)

And then the corresponding code to manipulate those fields...

>>> from pcs.packets.tcp import tcp
>>> t = tcp()
>>> type(t.offset)
<type 'int'>
>>> type(t.sport)
<type 'int'>

Okay, great. In the __init__ method, they were working on local variables with the same names as the properties to be used. Each pcs.Field object is given a name that corresponds to the object property (e.g., the pcs.Field object created with name="sport" corresponds to the property accessed as "t.sport" above). Great.

However, the approach falls short here. pcs.Field provides no way of setting the byte order of whatever field it is manipulating, so users are stuck doing this:

t.sport = socket.htons(80)
# Instead of
t.sport = 80

And having the "magic" just happen. This prevents simple comparisons, as well. Which of the following is easier to read?

if ntohs(t.sport) == 80: ...
if t.sport == htons(80): ...
if t.sport == 20480: ...
if t.sport == 80: ...
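
In case the 20480 looks like it came out of nowhere, here is a quick sketch of where it comes from. It uses the struct module with explicit byte orders rather than htons(), so the result does not depend on the machine it runs on (Python 3, hence the b prefix on byte strings).

```python
import struct

port = 80
wire = struct.pack("!H", port)                      # network order: b'\x00P'

as_big_endian = struct.unpack(">H", wire)[0]        # 80
as_little_endian = struct.unpack("<H", wire)[0]     # 20480, what htons(80) gives on x86

print(wire, as_big_endian, as_little_endian)
```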

This gets less trivial for fields that may have more than one representation. This was discussed in a previous post (IIRC) in relation to IP and hardware addresses. Consider the following. The IP address "127.0.0.1" can be represented as:

16777343 (network-byte-order integer, as seen on a little-endian host)
2130706433 (host-byte-order integer)
'\x01\x00\x00\x7f' (host-byte-order byte-string)
'\x7f\x00\x00\x01' (network-byte-order byte-string)
And of course, "127.0.0.1"
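
All of these can be derived from one another with the standard socket and struct modules. A small sketch (Python 3; the variable names are mine), using explicit byte orders so the numbers come out the same on any machine:

```python
import socket
import struct

wire = socket.inet_aton("127.0.0.1")                  # b'\x7f\x00\x00\x01', as sent on the wire

big_endian_int = struct.unpack("!L", wire)[0]         # 2130706433
little_endian_int = struct.unpack("<L", wire)[0]      # 16777343
reversed_bytes = wire[::-1]                           # b'\x01\x00\x00\x7f'
dotted = socket.inet_ntoa(wire)                       # '127.0.0.1'
```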

Obviously, this makes things a bit difficult to work with. On top of that, the current method does not just use network byte order (NBO) in one uniform way: an IP address is an NBO long, a port is an NBO short, and an Ethernet address is an NBO byte-string. Figuring out the proper internal representation for each field is a pain, and translating between host byte order (HBO) and NBO makes it worse.

The simple solution is to allow multiple ways of getting and setting a field. Example using ports (also works with IP addresses, ethernet addresses):

>>> t.sport.setAscii('80')
>>> t.sport.setInteger(80)
>>> t.sport.setNetworkInteger(20480)
>>> t.sport.setBytes('P\x00')
>>> t.sport.setNetworkBytes('\x00P')
>>> t.sport == 80
True
>>> t.sport.getInteger() == 80
True
>>> t.sport == '80'
True
>>> t.sport == 'P\x00'
True
>>> t.sport == '\x00P'
False
>>> t.sport.getNetworkBytes() == '\x00P'
True

This gives us all the flexibility we need. Assuming that every field type supports the same base functions (get/set for ASCII, integer, and bytes, in both host and network byte order), everything is easily extensible.
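
To make the idea concrete, here is a minimal sketch of what such a field could look like for a 16-bit port. This is not PCS code: PortField is a hypothetical class, and only the get/set method names follow the ones proposed above. It stores a plain integer internally and converts at the edges. It is written for Python 3, and it only implements comparison against integers and other fields, leaving out the string comparisons.

```python
import sys

class PortField:
    """Hypothetical 16-bit field that keeps a plain integer internally."""

    def __init__(self, value=0):
        self._value = 0
        self.setInteger(value)

    def setInteger(self, value):
        if not 0 <= value <= 0xFFFF:
            raise ValueError("port out of range: %r" % (value,))
        self._value = value

    def getInteger(self):
        return self._value

    def setAscii(self, text):
        self.setInteger(int(text))

    def getAscii(self):
        return str(self._value)

    def _set_from_bytes(self, data, order):
        if len(data) != 2:
            raise ValueError("expected exactly 2 bytes")
        self._value = int.from_bytes(data, order)

    def setNetworkBytes(self, data):
        self._set_from_bytes(data, "big")

    def getNetworkBytes(self):
        return self._value.to_bytes(2, "big")

    def setBytes(self, data):
        self._set_from_bytes(data, sys.byteorder)

    def getBytes(self):
        return self._value.to_bytes(2, sys.byteorder)

    def setNetworkInteger(self, value):
        # value is what htons() would return: its in-memory bytes are the wire bytes
        self.setNetworkBytes(value.to_bytes(2, sys.byteorder))

    def getNetworkInteger(self):
        return int.from_bytes(self.getNetworkBytes(), sys.byteorder)

    def __eq__(self, other):
        if isinstance(other, PortField):
            return self._value == other._value
        if isinstance(other, int):
            return self._value == other
        return NotImplemented

sport = PortField()
sport.setAscii("80")
```

Keeping a single internal representation means each new input format only needs one conversion in and one conversion out.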

Revisiting the original code, things have improved slightly. We no longer need to worry about byte ordering.

# Old
t.sport = htons(80)
t.sport = htons(int('80'))
t.sport = struct.unpack('=H', '\x00P')[0]  # wire bytes read in native order
# New
t.sport.setInteger(80)
t.sport.setAscii('80')
t.sport.setNetworkBytes('\x00P')

Alright, now we face the problem of making the encoding 'cleaner'. Encoding things is extremely simple if everything is aligned on an 8-bit byte boundary: one could simply use "t.sport.getNetworkBytes() + t.dport.getNetworkBytes()" ad infinitum. However, that approach is ugly. By overriding __add__() with some gentle application of isinstance(), you can get the bytes pretty easily (even without 8-bit byte boundaries).
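
Roughly what I have in mind, as a sketch only: Bits is a hypothetical class (nothing like it exists in PCS yet) that carries a value together with its width in bits, so that "+" can concatenate fields even when they do not end on a byte boundary. Python 3 again.

```python
class Bits:
    """Hypothetical: an unsigned value tagged with its width in bits."""

    def __init__(self, value, width):
        if not 0 <= value < (1 << width):
            raise ValueError("%r does not fit in %d bits" % (value, width))
        self.value = value
        self.width = width

    def __add__(self, other):
        if not isinstance(other, Bits):
            return NotImplemented
        # Shift what we have left by the newcomer's width, then append it
        return Bits((self.value << other.width) | other.value,
                    self.width + other.width)

    def to_bytes(self):
        if self.width % 8:
            raise ValueError("total width must be a whole number of bytes")
        return self.value.to_bytes(self.width // 8, "big")

# Two byte-aligned ports
ports = (Bits(80, 16) + Bits(8080, 16)).to_bytes()               # b'\x00P\x1f\x90'
# Two 4-bit fields that only make up a byte together
mixed = (Bits(4, 4) + Bits(5, 4) + Bits(80, 16)).to_bytes()      # b'E\x00P'
```

Only the final to_bytes() call needs the total to land on a byte boundary; the intermediate sums can be any width.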

Will do some work tonight on this front. Will make for a very worthwhile patch to PCS.

[/rant]

Saturday, June 13, 2009
