In 2015 I released my first Haskell project, ruby-marshal. It’s a package that uses the binary package to parse Ruby objects serialised with Marshal.dump. I wrote it in my spare time because I was curious whether I could devise a strategy for incrementally migrating legacy Ruby on Rails applications to Haskell without the risk associated with a full rewrite.
My hypothesis was that if I could decrypt and de-serialise Rails sessions then I’d be able to piggyback on the Rails application’s authentication mechanism. Not long after, I had the opportunity to use this package at work, and put this theory to the test, by writing a Haskell web application that shared sessions with Rails.
It has been running in production – without any issue – for almost two years.
Ruby’s Marshal library serialises Ruby objects to a bytestring, e.g. dumping true results in [4, 8, 84], where 4 and 8 are the Marshal version number and true is represented as 84, or ASCII T.
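As a minimal sketch of that layout, the three bytes can be decoded by hand. The names decodeSimple and decodeTag are hypothetical and are not part of ruby-marshal; the sketch only assumes the version bytes 4 and 8 described above and the single-byte tags for nil, true and false.

```haskell
import qualified Data.ByteString as BS
import Data.Word (Word8)

-- | Hypothetical helper: decode a Marshal payload that holds only
-- nil, true or false. Nothing stands for Ruby's nil.
decodeSimple :: BS.ByteString -> Either String (Maybe Bool)
decodeSimple bytes =
  case BS.unpack bytes of
    -- The first two bytes are the major and minor Marshal version.
    [4, 8, tag] -> decodeTag tag
    [_, _, _]   -> Left "unsupported Marshal version"
    _           -> Left "expected exactly three bytes"

-- | Map the type tag that follows the version header to a value.
decodeTag :: Word8 -> Either String (Maybe Bool)
decodeTag 48 = Right Nothing      -- ASCII '0' marks nil
decodeTag 84 = Right (Just True)  -- ASCII 'T' marks true
decodeTag 70 = Right (Just False) -- ASCII 'F' marks false
decodeTag t  = Left ("unsupported tag byte: " ++ show t)
```

Applied to the bytes [4, 8, 84] from the example above, decodeSimple returns Right (Just True).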
Compound objects, e.g. hash maps, can also be serialised using Marshal.dump. This might explain why it was used as the default cookie serialiser in Rails until version 4.1, after which JSON serialisation became the default.
% irb
irb(main):001:0> x = Marshal.dump("session_id" => "ba0844151d")
=> "\x04\b{\x06I\"\x0Fsession_id\x06:\x06ETI\"\x0Fba0844151d\x06;\x00T"
irb(main):002:0> x.bytes
=> [4, 8, 123, 6, 73, 34, 15, 115, 101, 115, 115, 105, 111, 110, 95, 105, 100, 6, 58, 6, 69, 84, 73, 34, 15, 98, 97, 48, 56, 52, 52, 49, 53, 49, 100, 6, 59, 0, 84]
More information about the Marshal.dump binary format can be found in a series of blog posts by @jakegoulding or by reviewing the ruby-marshal source code.
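The ruby-marshal package allows us to transform this binary format into Haskell values and follows a pattern you’ll see elsewhere in the Haskell ecosystem. It consists of an AST for Ruby values, a set of parsers that build that AST, and a custom monad that tracks state during parsing.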
The Ruby AST represents a subset of values that can be encoded by Marshal.dump.
data RubyObject
    -- Simple objects.
  = RNil
  | RBool Bool
  | RFixnum Int
  | RFloat Float
  | RString ByteString
  | RSymbol ByteString
    -- Compound objects.
  | RArray (Vector RubyObject)
  | RHash (Vector (RubyObject, RubyObject))
  | RIVar (RubyObject, RubyStringEncoding)
    -- Tag for unsupported objects e.g. Bignum.
  | Unsupported
This is a common pattern you’ll see in other packages e.g. msgpack:Object and aeson:Value.
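To give a feel for how such an AST is consumed, here is a minimal sketch that pattern matches on a simplified stand-in for RubyObject. Everything in it is an assumption made for illustration: Obj, unwrap, lookupString and exampleSession are hypothetical names, lists replace Vector, and the instance-variable wrapper drops the string encoding.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as Char8

-- | Simplified, hypothetical stand-in for the package's RubyObject.
data Obj
  = ONil
  | OBool Bool
  | OFixnum Int
  | OString ByteString
  | OHash [(Obj, Obj)]
  | OIVar Obj
  deriving (Eq, Show)

-- | Strip any instance-variable wrappers around a value.
unwrap :: Obj -> Obj
unwrap (OIVar inner) = unwrap inner
unwrap other         = other

-- | Look up a string value stored under a string key in a hash.
lookupString :: ByteString -> Obj -> Maybe ByteString
lookupString key obj =
  case unwrap obj of
    OHash pairs ->
      case [unwrap v | (k, v) <- pairs, unwrap k == OString key] of
        (OString s : _) -> Just s
        _               -> Nothing
    _ -> Nothing

-- | The session hash from the irb example, written out by hand.
exampleSession :: Obj
exampleSession =
  OHash
    [ ( OIVar (OString (Char8.pack "session_id"))
      , OIVar (OString (Char8.pack "ba0844151d"))
      )
    ]
```

Once the bytes have been parsed into a tree like this, extracting a value such as the session id is ordinary pattern matching, with no knowledge of the binary format required.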
Parsers are combined to build an AST e.g. parsing a raw bytestring is defined as follows.
getString :: Marshal BS.ByteString
getString =
  -- Label the parser so the label is included in the error if the parse fails.
  marshalLabel "RawString" $ do
    -- Get the number of bytes in the bytestring.
    n <- getFixnum
    -- Read that many bytes and lift the result into the Marshal monad.
    liftMarshal $ getBytes n
It is then used by other parsing functions e.g. parsing a Ruby symbol.
getSymbol :: Marshal BS.ByteString
getSymbol = marshalLabel "Symbol" $ do
  -- Get bytestring.
  x <- getString
  -- Write symbol into the symbol cache.
  writeCache $ RSymbol x
  -- Return the bytestring.
  return x
These parsers are in turn used by the top-level parsing function, which combines them and lifts their results into the Ruby AST.
getRubyObject :: Marshal RubyObject
getRubyObject =
  -- Make sure we're using a supported Marshal version, throw away the result
  -- and recursively parse our bytestring.
  getMarshalVersion >> go
  where
    go :: Marshal RubyObject
    go = liftMarshal getWord8 >>= \case
      NilChar        -> return RNil
      TrueChar       -> return $ RBool True
      FalseChar      -> return $ RBool False
      FixnumChar     -> RFixnum <$> getFixnum
      FloatChar      -> RFloat <$> getFloat
      StringChar     -> RString <$> getString
      SymbolChar     -> RSymbol <$> getSymbol
      ObjectLinkChar -> RIVar <$> getObjectLink
      SymlinkChar    -> RSymbol <$> getSymlink
      ArrayChar      -> RArray <$> getArray go
      HashChar       -> RHash <$> getHash go go
      IVarChar       -> RIVar <$> getIVar go
      _              -> return Unsupported
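Several of these parsers start by reading a fixnum, and its definition is not shown here. The following is a sketch of Marshal's variable-length integer encoding as I understand it, written against Data.Binary.Get; it is not the package's own getFixnum, and the names getFixnumSketch and decodeFixnum are hypothetical. The first byte is treated as signed: 0 means zero, values above 4 or below -4 hold the number itself shifted by 5, and the remaining values give the count of little-endian bytes that follow.

```haskell
import Control.Monad (replicateM)
import Data.Binary.Get (Get, getWord8, runGet)
import Data.Bits (complement, shiftL, (.&.), (.|.))
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int8)
import Data.List (foldl')
import Data.Word (Word8)

-- | Hypothetical sketch of a Marshal fixnum parser.
getFixnumSketch :: Get Int
getFixnumSketch = do
  -- Read the first byte and reinterpret it as a signed 8-bit value.
  c <- toSigned <$> getWord8
  decodeWith c
  where
    toSigned :: Word8 -> Int
    toSigned w = fromIntegral (fromIntegral w :: Int8)

    decodeWith :: Int -> Get Int
    decodeWith c
      | c == 0    = pure 0
      | c > 4     = pure (c - 5)   -- small positive number stored inline
      | c < -4    = pure (c + 5)   -- small negative number stored inline
      | c > 0     = assemble 0 <$> replicateM c getWord8
      | otherwise = assemble (-1) <$> replicateM (negate c) getWord8

    -- Place each byte at its little-endian position. Negative numbers
    -- start from all ones so the untouched high bytes stay set.
    assemble :: Int -> [Word8] -> Int
    assemble start bytes = foldl' place start (zip [0 ..] bytes)

    place :: Int -> (Int, Word8) -> Int
    place acc (i, b) =
      (acc .&. complement (0xff `shiftL` (8 * i)))
        .|. (fromIntegral b `shiftL` (8 * i))

-- | Convenience wrapper for experimenting in GHCi.
decodeFixnum :: [Word8] -> Int
decodeFixnum = runGet getFixnumSketch . BL.pack
```

For instance, decodeFixnum [15] gives 10, which matches the length byte that precedes the ten characters of "session_id" in the dump shown earlier.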
A quirk of the Marshal format is that it saves space by encoding repeated objects as indexes into a symbol cache and an object cache. We use StateT to keep track of these during de-serialisation and enrich the underlying Get monad by creating a custom monad.
newtype Marshal a = Marshal { runMarshal :: StateT Cache Get a }
  deriving (Functor, Applicative, Monad, MonadState Cache)
This allows us to write to and read from our cache during parsing without having to manually thread state through our parsing functions.
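To illustrate the idea in isolation, here is a minimal sketch of a symbol cache layered over Get from the binary package with StateT. It is not the package's implementation: CachedGet, getSmallFixnum, getSymbolSketch and runCachedGet are hypothetical names, the cache is a plain list, and lengths and indexes are assumed to fit in a single byte. It relies on the tag bytes visible in the dump above, where : (58) introduces a new symbol and ; (59) refers back to a cached one.

```haskell
import Control.Monad.State.Strict (StateT, evalStateT, gets, modify')
import Control.Monad.Trans.Class (lift)
import Data.Binary.Get (Get, getByteString, getWord8, runGet)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

-- | Symbols seen so far, in the order they were parsed.
type SymbolCache = [BS.ByteString]

-- | Get enriched with a symbol cache.
type CachedGet = StateT SymbolCache Get

-- | Simplified fixnum: only zero and the single-byte inline form.
getSmallFixnum :: Get Int
getSmallFixnum = do
  b <- getWord8
  let n = fromIntegral b :: Int
  if n == 0
    then pure 0
    else if n > 4 && n < 128
      then pure (n - 5)
      else fail "fixnum form not covered by this sketch"

-- | Parse either a new symbol or a link to a cached one.
getSymbolSketch :: CachedGet BS.ByteString
getSymbolSketch = do
  tag <- lift getWord8
  case tag of
    58 -> do
      -- ':' is followed by a length and the symbol's bytes.
      len <- lift getSmallFixnum
      sym <- lift (getByteString len)
      -- Remember the symbol so later links can refer to it.
      modify' (++ [sym])
      pure sym
    59 -> do
      -- ';' is followed by an index into the symbol cache.
      ix <- lift getSmallFixnum
      rest <- gets (drop ix)
      case rest of
        (sym : _) -> pure sym
        []        -> fail "symlink index outside the symbol cache"
    _ -> fail "expected a symbol or a symlink"

-- | Run a cached parser against lazy input, starting with an empty cache.
runCachedGet :: CachedGet a -> BL.ByteString -> a
runCachedGet parser = runGet (evalStateT parser [])
```

Parsing the bytes [58, 6, 69, 59, 0] with getSymbolSketch twice yields the symbol E both times: once read directly, once resolved through the cache, without either call passing the cache explicitly.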
Let’s take a simple example of a Ruby string, serialise it and dump it to the file system using irb.
% irb
irb(main):001:0> x = "hello haskell"
=> "hello haskell"
irb(main):002:0> y = Marshal.dump(x)
=> "\x04\bI\"\x12hello haskell\x06:\x06ET"
irb(main):003:0> File.open("example.bin", "w") { |z| z.write(y) }
Switching over to Haskell we set up our imports.
import Data.ByteString (ByteString)
import Data.Ruby.Marshal
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as Char8
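Define a function to read our example from the file system. A definition consistent with the imports above and with its use in main below would be:
readExample :: IO ByteString
readExample = BS.readFile "example.bin"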
Define a function that uses the Rubyable typeclass to convert a RubyObject to a more convenient representation.
toString :: RubyObject -> Maybe (ByteString, RubyStringEncoding)
toString rubyObject = fromRuby rubyObject
Finally, we put it all together and print the Ruby string to the console.
main :: IO ()
main = do
  -- Read example.bin.
  example <- readExample
  -- Decode using ruby-marshal and maybe convert the result to a bytestring.
  case decode example >>= toString of
    -- Handle the case when de-serialisation fails.
    Nothing ->
      putStrLn "Oops, something went wrong..."
    -- Throw away the encoding information and print the Ruby string to the console.
    Just (string, _) ->
      Char8.putStrLn string
Let’s take another example: de-serialising Ruby objects stored in memcache using the dalli gem.
% irb
irb(main):001:0> require "dalli"
=> true
irb(main):002:0> dc = Dalli::Client.new("localhost:11211")
=> #<Dalli::Client:0x007fe4948b7358 @servers=["localhost:11211"], @options={}, @ring=nil>
irb(main):003:0> dc.set("str", "hello haskell")
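We’ll reuse our existing Haskell code but add another import, presumably the client module of the memcache package, qualified as M to match the code below.
import qualified Database.Memcache.Client as M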
Define a function that creates a new memcache client.
createMemcacheClient :: IO M.Client
createMemcacheClient =
  M.newClient [M.ServerSpec "localhost" 11211 M.NoAuth] M.def
Finally, we put it all together to pull the value out of memcache and print the Ruby string to the console.
main :: IO ()
main = do
  -- Set up memcache client.
  mc <- createMemcacheClient
  -- Retrieve bytestring from memcache server.
  example <- M.get mc "str"
  -- Unpack result from memcache server.
  case example of
    Nothing ->
      putStrLn "Oops, key not found..."
    -- Pattern match to extract bytestring.
    Just (value, _, _) ->
      -- Decode using ruby-marshal and maybe convert the result to a bytestring.
      case decode value >>= toString of
        -- Handle the case when de-serialisation fails.
        Nothing ->
          putStrLn "Oops, something went wrong..."
        -- Throw away the encoding information and print the Ruby string to the console.
        Just (string, _) ->
          Char8.putStrLn string
By writing the ruby-marshal package, I was able to create a Haskell web application that coexisted with a Rails application. This approach has been a success at work and appears to be one way in which you could gradually migrate an existing web application written in Ruby over to Haskell without the risk associated with a full rewrite.