
Reading binary files in Haskell

In my previous post I parsed a simple text file. Parts of that text file were paths to binary files from NMR experiments. The format is ft2, the result of processing time-domain data from NMR experiments recorded on Bruker spectrometers; we use nmrPipe to turn the time-domain data into frequency-domain data. The NMR data come from projection experiments, a method that allows high-dimensional NMR data to be recorded quickly. Each file consists of a 2048-byte header followed by an array of 4-byte floats. The size of the data array is determined by the number of direct points and the number of indirect points, which can be found in the prodecomp.txt file under the name SIZE. So, let's get on with the code:

    module ReadFT2
        ( listOfWord32
        , getl
        , kv2string
        , float2matrix
        , spec2flists
        ) where

    import qualified Data.ByteString.Lazy as BL
    import Data.Binary.Get (Get, getWord32be, isEmpty, runGet)
    import Data.List.Split (chunksOf)
    import Data.Map (Map)
    import qualified Data.Map as Map
    import Data.ReinterpretCast (wordToFloat)
    import Data.Word (Word32)
    import Numeric.LinearAlgebra (Matrix, fromLists)

    -- Look up key i in the map; return the fallback string if it is missing.
    kv2string :: String -> Int -> Map Int String -> String
    kv2string fallback i xs = Map.findWithDefault fallback i xs

    -- Drop the 2048-byte nmrPipe header (hardcoded), decode the rest as
    -- big-endian 32-bit words, reinterpret each word as a Float and
    -- split the result into rows of i points.
    spec2flists :: Int -> BL.ByteString -> [[Float]]
    spec2flists i spec =
        chunksOf i (map wordToFloat (runGet listOfWord32 (BL.drop 2048 spec)))

    -- Convert the rows into an hmatrix Matrix.
    float2matrix :: [[Float]] -> Matrix Float
    float2matrix = fromLists

    -- Read a single big-endian 32-bit word.
    getBinary :: Get Word32
    getBinary = getWord32be

    -- Read big-endian 32-bit words until the input is exhausted.
    listOfWord32 :: Get [Word32]
    listOfWord32 = do
        empty <- isEmpty
        if empty
            then return []
            else do
                v <- getBinary
                rest <- listOfWord32
                return (v : rest)

    -- Number of 32-bit words read (each word is 4 bytes).
    getl :: [Word32] -> Int
    getl = length

First we have to decode the raw bytes into Word32 values, which is what the listOfWord32 function does. It is called from spec2flists, which reinterprets each word as a Float and splits the result into rows of i points, giving a list of lists of floats. That nested list can then be converted with float2matrix into the Matrix type from the hmatrix package, and from there on we can do all the fancy calculations.
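The pipeline just described (bytes, then Word32, then Float, then rows) can be sketched with only the packages that ship with GHC, which makes it easy to test in isolation. This is a hedged variant rather than the module itself: it swaps wordToFloat for castWord32ToFloat from GHC.Float (base 4.10 or later) and replaces chunksOf from the split package with a small helper; words32, rowsOf and decodeSpectrum are hypothetical names. Like the original, it assumes big-endian data behind a 2048-byte header.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Binary.Get (Get, getWord32be, isEmpty, runGet)
import Data.Word (Word32)
import GHC.Float (castWord32ToFloat)

-- Same idea as listOfWord32: read big-endian words until the input ends.
words32 :: Get [Word32]
words32 = do
    done <- isEmpty
    if done
        then pure []
        else (:) <$> getWord32be <*> words32

-- Split a list into rows of n elements (hypothetical stand-in for
-- Data.List.Split.chunksOf); the last row may be shorter.
rowsOf :: Int -> [a] -> [[a]]
rowsOf n _ | n <= 0 = error "rowsOf: row length must be positive"
rowsOf _ [] = []
rowsOf n xs = row : rowsOf n rest
  where
    (row, rest) = splitAt n xs

-- Skip the 2048-byte header, decode words, reinterpret the bits as
-- Float and group them into rows of nDirect points.
decodeSpectrum :: Int -> BL.ByteString -> [[Float]]
decodeSpectrum nDirect bytes =
    rowsOf nDirect (map castWord32ToFloat (runGet words32 (BL.drop 2048 bytes)))
```

If the files turn out to be little-endian, swapping getWord32be for getWord32le is the only change needed; recent versions of the binary package also offer getFloatbe and getFloatle, which fold the bit cast into the decoder. The resulting [[Float]] can be handed to float2matrix just as the output of spec2flists is.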

  • Note that we use a mixture of in-house and imported functions. Maybe a trivial remark, but we should always look in the vast collection of Haskell libraries for functions that solve parts of our problem, and then glue these together with our own functions.
  • The listOfWord32 function is the central function here; once we have everything in Word32 format, everything gets easier.
  • The getl function just tells us how many 32-bit words (4 bytes each) we have read, as a check that everything went fine.
  • Note that I have hardcoded the header size of 2048 bytes. That is fine as long as nmrPipe always puts a 2048-byte header in front of the data, but it should be improved later on.
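The last two points, checking how much data was read and the hardcoded header, could be tightened along these lines. This is a sketch under assumptions: ft2HeaderBytes, expectedFileSize and readPayload are hypothetical names, the header is still taken to be 2048 bytes, and the payload is assumed to hold exactly direct × indirect 4-byte values. Instead of recursing until the input is empty and counting afterwards with getl, it derives the word count from the payload length, decodes exactly that many words with replicateM, and uses runGetOrFail so malformed input comes back as Left rather than an exception.

```haskell
import qualified Data.ByteString.Lazy as BL
import Control.Monad (replicateM)
import Data.Binary.Get (getWord32be, runGetOrFail)
import Data.Word (Word32)

-- Header size kept in one named place (2048 bytes, as in the post).
ft2HeaderBytes :: Int
ft2HeaderBytes = 2048

-- Expected file size, assuming the payload holds exactly
-- nDirect * nIndirect 4-byte values.
expectedFileSize :: Int -> Int -> Int
expectedFileSize nDirect nIndirect = ft2HeaderBytes + 4 * nDirect * nIndirect

-- Skip the header and decode exactly as many big-endian words as the
-- payload holds; problems are reported as Left instead of crashing.
readPayload :: BL.ByteString -> Either String [Word32]
readPayload bytes
    | BL.length bytes < hdr = Left "file is shorter than the header"
    | payloadLen `mod` 4 /= 0 = Left "payload is not a whole number of 4-byte words"
    | otherwise =
        case runGetOrFail (replicateM nWords getWord32be) payload of
            Left (_, _, err) -> Left err
            Right (_, _, ws) -> Right ws
  where
    hdr = fromIntegral ft2HeaderBytes
    payload = BL.drop hdr bytes
    payloadLen = BL.length payload
    nWords = fromIntegral (payloadLen `div` 4)
```

For example, a spectrum with 512 direct and 128 indirect points should give expectedFileSize 512 128 = 264192 bytes (2048 + 4 × 65536), which can be compared with BL.length of the file before decoding; the length of the Right result then plays the role of getl.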

The next step is to do something with all this data. Hopefully we will get to that in a future post.
