Simple Haskell parser

Haskell is a functional language, and it is very different from imperative languages like C, Python and Java. It is firmly based on a mathematical foundation and requires a different approach to common problems. The first thing that strikes most newcomers is that a variable cannot change its value once it is bound, much like a variable in a mathematical function, so the usual loops that update a counter are not available. Recursion is used heavily instead. Another feature is a strong static type system that catches many errors at compile time. All this and more tends to make the code cleaner and less error-prone. Here is a simple text parser that is applied to a prodecomp.txt file (more about prodecomp later).

import Data.List.Split
import qualified Data.Map as Map
import Data.Typeable

matchString :: String -> String -> Bool
matchString a x = elem a $ splitOn " " x

removeC :: [String] -> [String]
removeC xs = map (\x -> if not (null x) && last x == ',' then init x else x) xs

removeP :: [String] -> [String]
removeP xs = map (\x -> if not (null x) && head x == '(' && last x == ')' then init $ tail x else x) xs

getPath :: String -> [String] -> [(Int,String)]
getPath _ [] = []
getPath a (x:xs)
   | matchString a x = (read $ head $ splitOn "=" x :: Int,splitOn " " x !! 2) : getPath a xs
   | otherwise = getPath a xs

getParams :: String -> [String] -> [(Int,String)]
getParams _ [] = []
getParams a (x:xs)
   | matchString a x = zip [1..] (removeP $ removeC $ tail $ splitOn " " x)
   | otherwise = getParams a xs

getDefinitions :: String -> [String] -> [[String]]
getDefinitions _  [] = []
getDefinitions a (x:xs)
   | matchString a x = filter (`elem` ["1", "-1", "0"]) (splitOneOf ", " x) : getDefinitions a xs
   | otherwise = getDefinitions a xs

The parser was applied to the following text. It is an input file for prodecomp2, a program that decomposes NMR projection experiments. This is my main postdoc project at the University of Gothenburg.

FORMAT= FT2
NUCLEI= HN, N, CO, Cab, Hab, Ca, Cb, Ha, Hb
SW_ppm= (3.5), 30.000, 12.000, 75.000, 12.013, 75.000, 75.000, 12.013, 12.013
SW_hz= (9615.385), 2433.402, 2415.223, 15095.144, 9615.385, 15095.144, 15095.144, 9615.385, 9615.385
O1_ppm= (8.25), 118.000, 175.000, 39.000, 4.704, 39.000, 39.000, 4.696, 4.696
SIZE= (408), 192
1= PATH: /home/jonas/Dropbox/nmrProg/development/src/prodecomp2/FT2_5D/s1010.ft2 \
DEFINITION: 1, 0, 0, 0, 1, 0, 0, 0, 0
2= PATH: /home/jonas/Dropbox/nmrProg/development/src/prodecomp2/FT2_5D/s1011.ft2 \
DEFINITION: 1, 0, 0, 1, 0, 0, 0, 0, 0
3= PATH: /home/jonas/Dropbox/nmrProg/development/src/prodecomp2/FT2_5D/s1012.ft2 \
DEFINITION: 1, 0, 0, 1, 1, 0, 0, 0, 0
4= PATH: /home/jonas/Dropbox/nmrProg/development/src/prodecomp2/FT2_5D/s1013.ft2 \
DEFINITION: 1, 0, 0, 1, -1, 0, 0, 0, 0
5= PATH: /home/jonas/Dropbox/nmrProg/development/src/prodecomp2/FT2_5D/s1014.ft2 \
DEFINITION: 1, 0, 1, 0, 0, 0, 0, 0, 0
6= PATH: /home/jonas/Dropbox/nmrProg/development/src/prodecomp2/FT2_5D/s1015.ft2 \
DEFINITION: 1, 0, 1, 0, 1, 0, 0, 0, 0
7= PATH: /home/jonas/Dropbox/nmrProg/development/src/prodecomp2/FT2_5D/s1016.ft2 \
DEFINITION: 1, 0, 1, 0, -1, 0, 0, 0, 0
8= PATH: /home/jonas/Dropbox/nmrProg/development/src/prodecomp2/FT2_5D/s1017.ft2 \
DEFINITION: 1, 0, 1, 1, 0, 0, 0, 0, 0
9= PATH: /home/jonas/Dropbox/nmrProg/development/src/prodecomp2/FT2_5D/s1018.ft2 \
DEFINITION: 1, 0, 1, -1, 0, 0, 0, 0, 0
10= PATH: /home/jonas/Dropbox/nmrProg/development/src/prodecomp2/FT2_5D/s1019.ft2 \
DEFINITION: 1, 0, 1, 1, 1, 0, 0, 0, 0
11= PATH: /home/jonas/Dropbox/nmrProg/development/src/prodecomp2/FT2_5D/s1020.ft2 \
DEFINITION: 1, 0, 1, -1, 1, 0, 0, 0, 0
12= PATH: /home/jonas/Dropbox/nmrProg/development/src/prodecomp2/FT2_5D/s1021.ft2 \
DEFINITION: 1, 0, 1, 1, -1, 0, 0, 0, 0
13= PATH: /home/jonas/Dropbox/nmrProg/development/src/prodecomp2/FT2_5D/s1022.ft2 \
DEFINITION: 1, 0, 1, -1, -1, 0, 0, 0, 0
14= PATH: /home/jonas/Dropbox/nmrProg/development/src/prodecomp2/FT2_5D/s1023.ft2 \
DEFINITION: 1, 1, 0, 0, 0, 0, 0, 0, 0
15= PATH: /home/jonas/Dropbox/nmrProg/development/src/prodecomp2/FT2_5D/s1024.ft2 \
DEFINITION: 1, 1, 0, 0, 1, 0, 0, 0, 0

I've deliberately cut the file; there are an additional 40 entries in the same format as the first 15 paths. I am not going to go through the whole parser, just highlight some interesting parts:

  • most functions are small utility functions of two lines: a type signature and a one-line definition. These are then glued together into larger functions. This is a typical pattern in Haskell: break the problem down into small subproblems, solve each one, and combine the solutions to solve the larger problem. matchString, removeC and removeP are examples of such utility functions: they match a string, remove a trailing comma, and remove enclosing parentheses.
  • when developing functions we try to find a common pattern in order to make them as general as possible and avoid duplicated code. In the above code, a seasoned Haskell programmer could probably merge several functions into one.
  • lazy evaluation means that only the parts that are needed will be evaluated. I use this when making my key-value pairs: zip [1..] pairs each value with an index taken from an infinite list, so I don't have to know beforehand how long my list will be.
  • looping is done via recursive functions; a base case and a few guards are usually all it takes.
  • the above code extracts all parameters in the text file in the [(key,value)] format; these lists can then be turned into maps via the Map.fromList function.
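
As a rough sketch of the last few points, the three get* functions share one pattern: pick the lines that contain a keyword, then transform them. The example below factors that pattern out and feeds the result to Map.fromList. The names selectLines, clean, params and sample are made up for illustration and are not part of the parser above, and words from the Prelude is used as a stand-in for splitOn " " so that only base and containers are needed.

```haskell
import qualified Data.Map as Map
import Data.Maybe (listToMaybe)

-- Hypothetical generic helper: keep the lines that contain `key` as a
-- whitespace-separated word and apply `f` to each of them.
-- `words` is a stand-in for `splitOn " "` from the original listing.
selectLines :: String -> (String -> a) -> [String] -> [a]
selectLines key f ls = [f l | l <- ls, key `elem` words l]

-- Strip a trailing comma, then surrounding parentheses, from one field.
clean :: String -> String
clean = unparen . uncomma
  where
    uncomma s
      | not (null s) && last s == ',' = init s
      | otherwise = s
    unparen ('(' : rest)
      | not (null rest) && last rest == ')' = init rest
    unparen s = s

-- Fields of the first line matching `key`, numbered from 1.
-- zip stops when the finite list ends, so the infinite [1..] is safe.
params :: String -> [String] -> [(Int, String)]
params key ls = maybe [] (zip [1 ..]) (listToMaybe (selectLines key fields ls))
  where
    fields = map clean . drop 1 . words

-- Hypothetical, shortened sample of the input file.
sample :: [String]
sample =
  [ "NUCLEI= HN, N, CO"
  , "SW_ppm= (3.5), 30.000, 12.000"
  ]

main :: IO ()
main = do
  let sw = Map.fromList (params "SW_ppm=" sample)
  print (params "NUCLEI=" sample) -- [(1,"HN"),(2,"N"),(3,"CO")]
  print (Map.lookup 1 sw)         -- Just "3.5"
```

The list comprehension in selectLines hides the explicit recursion of the original functions; whether that reads better than a base case plus guards is a matter of taste.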

What is missing is a string-to-float converter, but that will be added in the main file.
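
One possible shape for such a converter, as a sketch only: readMaybe from Text.Read (part of base) returns Nothing on malformed input instead of crashing the way read does. The names toDouble and convertParams are assumptions made for this example and may differ from whatever ends up in the main file.

```haskell
import Text.Read (readMaybe)

-- Hypothetical helper: parse a cleaned field such as "30.000" into a Double.
-- Returns Nothing instead of raising an error on malformed input.
toDouble :: String -> Maybe Double
toDouble = readMaybe

-- Convert the values of a [(key,value)] list.
-- The whole result is Nothing if any single field fails to parse.
convertParams :: [(Int, String)] -> Maybe [(Int, Double)]
convertParams = traverse (\(k, v) -> fmap ((,) k) (toDouble v))

main :: IO ()
main = do
  print (convertParams [(1, "3.5"), (2, "30.000")]) -- Just [(1,3.5),(2,30.0)]
  print (convertParams [(1, "3.5"), (2, "abc")])    -- Nothing
```

Failing the whole list on one bad field is a deliberate choice here; a version that keeps the good fields and reports the bad ones would work just as well.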
