The value of composable parts
I’ve found myself in some strange parsing tasks lately. This is a new thing for me, so don’t take this post as an example of the best practices for parsing. However, FWIW, all the parsers work.
The Setup
Say we have data that looks something like:
"1%400:3.2 6%some_description|100:1"
First we decide what we’re trying to pull out of this. These values happen to be space-separated, so we can just use the Prelude’s words:
words theString
> ["1%400:3.2", "6%some_description|100:1"]
Each string in this list we’ll call a Feature, so we write a data type for it:
data Feature
  = Feature
  { row :: String
  , col :: String
  , value :: String
  , descriptor :: Maybe String
  } deriving (Show)
Notice that we’re just reading this in as String data at the moment, but we can easily change that once we get the parsing structure down.
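As a rough idea of what that later change could look like, here is a minimal sketch that converts the all-String record into a typed one with readMaybe from base. TypedFeature, its field names, toTyped, and the choice of Int and Double are assumptions made for illustration, guessed from the sample data; they are not part of the original code.

```haskell
import Text.Read (readMaybe)

-- The all-String record from this post.
data Feature = Feature
  { row :: String
  , col :: String
  , value :: String
  , descriptor :: Maybe String
  } deriving (Show)

-- Hypothetical typed variant; Int and Double are guesses based on the sample data.
data TypedFeature = TypedFeature
  { tRow :: Int
  , tCol :: Int
  , tValue :: Double
  , tDescriptor :: Maybe String
  } deriving (Show, Eq)

-- Convert field by field; any field that does not read cleanly yields Nothing.
toTyped :: Feature -> Maybe TypedFeature
toTyped f =
  TypedFeature
    <$> readMaybe (row f)
    <*> readMaybe (col f)
    <*> readMaybe (value f)
    <*> pure (descriptor f)
```

Keeping the conversion separate from the parser means the parsing structure can stay String-based while we experiment with the field types.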
Anyway, almost done with the easy stuff. We need to pull the garbage data out somehow. That’s cool, we’ll just write out our signal matchers.
breakSep = string "%" -- ends the row
kvSep = string ":" -- splits col from value
descriptionSep = string "|" -- ends the optional descriptor
The Actual Parsing
Since we’ll be slurping up data until we hit one of the above defined separators, we’ll make a parser to do just that:
anythingUntil :: Parser String -> Parser String
anythingUntil p = manyTill anyToken (p *> return ())
This function eats up any input until the separator parser we pass in succeeds, then returns everything before it. The separator itself is consumed and thrown away.
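To see that behaviour in isolation, here is a small sketch, assuming Parsec's Text.Parsec and Text.Parsec.String modules. The demo helper is a made-up name for illustration: it runs anythingUntil against ":" and also returns whatever input is left over.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Collect tokens until the separator parser succeeds; the separator is discarded.
anythingUntil :: Parser String -> Parser String
anythingUntil p = manyTill anyToken (p *> return ())

-- Hypothetical helper: return the text before ':' together with the remaining input.
demo :: String -> Either ParseError (String, String)
demo = parse ((,) <$> anythingUntil (string ":") <*> getInput) ""
```

In GHCi, demo "400:3.2" gives Right ("400","3.2"), so the next parser picks up right after the separator. If the separator never shows up, as in demo "400", the result is a Left with a parse error.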
The way we’ll use this is pretty simple:
featureP :: Parser Feature
featureP = do
  row <- anythingUntil breakSep
  desc <- descriptorP
  col <- anythingUntil kvSep
  value <- manyTill anyToken eof -- get the remaining
  return $ Feature row col value desc
Now we need to fill in the optional descriptor parser:
descriptorP :: Parser (Maybe String)
descriptorP = optionMaybe $ try $ anythingUntil descriptionSep
optionMaybe allows us to optionally consume some data and return a Maybe value. Since the anythingUntil parser can fail in this case after it has already consumed input, we need to wrap it in try, which rewinds the input on failure so that optionMaybe returns Nothing instead of erroring out.
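The difference try makes is easy to check on a feature that has no descriptor. A minimal sketch, assuming Parsec; withTry, withoutTry, and run are names invented here for the comparison:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

anythingUntil :: Parser String -> Parser String
anythingUntil p = manyTill anyToken (p *> return ())

-- With try: a failed search for '|' rewinds the input, so we get Nothing.
withTry :: Parser (Maybe String)
withTry = optionMaybe $ try $ anythingUntil (string "|")

-- Without try: the failed search has already consumed input, so the
-- whole parse fails instead of falling back to Nothing.
withoutTry :: Parser (Maybe String)
withoutTry = optionMaybe $ anythingUntil (string "|")

-- Hypothetical helper: run a descriptor parser, then collect the rest of the input.
run :: Parser (Maybe String) -> String -> Either ParseError (Maybe String, String)
run p = parse ((,) <$> p <*> many anyChar) ""
```

Here run withTry "some_description|100:1" gives Right (Just "some_description","100:1") and run withTry "400:3.2" gives Right (Nothing,"400:3.2"), while run withoutTry "400:3.2" is a Left because the input was eaten before the failure.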
The Benefit over X
Personally, I find this easier to reason about than a regex or generic string functions. The point here is that I can easily expand on this and add new, more detailed parsers (this will be covered in part 2).
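For reference, the pieces from this post fit together roughly like this. It is a sketch, assuming Parsec's Text.Parsec and Text.Parsec.String modules; the parseFeatures driver and the extra Eq instance are additions for illustration and are not part of the original session.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Feature = Feature
  { row :: String
  , col :: String
  , value :: String
  , descriptor :: Maybe String
  } deriving (Show, Eq) -- Eq is added here only to make results easy to compare

breakSep, kvSep, descriptionSep :: Parser String
breakSep = string "%"
kvSep = string ":"
descriptionSep = string "|"

anythingUntil :: Parser String -> Parser String
anythingUntil p = manyTill anyToken (p *> return ())

descriptorP :: Parser (Maybe String)
descriptorP = optionMaybe $ try $ anythingUntil descriptionSep

featureP :: Parser Feature
featureP = do
  row <- anythingUntil breakSep
  desc <- descriptorP
  col <- anythingUntil kvSep
  value <- manyTill anyToken eof -- get the remaining
  return $ Feature row col value desc

-- Hypothetical driver: split on whitespace, then parse each chunk on its own.
parseFeatures :: String -> Either ParseError [Feature]
parseFeatures = traverse (parse featureP "") . words
```

Running parseFeatures on the sample string from the setup yields two features: row "1", col "400", value "3.2" with no descriptor, and row "6", col "100", value "1" with descriptor "some_description". Because each chunk is parsed separately, the search for "|" can never wander into the next feature.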
I’ve included a snapshot of the ihaskell session I was working in for full context here