Latest news about Bitcoin and all cryptocurrencies. Your daily crypto news habit.
Last week we prepared ourselves for parsing by going over the basics of the Gherkin Syntax. In this article and the next, weāll be using the applicative parsing library to parse that syntax. This week, weāll focus on the fundamentals of this library, and build up a vocabulary of combinators to use. Weāll make heavy use of the Applicative typeclass. If you need a refresher on that, check out this article. As we start coding, you can also follow along with the examples on Github here! Most of the code here is in Parser.hs.
In the coming weeks, weāll be seeing a couple other parsing libraries as well. If you want to get some ideas about these and more, download our Production Checklist. It summarizes many other useful libraries for writing higher levelĀ Haskell.
If youāve never started writing Haskell, nowās your chance! Get our free Beginnerās Checklist and learn the basics of gettingĀ started!
Getting Started
So to start parsing, letās make some notes about our input format. First, weāll treat our input feature document as a single string. Weāll remove all empty lines, and then trim leading and trailing whitespace from eachĀ line.
parseFeatureFromFile :: FilePath -> IO FeatureparseFeatureFromFile inputFile = do fileContents <- lines <$> readFile inputFile let nonEmptyLines = filter (not . isEmpty) fileContents let trimmedLines = map trim nonEmptyLines let finalString = unlines trimmedLines case parseFeature finalString of ...
ā¦isEmpty :: String -> BoolisEmpty = all isSpace
trim :: String -> Stringtrim input = reverse flippedTrimmed where trimStart = dropWhile isSpace input flipped = reverse trimStart flippedTrimmed = dropWhile isSpace flipped
This means a few things for our syntax. First, we donāt care about indentation. Second, we ignore extra lines. This means our parsers might allow certain formats we donāt want. But thatās OK because weāre trying to keep thingsĀ simple.
The REĀ Type
With applicative based parsing, the main data type weāll be working with is called RE, for regular expression. This represents a parser, and itās parameterized by twoĀ types:
data RE s a = ...
The s type refers to the fundamental unit weāll be parsing. Since we're parsing our input as a single String, this will be Char. Then the a type is the result of the parsing element. This varies from parser to parser. The most basic combinator we can use is sym. This parses a single symbol of your choosing:
sym :: s - > RE s s
parseLowercaseA :: RE Char CharparseLowercaseA = sym āaā
To use an RE parser, we call the match function or its infix equivalent =~. These will return a Justvalue if we can match the entire input string, and Nothing otherwise:
>> match parseLowercaseA āaāJust āaā>> ābā =~ parseLowercaseANothing>> āabā =~ parseLowercaseANothing -- (Needs to parse entire input)
Predicates andĀ Strings
Naturally, weāll want some more complicated functionality. Instead of parsing a single input character, we can parse any character that fits a particular predicate by using psym. So if we want to read any character that was not a newline, we couldĀ do:
parseNonNewline :: RE Char CharparseNonNewline = psym (/= ā\nā)
The string combinator allows us to match a particular full string and then returnĀ it:
readFeatureWord :: RE Char StringreadFeatureWord = string āFeatureā
Weāll use this for parsing keywords, though weāll often end up discarding the āresultā.
Applicative Combinators
Now the RE type is applicative. This means we can apply all kinds of applicative combinators over it. One of these is many, which allows us to apply a single parser several times. Here is one combinator that weāll use a lot. It allows us to read everything up until a newline and return the resulting string:
readUntilEndOfLine :: RE Char StringreadUntilEndOfLine = many (psym (/= '\n'))
Beyond this, weāll want to make use of the applicative <*> operator to combine different parsers. We can also apply a pure function (or constructor) on top of those by using <$>. Suppose we have a data type that stores two characters. Hereās how we can build a parser forĀ it:
data TwoChars = TwoChars Char Char
parseTwoChars :: RE Char TwoCharsparseTwoChars = TwoChars <$> parseNonNewline <*> parseNonNewline
...
>> match parseTwoChars āabāJust (TwoChars āaā ābā)
We can also use <* and *>, which are cousins of the main applicative operator. The first one will parse but then ignore the right hand parse result. The second discards the left sideĀ result.
parseFirst :: RE Char CharparseFirst = parseNonNewline <* parseNonNewline
parseSecond :: RE Char CharparseSecond = parseNonNewline *> parseNonnewline
ā¦
>> match parseFirst āabāJust āaā>> match parseSecond āabāJust ābā>> match parseFirst āaāNothing
Notice the last one fails because the parser needs to have both inputs! Weāll come back to this idea of failure in a second. But now that we know this technique, we can write a couple other usefulĀ parsers:
readThroughEndOfLine :: RE Char StringreadThroughEndOfLine = readUntilEndOfLine <* sym '\n'
readThroughBar :: RE Char StringreadThroughBar = readUntilBar <* sym '|'
readUntilBar :: RE Char StringreadUntilBar = many (psym (\c -> c /= '|' && c /= '\n'))
The first will parse the rest of the line and then consume the newline character itself. The other parsers accomplish this same task, except with the vertical bar character. Weāll need these when we parse the Examples section nextĀ week.
Alternatives: Dealing with ParseĀ Failures
We introduced the notion of a parser āfailingā up above. Of course, we need to be able to offer alternatives when a parser fails! Otherwise our language will be very limited in its structure. Luckily, the RE type also implements Alternative. This means we can use the <|> operator to determine an alternative parser when one fails. Letās see this inĀ action:
parseFeatureTitle :: RE Char StringparseFeatureTitle = string āFeature: ā *> readThroughEndOfLine
parseScenarioTitle :: RE Char StringparseScenarioTitle = string āScenario: ā *> readThroughEndOfLine
parseEither :: RE Char StringparseEither = parseFeatureTitle <|> parseScenarioTitle
ā¦
>> match parseFeatureTitle āFeature: Login\nāJust āLoginā>> match parseFeatureTitle āScenario: Login\nāNothing>> match parseEither āScenario: Login\nāJust āLoginā
Of course, if ALL the options fail, then weāll still have a failingĀ parser!
>> match parseEither āRandom: Login\nāNothing
Weāll need this to introduce some level of choice into our parsing system. For instance, itās up to the user if they want to include a Background as part of their feature. So we need to be able to read the background if itās there or else move onto parsing a scenario.
Cconclusion
That wraps up our introduction to the basic combinators of applicative parsing. Next week, weāll take all the pieces weāve developed here and put them to work on Gherkin syntax itself. Everything seems pretty small so far. But weāll see that we can actually build up our results very rapidly once we have the basic pieces inĀ place!
If you want to see some more libraries that are useful for important Haskell tasks, take a look at our Production Checklist. It will introduce you to some libraries for parsing, databases, APIs, and muchĀ more!
If youāre new to Haskell, thereās no better time to start! Download our free Beginners Checklist! It will help you download the right tools and start learning the language.
Applicative Parsing I: Building the Foundation was originally published in Hacker Noon on Medium, where people are continuing the conversation by highlighting and responding to this story.
Disclaimer
The views and opinions expressed in this article are solely those of the authors and do not reflect the views of Bitcoin Insider. Every investment and trading move involves risk - this is especially true for cryptocurrencies given their volatility. We strongly advise our readers to conduct their own research when making a decision.