In last week's article we completed our look at the Applicative Parsing library. We took all our smaller combinators and put them together to parse our Gherkin syntax. This week, we'll look at a new library: Attoparsec. Instead of trying to do everything using a purely applicative structure, this library uses a monadic approach. This approach is much more common. It results in syntax that is simpler to read and understand. It will also make it easier for us to add certain features.
To follow along with the code for this article, take a look at the attoparsec branch on Github! For some more excellent ideas about useful libraries, download our Production Checklist! It includes material on libraries for everything from data structures to machine learning!
If you're new to Haskell, make sure you download our Beginner's Checklist! It'll tell you about all the steps you need to take to get started on your Haskell journey!
The Parser Type
In applicative parsing, all our parsers had the type RE Char. This type belonged to the Applicative typeclass but was not a Monad. For Attoparsec, we'll instead be using the Parser type, a full monad. So in general we'll be writing parsers with the following types:

    featureParser :: Parser Feature
    scenarioParser :: Parser Scenario
    statementParser :: Parser Statement
    exampleTableParser :: Parser ExampleTable
    valueParser :: Parser Value
Parsing Values
The first thing we should realize though is that our parser is still an Applicative! So not everything needs to change! We can still make use of operators like *> and <|>. In fact, we can leave our value parsing code almost exactly the same! For instance, the valueParser, nullParser, and boolParser expressions can remain the same:

    valueParser :: Parser Value
    valueParser =
      nullParser <|>
      boolParser <|>
      numberParser <|>
      stringParser

    nullParser :: Parser Value
    nullParser =
      (string "null" <|>
      string "NULL" <|>
      string "Null") *> pure ValueNull

    boolParser :: Parser Value
    boolParser =
      (trueParser *> pure (ValueBool True)) <|>
      (falseParser *> pure (ValueBool False))
      where
        trueParser = string "True" <|> string "true" <|> string "TRUE"
        falseParser = string "False" <|> string "false" <|> string "FALSE"
If we wanted, we could make these more "monadic" without changing their structure. For instance, we can use return instead of pure (since they are identical). We can also use >> instead of *> to perform monadic actions while discarding a result. Our value parser for numbers changes a bit, but it gets simpler! The authors of Attoparsec provide a convenient parser for reading scientific numbers:

    numberParser :: Parser Value
    numberParser = ValueNumber <$> scientific
Then for string values, we'll use the takeTill combinator to read all the characters until a vertical bar or newline. Then we'll apply a few text functions to remove the whitespace and get it back to a String. (The Parser monad we're using parses things as Text rather than String.)

    stringParser :: Parser Value
    stringParser = (ValueString . unpack . strip) <$>
      takeTill (\c -> c == '|' || c == '\n')
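To try the value parsers in isolation, here is a minimal self-contained sketch. The Value type below is an assumption about how the series defines it (the article does not show it), and the imports are one plausible set for Data.Attoparsec.Text rather than the ones on the branch.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative ((<|>))
import Data.Attoparsec.Text (Parser, parseOnly, scientific, string, takeTill)
import Data.Scientific (Scientific)
import Data.Text (strip, unpack)

-- Assumed definition: the article never shows the real Value type.
data Value
  = ValueNull
  | ValueBool Bool
  | ValueNumber Scientific
  | ValueString String
  deriving (Show, Eq)

-- Try each alternative in order; stringParser goes last because it
-- accepts almost anything.
valueParser :: Parser Value
valueParser = nullParser <|> boolParser <|> numberParser <|> stringParser

nullParser :: Parser Value
nullParser = (string "null" <|> string "NULL" <|> string "Null") >> return ValueNull

boolParser :: Parser Value
boolParser =
  (trueParser >> return (ValueBool True)) <|>
  (falseParser >> return (ValueBool False))
  where
    trueParser = string "True" <|> string "true" <|> string "TRUE"
    falseParser = string "False" <|> string "false" <|> string "FALSE"

numberParser :: Parser Value
numberParser = ValueNumber <$> scientific

-- Read up to (not including) a bar or newline, then trim whitespace.
stringParser :: Parser Value
stringParser = (ValueString . unpack . strip) <$>
  takeTill (\c -> c == '|' || c == '\n')
```

In GHCi, parseOnly valueParser "null" should give Right ValueNull, while an input such as " hello |" falls through to stringParser and yields the trimmed string.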
Parsing Examples
As we parse the example table, we'll switch to a more monadic approach by using do-syntax. First, we establish a cellParser that will read a value within a cell.

    cellParser = do
      skipWhile nonNewlineSpace
      val <- valueParser
      skipWhile (not . barOrNewline)
      char '|'
      return val
Each line in our do-block corresponds to a step of the parsing process. So first we skip all the leading whitespace. Then we parse our value. Then we skip the remaining space, and parse the final vertical bar to end the cell. Then we'll return the value we parsed.
It's a lot easier to keep track of what's going on here compared to applicative syntax. It's not hard to see which parts of the input we discard and which we use. If we don't bind a result with <- within do-syntax, we discard it. If we bind it, we can use it later. To complete the exampleLineParser, we parse the initial bar, get many values, close out the line, and then return them:

    exampleLineParser :: Parser [Value]
    exampleLineParser = do
      char '|'
      cells <- many cellParser
      char '\n'
      return cells
      where
        cellParser = ...
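As a rough, runnable illustration of the same do-syntax pattern, the sketch below parses one table row into trimmed Text cells instead of Value. The definitions of nonNewlineSpace and barOrNewline are guesses at helpers the article uses without showing, and lineParser is a hypothetical simplified stand-in for exampleLineParser.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative (many)
import Data.Attoparsec.Text (Parser, char, parseOnly, skipWhile, takeTill)
import Data.Text (Text, strip)

-- Assumed helper: spaces and tabs, but not newlines.
nonNewlineSpace :: Char -> Bool
nonNewlineSpace c = c == ' ' || c == '\t'

-- Assumed helper: the two characters that can end a cell.
barOrNewline :: Char -> Bool
barOrNewline c = c == '|' || c == '\n'

-- One cell: skip leading space, read the content, require a closing bar.
cellParser :: Parser Text
cellParser = do
  skipWhile nonNewlineSpace
  val <- takeTill barOrNewline
  _ <- char '|'
  return (strip val)

-- One row: opening bar, as many cells as possible, then the newline.
lineParser :: Parser [Text]
lineParser = do
  _ <- char '|'
  cells <- many cellParser
  _ <- char '\n'
  return cells
```

When the next character is the newline, cellParser fails on its closing bar, so many stops there and hands back the cells collected so far.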
Reading the keys for the table is almost identical. All that changes is that our cellParser uses many letter instead of valueParser. So now we can put these pieces together for our exampleTableParser:

    exampleTableParser :: Parser ExampleTable
    exampleTableParser = do
      string "Examples:"
      consumeLine
      keys <- exampleColumnTitleLineParser
      valueLists <- many exampleLineParser
      return $ ExampleTable keys (map (zip keys) valueLists)

We read the signal string "Examples:", then consume the rest of that line. Then we get our keys and values, and build the table with them. Again, this is much simpler than mapping a function like buildExampleTable over the parsers, as we did in applicative syntax.
Statements
The Statement parser is another area where we can improve the clarity of our code. Once again, we'll define two helper parsers. These will fetch the portions outside brackets and then inside brackets, respectively:

    nonBrackets :: Parser String
    nonBrackets = many (satisfy (\c -> c /= '\n' && c /= '<'))

    insideBrackets :: Parser String
    insideBrackets = do
      char '<'
      key <- many letter
      char '>'
      return key
Now when we put these together, we can more clearly see the steps of the process outlined in do-syntax. First we parse the "signal" word, then a space. Then we get the "pairs" of non-bracketed and bracketed portions. Finally, we'll get one last non-bracketed part:

    parseStatementLine :: Text -> Parser Statement
    parseStatementLine signal = do
      string signal
      char ' '
      pairs <- many ((,) <$> nonBrackets <*> insideBrackets)
      finalString <- nonBrackets
      ...
Now we can define our helper function buildStatement and call it on its own line in do-syntax. Then we'll return the resulting Statement. This is much easier to read than tracking which functions we map over which sections of the parser:

    parseStatementLine :: Text -> Parser Statement
    parseStatementLine signal = do
      string signal
      char ' '
      pairs <- many ((,) <$> nonBrackets <*> insideBrackets)
      finalString <- nonBrackets
      let (fullString, keys) = buildStatement pairs finalString
      return $ Statement fullString keys
      where
        buildStatement :: [(String, String)] -> String -> (String, [String])
        buildStatement [] last = (last, [])
        buildStatement ((str, key) : rest) rem =
          let (str', keys) = buildStatement rest rem
          in (str <> "<" <> key <> ">" <> str', key : keys)
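Since buildStatement is a pure function, its behavior can be checked without running any parser. The sketch below lifts it to the top level, renames the last and rem arguments to avoid shadowing Prelude names, and applies it to a made-up list of pairs. The sample input is illustrative only.

```haskell
-- Same logic as the local helper, as a standalone function.
-- Each pair is (text before a bracket, key inside the bracket).
buildStatement :: [(String, String)] -> String -> (String, [String])
buildStatement [] final = (final, [])
buildStatement ((str, key) : rest) final =
  let (str', keys) = buildStatement rest final
  in (str ++ "<" ++ key ++ ">" ++ str', key : keys)

-- Hypothetical pairs, as the parser might produce them for the text
-- "I have <count> <fruit> today".
sample :: (String, [String])
sample = buildStatement [("I have ", "count"), (" ", "fruit")] " today"
-- sample == ("I have <count> <fruit> today", ["count", "fruit"])
```

The function rebuilds the full statement text with the brackets restored, and collects the keys in the order they appear.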
Scenarios and Features
As with applicative parsing, it's now straightforward for us to finish everything off. To parse a scenario, we read the keyword, consume the line to read the title, and read the statements and examples:

    scenarioParser :: Parser Scenario
    scenarioParser = do
      string "Scenario: "
      title <- consumeLine
      statements <- many (parseStatement <* char '\n')
      examples <- (exampleTableParser <|> return (ExampleTable [] []))
      return $ Scenario title statements examples
Again, we provide an empty ExampleTable as an alternative if there are no examples. The parser for Background looks very similar. The only difference is we ignore the result of the line and instead use Background as the title string.
    backgroundParser :: Parser Scenario
    backgroundParser = do
      string "Background:"
      consumeLine
      statements <- many (parseStatement <* char '\n')
      examples <- (exampleTableParser <|> return (ExampleTable [] []))
      return $ Scenario "Background" statements examples
Finally, we'll put all this together as a feature. We read the title, get the background if it exists, and read our scenarios:

    featureParser :: Parser Feature
    featureParser = do
      string "Feature: "
      title <- consumeLine
      maybeBackground <- optional backgroundParser
      scenarios <- many scenarioParser
      return $ Feature title maybeBackground scenarios
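The parsers above handle a missing section in two ways: optional wraps the result in Maybe, while <|> return supplies a default value. A small sketch of the difference, using hypothetical names (header, maybeHeader, headerOrDefault) that are not part of the article's code:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative (optional, (<|>))
import Data.Attoparsec.Text (Parser, parseOnly, string)
import Data.Text (Text)

-- Hypothetical stand-in for a section that may or may not be present.
header :: Parser Text
header = string "Background:"

-- optional turns failure into Nothing, as with maybeBackground.
maybeHeader :: Parser (Maybe Text)
maybeHeader = optional header

-- <|> return falls back to a default, as with the empty ExampleTable.
headerOrDefault :: Parser Text
headerOrDefault = header <|> return "none"
```

Because Attoparsec backtracks on failure, neither version consumes input when the header is absent, so the next parser starts from the same position.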
Feature Description
One extra feature we'll add now is that we can more easily parse the "description" of a feature. We omitted it in applicative parsing, as it's a real pain to implement. It becomes much simpler when using a monadic approach. The first step we have to take though is to make one parser for all the main elements of our feature. This approach looks like this:

    featureParser :: Parser Feature
    featureParser = do
      string "Feature: "
      title <- consumeLine
      (description, maybeBackground, scenarios) <- parseRestOfFeature
      return $ Feature title description maybeBackground scenarios

    parseRestOfFeature :: Parser ([String], Maybe Scenario, [Scenario])
    parseRestOfFeature = ...
Now we'll use a recursive function that reads one line of the description at a time and adds to a growing list. The trick is that we'll use the choice combinator offered by Attoparsec.
We'll create two parsers. The first assumes there are no further lines of description. It attempts to parse the background and scenario list. The second reads a line of description, adds it to our growing list, and recurses:

    parseRestOfFeature :: Parser ([String], Maybe Scenario, [Scenario])
    parseRestOfFeature = parseRestOfFeatureTail []
      where
        parseRestOfFeatureTail prevDesc = do
          (fullDesc, maybeBG, scenarios) <- choice [noDescriptionLine prevDesc, descriptionLine prevDesc]
          return (fullDesc, maybeBG, scenarios)
So we'll first try to run the noDescriptionLine parser. It will try to read the background and then the scenarios as we've always done. If it succeeds, we know we're done, and the list we passed in as an argument is the full description:

      where
        noDescriptionLine prevDesc = do
          maybeBackground <- optional backgroundParser
          scenarios <- some scenarioParser
          return (prevDesc, maybeBackground, scenarios)
Now if this parser fails, we know that it means the next line is actually part of the description. So we'll write a parser to consume a full line, and then recurse:

        descriptionLine prevDesc = do
          nextLine <- consumeLine
          parseRestOfFeatureTail (prevDesc ++ [nextLine])
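To see the choice-and-recurse pattern run end to end, here is a reduced sketch. It makes two assumptions: consumeLine is written as one plausible definition (the article uses it without showing it), and scenarioTitle is a hypothetical stand-in for scenarioParser that reads only the title line. The background step is left out for brevity.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative (some)
import Data.Attoparsec.Text (Parser, char, choice, parseOnly, string, takeTill)
import Data.Text (unpack)

-- Assumed helper: read the rest of the line and drop the newline.
consumeLine :: Parser String
consumeLine = do
  str <- takeTill (== '\n')
  _ <- char '\n'
  return (unpack str)

-- Hypothetical stand-in for scenarioParser: just the title line.
scenarioTitle :: Parser String
scenarioTitle = string "Scenario: " *> consumeLine

-- Collect description lines until at least one scenario parses.
descriptionThenScenarios :: Parser ([String], [String])
descriptionThenScenarios = go []
  where
    go prevDesc = choice [noDescriptionLine prevDesc, descriptionLine prevDesc]
    -- First choice: the description is over and scenarios start here.
    noDescriptionLine prevDesc = do
      scenarios <- some scenarioTitle
      return (prevDesc, scenarios)
    -- Second choice: treat the line as description and recurse.
    descriptionLine prevDesc = do
      nextLine <- consumeLine
      go (prevDesc ++ [nextLine])
```

On an input of two description lines followed by two Scenario: lines, this should return the two description lines and the two titles. If no scenario ever appears, consumeLine eventually fails at the end of the input, so the recursion terminates with a parse failure rather than looping.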
And now we're done! We can parse descriptions!
Conclusion
That wraps up our exploration of Attoparsec. Come back next week when we'll finish this series off by learning about Megaparsec. We'll find that it's syntactically very similar to Attoparsec with a few small exceptions. We'll see how we can use some of the added power of monadic parsing to enrich our syntax.
To learn more about cool Haskell libraries, be sure to check out our Production Checklist! It'll tell you a little bit about libraries in all kinds of areas like databases and web APIs.
If you've never written Haskell at all, download our Beginner's Checklist! It'll give you all the resources you need to get started on your Haskell journey!
Attoparsec: The Clarity of Do-Syntax was originally published in Hacker Noon on Medium, where people are continuing the conversation by highlighting and responding to this story.