上期链接：用Haskell构建Parser Combinator（一）

回顾

上篇文章从Parser的类型、Parser的最简实现，一步步讲到Parser的Monad和相对完整的Parser库。
实际上，该Parser已经可以完成大部分的解析工作了。然而，当解析错误时，此Parser仅仅简单地返回Nothing。
当我们需要更为丰富的错误信息时，则需要在原来Parser的基础上，增加相应的模块。

类型修改

还记得（一）中，Parser类型是：

1	newtype Parser a = Parser (String -> Maybe (a, String))

既然需要错误信息，就不能简单地返回Nothing，而应该是一个String。因此需要把类型改成Either/Except。

1	newtype Parser a = Parser (String -> Either String (a, String))

添加错误信息

satisfy

在satisfy实现中添加错误信息：

satisfy :: (Char -> Bool) -> Parser Char
satisfy prd = Parser $ \case
    "" -> Left "End of input"
    (x:xs) -> if prd x 
                then Right (x, xs) 
                else Left $ "unexcepted: " ++ show x

测试：

λ> runParse (char 'a') "a"
Right ('a',"")
λ> runParse (char 'a') "b"
Left "unexcepted: 'b'"
λ> runParse (char 'a') ""
Left "End of input"

Monad

把这些instance改一下，就完成了程序的修改。

instance Functor Parser where
  fmap f pa = Parser $ \inp ->
    case runParse pa inp of
      Right (x, rest) -> Right (f x, rest)
      Left x -> Left x
instance Applicative Parser where
  pure x = Parser $ \inp -> Right (x, inp)
  pab <*> pa = Parser $ \inp ->
    case runParse pab inp of
      Right (fab, rest) -> runParse (fmap fab pa) rest
      Left x -> Left x
instance Monad Parser where
  pa >>= apb = Parser $ \inp ->
    case runParse pa inp of
      Right (a, rest) -> runParse (apb a) rest
      Left x -> Left x
instance Alternative Parser where
  empty = Parser $ const (Left "")
  pa <|> pb = Parser $ \inp ->
    case runParse pa inp of
      Left _ -> runParse pb inp
      x -> x

测试：

λ> runParse nature "123"
Right (123,"")
λ> runParse nature "adc"
Left "unexcepted: 'a'"

进一步

以上的“错误信息”虽聊胜于无，但是我们不应该就此满足。一份能用的错误信息，应该是具体的、有效的、能给用户以适当指示的。
因此，最好能够加上位置信息，以及能够自由地给出和修改提示。
接下来，让我们进一步修改类型。

位置信息

平时我们使用编译器、解释器，代码中存在语法错误时，编译器、解释器的Parser总能正确地指出错误所在位置，使我们不必迷失在代码的海洋中不知所措（C++ template metaprogramming除外）。

考虑到添加位置信息的需求，那就需要考虑位置信息的存在、更新方式。我们这里的做法是，把位置信息与待解析字符串捆绑在一起，形成PString，消耗字符串的时候，位置信息同步更新。
PString类型：

1 2	newtype Pos = Pos (Int, Int) newtype PString = PString (Pos, String)

错误信息

有了位置信息，接下来我们就可以修改错误信息了。原本错误信息类型是简单的String，现在使用自定义的错误信息类型：

1
2
3

data ParseError = DefaultError String
                | EndOfInput Pos
                | Chatty String Pos

DefaultError和原来的差不多，仅仅封装了String。
EndOfInput只需要装上位置信息。
Chatty同时需要位置信息和错误信息，以便于精确控制错误信息。

重造Parser类型

import Control.Monad
import Control.Monad.Except
import Control.Monad.State
import Control.Applicative
...
...
...
newtype Parser a = Parser
  { runP :: ExceptT ParseError (State PString) a }
  deriving (Functor, Applicative, Monad, MonadError ParseError)

以上便是最终的Parser类型。
用上了Monad Transformer，有以下好处：

便于结合多种Monad。
让编译器自动实现各种范畴。仅仅需要在文件头加上：{-# LANGUAGE GeneralizedNewtypeDeriving #-}

此时，由于使用了ExceptT，通过throwError和catchError，我们就可以非常方便地添加和修改错误信息。

Alternative

还记得上篇文章中提到，<|>的实现是强制回溯的。现在有了catchError，我们可以实现不回溯的<|>版本。当需要回溯时，可以使用try。

1
2
3

instance Alternative Parser where
  empty = throwError (DefaultError "")
  p1 <|> p2 = p1 `catchError` const p2

直接catch p1的失败，这时PString已经消耗，若此时再让p2去解析，回溯就不会发生。

try的实现：

try :: Parser a -> Parser a
try p = do
  PString s <- Parser $ lift get
  p `catchError` \e -> Parser $ lift (put $ PString s) >> throwError e

get把PString拿到，如果p解析失败了，就把原来的PString放回去然后抛出错误，从而实现了回溯。

Position

待解析字符串和位置信息要同时更新，因此我们要重新实现satisfy，以满足需求：

satisfy :: (Char -> Bool) -> Parser Char
satisfy prd = do
  PString s <- Parser $ lift get
  case s of
    (p, "") -> throwError $ EndOfInput p
    (p@(Pos (r, c)), x : xs) -> if prd x
                                then let newPos = case x of
                                                    '\n' -> Pos (r + 1, 1)
                                                    '\t' -> Pos (r, c + 8)
                                                    _    -> Pos (r, c + 1)
                                     in Parser $ lift (put $ PString (newPos, xs)) >> return x
                                else throwError $ Chatty ("unexcepted char: " ++ show x) p

注意到'\n', '\t'以及其他字符的区别。

runParser与错误信息输出

runParser :: Parser a -> String -> Either ParseError a
runParser p str =
  case (runState . runExceptT . runP) p (PString (Pos (1, 1), str)) of
    (Left err, _) -> Left (DefaultError $ "<interactive>:" ++ showErr err str)
    (Right r, _) -> Right r
showErr :: ParseError -> String -> String
showErr x input = case x of
    EndOfInput (Pos (r, c)) -> show r ++ ":" ++ show c
                               ++ ": error: end of input\n<EOF>\n^"
    Chatty msg (Pos (r, c)) -> show r ++ ":" ++ show c
                               ++ ": error: " ++ msg ++ "\n"
                               ++ (lines input !! (r - 1)) ++ "<EOF>\n"
                               ++ replicate (c - 1) ' ' ++ "^" 
    DefaultError s -> s

注意到runParser中，待解析字符串和位置(1, 1)结合。
showErr中，Chatty和EndOfInput错误都有对应的Pretty Print。

把这些都封装在parse函数中，转换成IO ()类型：

parse :: Show a => Parser a -> String -> IO ()
parse p str =
  case runParser p str of
    Left (DefaultError x) -> putStrLn $ "Failure: " ++ x
    Right r -> putStrLn $ "Success: " ++ show r

添加错误信息

由于ExceptT带有catchError，可以利用它来提供更丰富的错误信息。

首先实现通用的catchChattyError，它的作用是catch到ChattyError，添加上我们指定的额外错误信息。

catchChattyError :: Parser a -> String -> Parser a
catchChattyError p msg = p 
  `catchError` 
    \case
      Chatty m pos -> throwError $ Chatty (m ++ ", " ++ msg) pos
      otherError -> throwError otherError

接着恰当修改之前实现过的所有Parser，在末尾加上更有用的信息：

char :: Char -> Parser Char
char x = satisfy (==x) `catchChattyError` ("excepted char: " ++ show x)
space :: Parser Char
space = satisfy isSpace `catchChattyError` "excepted a white space character"
upper :: Parser Char
upper = satisfy isUpper `catchChattyError` "excepted an upper letter"
lower :: Parser Char
lower = satisfy isLower `catchChattyError` "excepted a lower letter"
alphaNum :: Parser Char
alphaNum = satisfy isAlphaNum `catchChattyError` "excepted a letter or digit"
letter :: Parser Char
letter = satisfy isAlpha `catchChattyError` "excepted a letter"
digit :: Parser Char
digit = satisfy isDigit `catchChattyError` "excepted a digit"
hexDigit :: Parser Char
hexDigit = satisfy isHexDigit `catchChattyError` "excepted a hexadecimal digit"
octDigit :: Parser Char
octDigit = satisfy isOctDigit `catchChattyError` "excepted a octal digit"
oneOf :: String -> Parser Char
oneOf cs = satisfy (`elem` cs) `catchChattyError` ("excepted one char of [" ++ cs ++ "]")

测试：

λ> parse (string "123") "124"
Failure: <interactive>:1:3: error: unexcepted char: '4', excepted char: '3'
124<EOF>
  ^
λ> parse (string "123") ""
Failure: <interactive>:1:1: error: end of input
<EOF>
^
λ> parse (oneOf "123") "abcde"
Failure: <interactive>:1:1: error: unexcepted char: 'a', excepted one char of [123]
abcde<EOF>
^
λ> parse (some (oneOf "123")) "123321"
Success: "123321"

达成任务。

下一步

现实例子，比如解析JSON类型。