hledger-dupes

Update

hledger-dupes is now included in hledger.

Using hledger

originally published in January 2015. You can find the complete source code in its github repository

Last year I was disciplined enough to manage my expenses using hledger, an accounting program written in Haskell. The results are scary and very useful at the same time: scary because I didn't know I spend so much in certain categories of items, useful because with this data in hand I can optimize how I use my money.

One of the benefits of using hledger is that, besides the functionality immediately available, one can develop other utilities using the library the software is based on, hledger-lib.

One problem I have is that I sometimes write account names in different ways while I record expenses.

For example, let's say I buy a bottle of wine to bring to a dinner party:

03/10
  expenses:wine  €12.00
  assets:cash

Later, I buy another bottle but while recording it among other shoppings I erroneously categorize it as food:

10/17
  expenses:food:wine  €12.00
  assets:cash

The error in this case is particularly evident exploring the account tree with hledger-web. But if you have hundreds, as it's customary using hledger, such an error can slip easily.

What I'd like to have is a tool that automatically spot duplicates in the account tree: duplicates are defined as account names having the same leaf but different prefixes. In other words, two or more leaves that are categorized differently.

The code

I didn't find much documentation for hledger-lib, but reading (or stealing) the code from the implementation of hledger was enough for my purposes.

First we need a list of every account:

accountsNames :: Journal -> [(String, AccountName)]
accountsNames j = map leafAndAccountName as
  where leafAndAccountName a = (accountLeafName a, a)
        ps = journalPostings j
        as = nub $ sort $ map paccount ps

This is a simplified version of the accounts command as it is implemented for hledger. Note I don't want only a simple list of accounts, but a list of tuples where the first element is the leaf name

a String -, the second one is the complete name for the account,

expressed as AccountName (a type defined in Hledger). This will be essential later, to decide what a duplicate is.

Following the wine example, here the list accountsNames would return:

[("wine", "expenses:wine"), ("wine", "expenses:food:wine")]

Now, we need to compute the list of duplicates. Let's first focus on the core of the function: according to me is the more interesting and elegant part of the program (no wonder is largely not by me, but adapted from code I found in a post by Neil Mitchell, "Repeated Word Detection with Haskell").

This function takes a list of tuples and returns a new list of tuples that are duplicates.

dupes' = filter ((> 1) . length)
       . groupBy ((==) `on` fst)
       . sortBy (compare `on` fst)

It is written in the so-called pointfree style, without mentioning the points (the data) is will operate on, just combining other simpler functions.

First it sorts the list of tuples according to their fst element, then it groups to tuples according to a special notion of equality ((=) `on` fst=, i.e. two accounts are considered equal if their leaf name is equal), and eventually returns the groups that have more than one elements.

This function is embedded in a more complex one, suitable to obtain the data structure we need for the final output. Now I need to specify two type constraints (Ord and Eq) for the first elements of the tuples.

dupes :: (Ord k, Eq k) => [(k, v)] -> [(k, [v])]
dupes l = zip dupLeafs dupAccountNames
  where dupLeafs = map (fst . head) d
        dupAccountNames = map (map snd) d
        d = dupes' l
        dupes' = filter ((> 1) . length)
          . groupBy ((==) `on` fst)
          . sortBy (compare `on` fst)

An example of the result:

[("wine", ["expenses:wine", "expenses:food:wine"])]

We're close. We just need a function to print useful information about the duplicates.

render :: (String, [AccountName]) -> IO ()
render (leafName, accountNameL) = printf "%s as %s\n" leafName
    (concat $ intersperse ", " accountNameL)

And the main function:

main = do
  args <- getArgs
  deffile <- defaultJournalPath
  let file = headDef deffile args
  j <- readJournalFile Nothing Nothing file >>= either error' return
  mapM_ render $ dupes $ accountsNames j

Here the final output for the test journal I wrote at the beginning.

$ ./hledger-dupes test.journal 
wine as expenses:food:wine, expenses:wine