December 18, 2014

Dec 18: Billing and Payments — a potted history

Historical

Richard Lovejoy

This blog post is part of the FastMail 2014 Advent Calendar.

The previous post on 17th December was about how we test our software. The following post on 19th December is about one of our biggest projects this year.

Technical level: low

Billing is not very glamorous — but it is important.

If you’ve ever run a business, you know that no matter how great your service or product is, you have to be able to bill your customers, or eventually you’ll no longer be able to provide it.

The financial side of FastMail is broadly divided into two parts: billing — which is keeping track of which services are being used and how much people have paid, and payments — which is about actually collecting money from customers.

Billing

When FastMail started, our billing system was fairly ad-hoc, and pretty much the bare minimum we needed to keep track of user balances. When a user signed up, or paid, or renewed, we would

create a record with the effect on their balance and a description of the event, and
update any attributes affected, such as the service level, or subscription expiry date

This did the job for a few years, but as we slowly grew it became apparent that it was not sufficient. Manual adjustments caused problems, and it was difficult to extract information that our accountants needed.

So, we redid things in a kind of engineery-accounting way.

The basic record keeping part of the system has these properties:

data in the billing system is never changed or deleted, only added
every event is modelled as a bunch of square pulses with a start time, end time, resource type, and height of the pulse
a pulse may have no end time for non-expiring resources — in this case the pulse becomes a step function
if the resource type is “money” then all steps (money is non-expiring) are paired in such a way that this is equivalent to double-entry bookkeeping
each event also records the time at which it occurred, separately from the start and end times of its pulses
the authoritative billing information is calculated by adding together all the pulses and steps.
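Here is a minimal Python sketch of the pulse-and-step model. The `Pulse` and `Event` classes, the `money:` resource names, storing money in cents, and the convention that a balanced event's money steps sum to zero are all assumptions made for illustration. They are not FastMail's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Pulse:
    resource: str            # hypothetical names, e.g. "subscription", "money:sales"
    start: datetime
    end: Optional[datetime]  # None means non-expiring: the pulse is a step
    height: int              # units of the resource (cents for money)


@dataclass(frozen=True)
class Event:
    occurred: datetime       # when the event happened, separate from pulse times
    description: str
    pulses: Tuple[Pulse, ...]


def is_balanced(event: Event) -> bool:
    """Double-entry check (assumed convention): money is always a step,
    and the money steps within one event sum to zero."""
    money = [p for p in event.pulses if p.resource.startswith("money:")]
    return all(p.end is None for p in money) and sum(p.height for p in money) == 0


def level(events: Iterable[Event], resource: str, at: datetime) -> int:
    """Amount of `resource` at time `at`: the sum of every pulse active then."""
    return sum(
        p.height
        for e in events
        for p in e.pulses
        if p.resource == resource and p.start <= at and (p.end is None or at < p.end)
    )


# Hypothetical history: a user pays $40, then buys a one-year subscription.
jan1, next_jan1 = datetime(2014, 1, 1), datetime(2015, 1, 1)
ledger = [  # append-only: events are added, never changed or deleted
    Event(jan1, "card payment", (
        Pulse("money:user_balance", jan1, None, 4000),
        Pulse("money:received", jan1, None, -4000),
    )),
    Event(jan1, "one year subscription", (
        Pulse("subscription", jan1, next_jan1, 1),
        Pulse("money:user_balance", jan1, None, -4000),
        Pulse("money:sales", jan1, None, 4000),
    )),
]
```

Because nothing is ever overwritten, calling `level(ledger, "subscription", some_past_date)` answers "how many subscriptions were in use then?" directly from the records.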

On top of that record keeping are a bunch of simple views and convenience methods for commonly used queries and actions, such as resources which are always linked to the duration of the current subscription (this was the case for Extra Aliases and Extra Domains, when we offered those “a la carte”).

This allowed us to do a number of things that were previously difficult or impossible. We can:

See exactly how many resources (e.g. subscriptions) were in use at any time in the past,
Reconstruct all the transactions and purchases that led to the current state — this is important to show customers a statement of account,
Easily and accurately calculate pro-rata changes, even when prices have changed since the original purchase,
Audit our user attributes to make sure that they match the information in our billing system,
Reconcile our billing information with our payment gateway, and
Report on things that are important to our accountant, like “deferred revenue” and “unrealised FX gain / loss”
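The pro-rata calculation can be sketched roughly as below. This sketch assumes day-based pro-rating in cents, rounded down. The function names are hypothetical, and FastMail's real rounding rules are not described in this post. The point it shows is that the credit comes from the amount recorded at purchase time, not from today's price.

```python
from datetime import date


def remaining_days(start: date, end: date, change_on: date) -> int:
    """Whole days of the period [start, end) left after change_on, clamped."""
    return max(0, min((end - change_on).days, (end - start).days))


def unused_credit(paid_cents: int, start: date, end: date, change_on: date) -> int:
    """Credit for the unused part of a purchase, from the amount actually paid,
    so any price change since the purchase does not affect it."""
    total_days = (end - start).days
    return paid_cents * remaining_days(start, end, change_on) // total_days


def upgrade_charge(paid_cents: int, start: date, end: date,
                   change_on: date, new_price_cents: int) -> int:
    """Charge for switching to a plan at today's price for the rest of the
    period, minus the credit for the unused part of the old purchase."""
    total_days = (end - start).days
    new_cost = new_price_cents * remaining_days(start, end, change_on) // total_days
    return new_cost - unused_credit(paid_cents, start, end, change_on)


# Hypothetical: $40 paid for 2014, upgrading on 2 July to a plan now priced $60/year.
credit = unused_credit(4000, date(2014, 1, 1), date(2015, 1, 1), date(2014, 7, 2))
charge = upgrade_charge(4000, date(2014, 1, 1), date(2015, 1, 1), date(2014, 7, 2), 6000)
```

Here 183 of the 365 days remain, so the credit is 2005 cents and the upgrade costs 1003 cents.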

This billing system has now been in use for the past 10 years, with only minor changes needed in that time. Part of the reason for the small number of changes is that over time our pricing and product offerings have become progressively simpler.

Aside:

Some of our users don’t like this, because they like to optimise their subscription so that they are paying for exactly what they need, and no more. Most of our users, though, find the all-inclusive model to be much more appealing.

Some things that are still lacking in our billing system are automated “discount coupon” and “special prices for charities” functionality. I’d like to add these one day.

Payments

For a credit card merchant, an important figure is the chargeback rate: the percentage of your payments that cardholders dispute with their bank. If this rate gets too high, the payment gateways deem you to be risky, and they will charge much higher fees, or cancel your account completely.

When we first started out we used integrated billing only for credit cards. This was done via a payment gateway — we chose Worldpay. In those days, the only payment gateways were banks, and Worldpay was then part of the Royal Bank of Scotland. Banks are pretty conservative by nature and we had to go to some lengths to reassure them that we were a good risk.

All other payment methods were manual — someone would send us a cheque or a PayPal payment or a telegraphic transfer, and we would try to find the FastMail user and apply the credit to that account.

We were fairly successful at keeping the chargeback rate low — we have a number of fraud checks, and we try to refund disputed payments promptly. So, we got along just fine with Worldpay.

Over time, Worldpay’s APIs improved and we were able to improve the user interface, do automated reconciliation, and even perform delayed capture.

However, when FastMail was bought by Opera in 2010, Worldpay would not let us keep our merchant account under the new structure. We would need to apply for a new merchant account, and Worldpay’s policies had changed, so we would now need to keep a much larger deposit, so large that it wasn’t feasible.

Finding a payment gateway that had acceptable deposit terms turned out to be difficult. At the time, it seemed to be common practice to require merchants to provide a continual deposit of between three and twelve months of revenue! The mindset among many of the payment gateways appeared to be that the risk of their merchants going bankrupt was so high that they could not tolerate any possible exposure to chargebacks, ever.

We did a lot of searching around and found that even as recently as 2010, it was hard to find a payment gateway that met our requirements, which were:

take payments in USD
take payments from anywhere in the world
pay into a USD bank account in Australia
support delayed capture
have acceptable deposit terms
not require us to see the most PCI-DSS sensitive data

This last requirement was important, because we were not a huge company, and the cost of maintaining (and certifying) compliance with the higher levels of PCI-DSS is significant.

But we still wanted to be able to conduct recurring billing to make it easy for users to renew their subscriptions, and also so that our customers could purchase “a la carte” additions to their account.

The solution was to redirect the user to the payment gateway (in 2010), or to use JavaScript (now), to send card information directly from the user’s web browser to the payment gateway. In both cases, the communication is directly between the user’s web browser and the payment gateway, without passing through our servers. The payment gateway then gives us a token that we can use to process additional payments when necessary.

This means that even if our servers were compromised, the attacker would not be able to steal the card details of our users from our system. The best way to ensure confidentiality is not actually having the data in the first place!
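This minimal sketch shows the token pattern. An in-memory `FakeGateway` stands in for a real gateway, and its `tokenise` and `charge` methods are hypothetical, not any particular gateway's API. The card number only ever lives inside the gateway. Our side stores just the token and uses it for recurring charges.

```python
import secrets
from dataclasses import dataclass


class FakeGateway:
    """Hypothetical stand-in for a payment gateway; card numbers live only here."""

    def __init__(self) -> None:
        self._cards = {}   # token -> card number, private to the gateway
        self.charges = []  # (token, amount_cents) for each successful charge

    def tokenise(self, card_number: str) -> str:
        # In real life this step happens between the user's browser and the
        # gateway (redirect or JavaScript); our servers never see card_number.
        token = "tok_" + secrets.token_hex(8)
        self._cards[token] = card_number
        return token

    def charge(self, token: str, amount_cents: int) -> bool:
        if token not in self._cards:
            return False
        self.charges.append((token, amount_cents))
        return True


@dataclass
class BillingAgreement:
    user_id: int
    token: str  # the only card-related data stored on our side


def renew(agreement: BillingAgreement, gateway: FakeGateway, price_cents: int) -> bool:
    """Recurring billing: charge the stored token, no card details needed."""
    return gateway.charge(agreement.token, price_cents)


gateway = FakeGateway()
agreement = BillingAgreement(user_id=42, token=gateway.tokenise("4111111111111111"))
ok = renew(agreement, gateway, 4000)
```

If our database were stolen, the attacker would get `BillingAgreement` rows containing tokens, which are only useful to someone who can also call the gateway as us.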

We eventually found a payment gateway that met all of our requirements, and signed up with Global Collect.

An advantage of Global Collect was that it could process payments using many different methods, so in theory it could act as an abstraction layer and allow us to easily take payments using almost any scheme, including local bank transfers in many different countries. In practice, a fair amount of work was needed, and in the end the only additional payment method that we used was automatic payments via PayPal.

A substantial amount of effort was required to make things work with Global Collect: all the integration points were slightly different from Worldpay, and the failure modes were different too. Even more work was involved in the “non-payment” parts of the integration. These include reconciliation with the payment gateway (to make sure that FastMail has credited all the payments to the right users, even if we didn’t initiate them), and dealing with payments that unexpectedly change status (from Succeeded to Failed or vice versa) some days or months after the actual payment.
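At its core, reconciliation is a comparison of two lists of transactions. This minimal sketch assumes that both our records and the gateway's report can be reduced to a transaction id, user, amount and status. The `Txn` type and the three output categories are illustrative, not a description of the real scripts.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Txn:
    txn_id: str
    user_id: int
    amount_cents: int
    status: str  # "succeeded" or "failed"


def reconcile(gateway_txns: List[Txn], our_txns: List[Txn]):
    """Return (uncredited, unknown_to_gateway, status_changed)."""
    ours = {t.txn_id: t for t in our_txns}
    theirs = {t.txn_id: t for t in gateway_txns}
    # Money the gateway took that we never credited (e.g. a payment we didn't initiate).
    uncredited = [t for tid, t in sorted(theirs.items())
                  if tid not in ours and t.status == "succeeded"]
    # Records we hold that the gateway has no trace of.
    unknown = [t for tid, t in sorted(ours.items()) if tid not in theirs]
    # Payments whose status flipped after the fact, as (ours, theirs) pairs.
    changed: List[Tuple[Txn, Txn]] = [
        (ours[tid], theirs[tid])
        for tid in sorted(ours.keys() & theirs.keys())
        if ours[tid].status != theirs[tid].status
    ]
    return uncredited, unknown, changed


gateway_report = [Txn("t1", 1, 4000, "succeeded"), Txn("t2", 2, 4000, "failed"),
                  Txn("t3", 3, 1500, "succeeded")]
our_records = [Txn("t1", 1, 4000, "succeeded"), Txn("t2", 2, 4000, "succeeded"),
               Txn("t4", 4, 900, "succeeded")]
uncredited, unknown, changed = reconcile(gateway_report, our_records)
```

In this made-up data, `t3` needs crediting to user 3, `t4` needs investigating, and `t2` succeeded at first but later failed at the gateway, so user 2's credit should be reversed.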

This refactoring was not a lot of fun, so to reduce the need for it in the future, we laid a lot of the groundwork for our own payment abstraction layer.

We attempted to move all the Worldpay card data into Global Collect, so that customers would not have to re-enter their billing details. This generally worked technically but had some giant wrinkles. As a standing authorisation was created in Global Collect for each user, an “auth” of $1.00 was processed, which never appeared on customers’ card statements. For most customers this was fine. A small number of customers had their bank contact them about the $1.00 charge, which they didn’t know about, and an even smaller number of customers had their cards summarily cancelled because their bank deemed that the charges were probably fraudulent. None of this showed up in our testing with our own credit cards.

The whole point of this effort was to make things as convenient as possible for our customers, but the only people who noticed were those who were so severely inconvenienced that we wished we’d never tried. It was a good lesson for future migrations though.

The “delayed capture” feature worked very well with Global Collect. We would process an “auth” which would reserve the funds, and then a few days later we would “capture” the payment, which would actually take the funds from the user’s card.

Spammers and scammers often try to use our systems. Many of these are kept away by the requirement to pay, but a few determined spammers use stolen payment credentials to sign up accounts. We are often alerted to abuse of our system within a few days, and if we find out before the payment is “captured”, we can cancel the payment. In this case, the cardholder will never see a transaction on their paper statement.
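The auth-then-capture flow can be modelled as a small state machine. This includes cancelling an auth when abuse is spotted in time. The sketch assumes a three-day delay, the figure used later with Pin Payments, and uses hypothetical state names. A real system would call the gateway at each transition.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

CAPTURE_DELAY = timedelta(days=3)  # assumed delay between auth and capture


@dataclass
class Payment:
    payment_id: str
    authorised_at: datetime    # the "auth": funds reserved, not yet taken
    state: str = "authorised"  # becomes "captured" or "voided"

    def void(self) -> None:
        """Cancel before capture, e.g. because the signup turned out to be abuse."""
        if self.state != "authorised":
            raise ValueError(f"cannot void a {self.state} payment")
        self.state = "voided"

    def capture(self, now: datetime) -> None:
        """Actually take the funds, once the delay has passed."""
        if self.state != "authorised":
            raise ValueError(f"cannot capture a {self.state} payment")
        if now - self.authorised_at < CAPTURE_DELAY:
            raise ValueError("capture delay has not elapsed yet")
        self.state = "captured"


# A spammer's payment is voided before the delay is up, so it is never captured.
spam = Payment("p1", datetime(2014, 12, 1))
spam.void()

# A normal payment is captured three days after the auth.
good = Payment("p2", datetime(2014, 12, 1))
good.capture(datetime(2014, 12, 4))
```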

In this situation we know that the cardholder’s credentials have been stolen, and are being actively used for fraud on the internet (because they have just been used to try to defraud us). However, we are repeatedly told by all the payment gateways and banks we have asked that there is NO WAY for us to tell the card issuing bank or the cardholder that those details are compromised.

Fast-forward a few more years to 2013, when FastMail was split from Opera. Unfortunately the merchant agreement no longer applied, so we needed to change to a different payment gateway.

When it seemed likely that we would have to switch payment gateways again, we pushed ahead with realising our payment abstraction layer. This would allow us to easily support multiple payment gateways at the same time, and direct users to the appropriate payment gateway.
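A payment abstraction layer of this kind can be outlined as two parts. The first is an interface that every gateway implements. The second is a registry that the billing code talks to, instead of talking to any one gateway. The sketch below is hypothetical: the method names (`authorise`, `capture`, `void`), the registry and the `InMemoryGateway` test double are illustrative. They are not FastMail's code or any gateway's real API.

```python
from abc import ABC, abstractmethod
from typing import Dict, Tuple


class PaymentGateway(ABC):
    """What every gateway implementation class must provide (hypothetical)."""

    name: str

    @abstractmethod
    def authorise(self, token: str, amount_cents: int, currency: str) -> str:
        """Reserve funds and return the gateway's payment id."""

    @abstractmethod
    def capture(self, payment_id: str) -> None:
        """Take previously authorised funds."""

    @abstractmethod
    def void(self, payment_id: str) -> None:
        """Release previously authorised funds."""


class InMemoryGateway(PaymentGateway):
    """Test double standing in for a real gateway integration."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.payments: Dict[str, str] = {}  # payment id -> state

    def authorise(self, token: str, amount_cents: int, currency: str) -> str:
        payment_id = f"{self.name}-{len(self.payments) + 1}"
        self.payments[payment_id] = "authorised"
        return payment_id

    def capture(self, payment_id: str) -> None:
        self.payments[payment_id] = "captured"

    def void(self, payment_id: str) -> None:
        self.payments[payment_id] = "voided"


# Several gateways can be live at once; billing code only sees this registry.
GATEWAYS: Dict[str, PaymentGateway] = {
    g.name: g for g in (InMemoryGateway("pin"), InMemoryGateway("stripe"))
}


def take_payment(gateway_name: str, token: str, amount_cents: int,
                 currency: str = "USD") -> Tuple[str, str]:
    """Authorise via the named gateway; return the gateway name and its payment id,
    so the billing record knows where to capture, void or reconcile later."""
    gateway = GATEWAYS[gateway_name]
    return gateway.name, gateway.authorise(token, amount_cents, currency)
```

With this shape, adding a gateway means writing one more subclass (plus a reconciliation script) without touching the billing code.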

Aside:

It’s possible that a service like Chargify might have been able to provide this for us, but that would have required us to refactor all our billing code as well. Also, I didn’t know about Chargify or similar services then.

By this time, there were a number of the new breed of “disruptive” payment gateways around — such as Pin Payments, Stripe, Braintree, and others. These don’t require large deposits from their merchants.

This means they are exposed to chargeback risk, but they have judged that risk to be small enough that they don’t need every merchant to carry a deposit large enough to cover the maximum possible chargebacks.

They effectively keep about a week of revenue as a deposit, by paying out funds a week after they were taken from the merchant’s customers. This is very convenient for a merchant as it means you can just dip a toe into the waters of a payment gateway, without taking a plunge.

After the difficulties we encountered when importing data into Global Collect, we decided not to use any data migration schemes this time. This meant that customers who already had a billing agreement with us had to enter their billing details again with the new payment gateway.

We selected Pin Payments to handle the bulk of our credit card payments, and have been pretty happy with them in general.

They had a “delayed capture” feature already, and when we asked if that could be automated (so that payments were automatically captured after 3 days) they were happy to add it. Unfortunately this came back to bite us — it turned out that the payment gateway had added a special case to deal with the automatic delayed capture, and this special case was not fully covered in their internal testing. When they made some internal changes later on, this caused a bug, and a bunch of payments were captured a second time, and a third time and a fourth! This was a giant headache to fix, and made us look pretty bad to the affected customers. As a result, we’ll shortly be doing the delayed capture ourselves — we don’t want to be skipping test cases in our payment gateway.
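Doing delayed capture ourselves could look something like the job below, run periodically. This sketch makes several assumptions: payments are held in a plain dict rather than a database, `capture_fn` is a hypothetical call to the gateway, and the delay is the three days mentioned above. The property that matters is that each payment is claimed before the gateway is called. That way, re-running the job, or running it twice by mistake, never captures the same payment a second time.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List

CAPTURE_AFTER = timedelta(days=3)


def run_capture_job(payments: Dict[str, dict], now: datetime,
                    capture_fn: Callable[[str], None]) -> List[str]:
    """Capture every authorised payment at least CAPTURE_AFTER old, at most once.

    `payments` maps payment id -> {"authorised_at": datetime, "state": str}.
    """
    captured = []
    for payment_id, p in sorted(payments.items()):
        if p["state"] != "authorised":
            continue  # already captured, being captured, or voided
        if now - p["authorised_at"] < CAPTURE_AFTER:
            continue  # too early: leave time to spot fraud and void instead
        # Claim the payment before calling out. In a database this would be a
        # conditional update (authorised -> capturing) that only one runner can
        # win. If capture_fn raises, the payment stays in "capturing" for a
        # human to check, rather than being retried blindly.
        p["state"] = "capturing"
        capture_fn(payment_id)
        p["state"] = "captured"
        captured.append(payment_id)
    return captured


calls = []
payments = {
    "a": {"authorised_at": datetime(2014, 12, 1), "state": "authorised"},
    "b": {"authorised_at": datetime(2014, 12, 3), "state": "authorised"},
}
first = run_capture_job(payments, datetime(2014, 12, 4), calls.append)
second = run_capture_job(payments, datetime(2014, 12, 4), calls.append)
```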

Speaking of “delayed capture”, it doesn’t work quite as cleanly with Pin Payments as it used to with Global Collect. Depending on the card and the bank, sometimes the “auth” appears as a transaction on the card, and then the “capture” appears as another transaction. Affected customers will see two transactions on an online statement. After a while, the “auth” transaction will completely disappear (and it will never appear on paper statements), but there is a period when customers may be concerned. We have had many reports of this occurring with Pin Payments, and the Stripe documentation also mentions that this may happen. We didn’t get any reports of this when we were using Global Collect.

In addition, when we started using Pin Payments they didn’t support American Express payments. Many of our business customers prefer to pay with American Express, so this was a challenge. We worked around this by initially advising American Express cardholders to pay via PayPal, and later by automatically sending those payments via Stripe.

A benefit of having our payment abstraction layer was that it was easy for us to gradually phase in payments via Pin. We started off with a couple of percent of payments, and slowly increased the percentage as we grew more confident that everything was going according to plan.
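The gradual phase-in and the American Express workaround both come down to choosing a gateway for each payment. In this minimal sketch, users are routed by a stable hash of their user id, so each user stays on one gateway as the percentage rises. The gateway names are hypothetical. The post doesn't say how FastMail actually chose which payments went to Pin.

```python
import hashlib


def bucket(user_id: int) -> int:
    """Stable bucket 0-99 for a user. Unlike Python's built-in hash() for
    strings, a SHA-256 digest gives the same bucket in every process."""
    digest = hashlib.sha256(str(user_id).encode("ascii")).hexdigest()
    return int(digest, 16) % 100


def choose_gateway(user_id: int, card_brand: str, pin_percent: int) -> str:
    """Route Amex to Stripe; send pin_percent% of other users to Pin."""
    if card_brand == "amex":
        return "stripe"  # Pin didn't support American Express at first
    return "pin" if bucket(user_id) < pin_percent else "globalcollect"
```

Raising `pin_percent` from, say, 2 to 10 to 50 only ever moves users from the old gateway to Pin, never back again.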

The abstraction layer makes it straightforward to integrate with PayPal directly, which we’ve had to do since late 2013.

It also means that using new payment gateways is pretty straightforward: an implementation class needs to be written, along with a reconciliation script. As an experiment, we’ve processed a small percentage of payments via Stripe, and we’re confident that we could switch if there was ever a persistent problem with Pin Payments.

And, after testing for a few months, today we’ve also added bitcoin payments (via BitPay) in our main web interface.