Futuremark’s president Oliver Baltuch talks about the challenges of transparency and benchmarking.
The benchmarking business has had a bit of a credibility problem of late.
In late July, it was alleged that Antutu, a major player in the sector, had its benchmarks gamed by Samsung to make the Galaxy appear faster than comparable smartphones. For benchmarking enthusiasts, the allegations were close to sacrilege: a supposedly impartial and unbiased test of hardware might was favoring a particular vendor.
In the wake of this mini-scandal, the enthusiast community began asking hard questions of benchmarking companies: are they cutting secret deals behind closed doors to give certain vendors an edge?
One of the most established players in the benchmarking business is Futuremark, with its suite of benchmarking tools that tests nearly every computing situation and platform imaginable. VR-Zone recently sat down with Futuremark’s president Oliver Baltuch in Taipei to talk about what’s going on in the benchmarking industry and the progress of Futuremark’s iOS app.
VR-Zone: How do you make sure Futuremark doesn’t have an Antutu-gate, where you are accused of bias towards a vendor or a certain piece of hardware?
We make the rules very clear upfront with the vendors.
We start each benchmark with something called a proposal for specification. With 3DMark for Android, we put out a 25-page proposal for specification that we sent to all the members. They then came back with information about how they would like that benchmark to go; we collected that into a matrix, then we put out a specification.
Once we put out a specification we start building to it, and if anyone wants a change to that spec they have to submit a specification change request form stating what they want changed, what they want it changed to, and a technical reason for the change. That request then goes to a technical committee inside our company which, excuse the way I put it, has a ‘Chinese wall’ between the money part of the company and the technical part of the company.
Our driver approval rules very carefully state that you’re not allowed to identify the benchmark itself and change things for it. We want people running the benchmark as if it’s any other piece of software, so that it shows how any other software will operate on that system. If we find someone who is not following those rules judiciously, we don’t report that to the world. We have a quiet conversation with those people and let them know that their scores will not be available publicly on our website until they change their behavior. Usually they say they’ve made a mistake, they make the changes, and we review it.
Antutu simply puts the benchmark out and people just use it, much like any device. For us, we don’t believe a benchmark is a product until it has a full set of reviewers’ guidelines, driver rules, approval policies, and a full back-end server that takes the scores and can look at all the different tests and measure the ratios between them, so we know that the people submitting scores haven’t monkeyed with them. We have service, we have [quality assurance]. We make sure our product is a full product.
And one of the things people don’t realize is that in Finland there’s tremendous corporate transparency. Even though we’re a private corporation, all of our books and finances for the last five years can be seen for five euros. None of the business transactions we do with other corporations can be hidden. There’s no such thing as a secret deal.
So what we have is a very transparent system. Any changes made to the specifications of our benchmark are not private; whatever changes are made, all the participants get to see everything. You always hear these types of rumors: ‘Oh my God, this guy’s winning, so he must have done something special with [the benchmarking company].’ None of that is actually true. It sounds very nice and mysterious, but because of the way we do things, it’s not true.
VRZ: Would you say that your benchmark is more honest than Antutu, given your methodology?
I wouldn’t say that. The way I would put it is that we create a product for use by professionals the way Futuremark always has.
What Antutu does is a separate issue, and that is what they do. I can’t claim to know all about their inner workings. As Finns, we have to be modest about what we do. We know we put something together that we consider a product, and we’re able to view the world in that type of light.
VRZ: Have vendors ever given Futuremark a bad time or flak when they’ve gotten a score that they don’t like?
Many times. Almost every single benchmark we put out.
VRZ: Do vendors pressure Futuremark to change the benchmark methodology so their score comes out more favorably?
I guess the best way of putting it is that the people in our company who write all the programs only look at things you would consider technical arguments, and most of the real technical arguments happen prior to the benchmark’s release. We view things in a very data-driven way. There will always be a certain amount of hyperbole put forward by certain people; we look past that and mainly deal with the people in corporations who deal with facts. We deal with engineers, who are very solid, very professional people, and they deal directly with our engineers on an engineer-to-engineer basis.
Of course there will always be someone who comes to me and tries to put pressure on me. When you have 10 people fighting over a benchmark, only one of them can win; nine of them lose. And everyone who loses has been told by their CEO that they are not allowed to lose.
It’s the reviewer’s job to look at what we’ve done and declare who they think is better or worse. You may look at the list and want to say that Qualcomm is a very fine producer of silicon. I couldn’t possibly say that, because my job is not to judge but to be neutral; [judging is] the job of the press. We build tools to allow you to draw that judgement. The people who build soccer balls don’t care who wins the match; all they care about is that their ball gets used for it. We just want to build something fair and neutral in the middle.
If [a vendor loses] they might not feel good about it, but they in their hearts know that what we built is fair. And besides, they can all see the source code. They all know what the source code looks like. There are no surprises in it.
VRZ: Moving to the mobile sector, why is there a lack of cross-platform iOS benchmarks, and what’s the release date for Futuremark’s iOS benchmarking app?
Our iOS benchmark should come out fairly soon; the issue is purely v-sync.
As you know, iOS has v-sync, which means you can’t go over 60 frames per second. And, as you know, Apple builds some pretty nice silicon that we believe may in the future go over that amount. Our programs are made to judge render-to-target rather than render-to-resolution. We render to a 720p or 1080p target, and in that vein, I think even some of the 1080p stuff is starting to go over 60 frames per second.
So we’ve invented a new method; it’s almost like a render-to-null, which allows people to get higher frame rates on iOS. We’ve sent that to the BDP members for their final quality assurance (QA), so they can see it. Once we’ve received their feedback we’ll submit it to the App Store, which takes as much time as it needs for its own final QA before the app goes live.
I’m envisioning a couple of weeks or more to get that out. Then you’ll have a cross-platform benchmarking tool spanning Windows 8, Windows RT, Android, and iOS: four platforms whose scores you can compare directly.
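Baltuch’s point about v-sync can be made concrete with a little arithmetic. When every frame must be presented on a 60 Hz display refresh boundary, the reported frame rate snaps to divisors of 60 no matter how fast the GPU really is; rendering to an off-screen target sidesteps that cap. The sketch below is purely illustrative (hypothetical frame times, not Futuremark’s code):

```python
import math

REFRESH_HZ = 60  # display refresh rate enforced by v-sync on iOS

def uncapped_fps(frame_time_ms: float) -> float:
    """FPS when rendering to an off-screen target (no v-sync wait)."""
    return 1000.0 / frame_time_ms

def vsynced_fps(frame_time_ms: float, refresh_hz: int = REFRESH_HZ) -> float:
    """FPS when each frame is presented on the next display refresh boundary."""
    interval_ms = 1000.0 / refresh_hz
    # A frame that misses a refresh waits for the next one, so the displayed
    # rate is the refresh rate divided by refreshes consumed per frame.
    refreshes_per_frame = math.ceil(frame_time_ms / interval_ms)
    return refresh_hz / refreshes_per_frame

print(uncapped_fps(8.0))   # 125.0 — what an off-screen measurement can report
print(vsynced_fps(8.0))    # 60.0  — what the display actually shows
print(vsynced_fps(20.0))   # 30.0  — slow frames snap down to refresh divisors
```

This is why a device rendering a frame in 8 ms and one rendering in 16 ms can both score “60 FPS” under v-sync: the cap hides the difference that an off-screen, render-to-target measurement would reveal.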
VRZ: Has Apple seen the app yet?
It hasn’t been sent to Apple yet, but it’s very close. We really want it out there. If you put out a piece of consumer software [to the App Store] and it’s not perfect when it hits, you get a lot of one-star reviews. When it goes out, we want to make sure we’ve put out a piece of software that’s as professional as possible.
VRZ: Thanks for your time.