A ‘quick and dirty’ test of media software – ROON – JRiver – Foobar – WMP

Jaap Veenstra

June 6, 2022

Does a software package (mostly server software) influence the sound? We think that is an interesting question, because you would expect it to. But how? We start our search with a test that is deliberately simple: we play the same file through various packages and record the output through a digital loopback in Adobe Audition.

“ROON sounds like crap!” “JRiver sounds way better than Audirvana.” “Jplay is the best of them all.” You sometimes read strong statements like these on social media, and also on Alpha Audio itself. We certainly hear differences between software, but mostly in the firmware of the player itself and not so much in the software on a media server. We don’t rule that possibility out either, though. After all, some software can pre-process the audio, and that can of course make a difference.

A simple chain

Let’s first explain how streaming audio over a network basically works. What does such a chain look like?

Somewhere there is a media server. That can be installed on a NAS, but also a PC, or it can even be a streaming service (in fact also a kind of media server).
The music data (the song you want to play) is requested by a player. This can be a music streamer, but also a phone, PC, etc.
This data is transmitted over a network – wired, wireless or both – from the media server to the music player.
The player can be controlled by an app on a phone or tablet. But it can also be a software package on a laptop, PC, etc.

Now, you will understand that the file you are playing consists of bits and bytes: 1’s and 0’s. The server chops this file up into packets so it can be sent over a network. One packet is about 1500 bytes, provided nothing has been changed in the network. Besides the music data, each packet carries extra information: where it comes from, where it is supposed to go and what it contains. There is also error detection: a CRC on every Ethernet frame and, in the case of TCP traffic, a checksum on top of that. If something is broken, the receiver notices, discards the data, and TCP sends it again. Great! A network is thus pretty robust in terms of data integrity. You have to do some crazy things to get data corrupted.
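To make the idea concrete, here is a minimal Python sketch of chopping data into packets of about 1500 bytes, attaching a check value to each, and fetching a fresh copy when a packet arrives damaged. It is a simplified model and not how a real network stack is written: the packet layout and the names `packetize`, `is_intact`, `receive` and `resend` are made up for this illustration, and CRC-32 from Python's `zlib` module stands in for the error detection that the network hardware and the operating system handle for you.

```python
import zlib

PACKET_SIZE = 1500  # bytes per packet, the figure mentioned above


def packetize(data: bytes, size: int = PACKET_SIZE):
    """Split data into numbered packets, each carrying a CRC-32 of its payload."""
    packets = []
    for seq, start in enumerate(range(0, len(data), size)):
        payload = data[start:start + size]
        packets.append({"seq": seq, "payload": payload, "crc": zlib.crc32(payload)})
    return packets


def is_intact(packet) -> bool:
    """Receiver-side check: recompute the CRC and compare it with the one sent."""
    return zlib.crc32(packet["payload"]) == packet["crc"]


def receive(packets, resend):
    """Reassemble the data, fetching a fresh copy of any damaged packet."""
    out = bytearray()
    for packet in packets:
        while not is_intact(packet):
            packet = resend(packet["seq"])  # hypothetical retransmission callback
        out += packet["payload"]
    return bytes(out)


if __name__ == "__main__":
    audio = bytes(range(256)) * 40  # 10240 bytes of stand-in "music" data
    sent = packetize(audio)

    # Simulate damage on the wire: flip one bit in packet 3.
    on_the_wire = [dict(p) for p in sent]
    broken = bytearray(on_the_wire[3]["payload"])
    broken[0] ^= 0x01
    on_the_wire[3]["payload"] = bytes(broken)

    received = receive(on_the_wire, resend=lambda seq: sent[seq])
    print(len(sent), "packets, data identical:", received == audio)
```

The point of the sketch is only that a damaged packet is caught and replaced before the data reaches the player, so the bytes that arrive are the bytes that were sent.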

The simple test

In this simple test we stay in the digital domain. It is practically impossible to go through a streaming media player to the analog domain and then digitize the signal again without loss: no A/D converter is so stable that every conversion comes out exactly the same. Such A/D converters do not (yet) exist. In short: for now we stay within the digital domain.

This way we can check whether the media server / NAS makes a difference. We switch from ROON to JRiver, Foobar and Windows Media Player. We can also check whether the front-ends (the software on the PC) do something to the digital signal. The chain is kept quite simple:

We use the same source files in this test.
The test file is on a Synology NAS: a live recording of Tim Knol in 24 bit / 96 kHz.
The ROON server is a Ryzen-based PC running Ubuntu Server with ROON server installed.
We run all tests on our work / editing PC (Threadripper 3960X with 64GB RAM)
As a sound card we use an ESI MAYA44EX. This creates a digital loop that Adobe Audition picks up and records.
- The data stream does not ‘leave’ the sound card: Audition directly picks up the data stream.
We load the files into a multitrack session to line them up.
Each export is the difference between the master file and the recording: we put the recording 180 degrees out of phase (inverted) and sum it with the master.
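The last step is a classic null test. As a minimal sketch in Python, assuming both tracks are already aligned and reduced to plain lists of integer sample values (the function names and the sample values are made up for this illustration, this is not what Audition does internally):

```python
def invert(samples):
    """Flip the polarity: the '180 degrees out of phase' step."""
    return [-s for s in samples]


def null_test(master, recording):
    """Sum the master with the inverted recording; what remains is the residue."""
    if len(master) != len(recording):
        raise ValueError("align and trim both tracks to the same length first")
    return [m + r for m, r in zip(master, invert(recording))]


def peak(residue):
    """Largest absolute sample value left over after the null test."""
    return max((abs(s) for s in residue), default=0)


if __name__ == "__main__":
    master = [0, 1200, -3400, 8388607, -8388608, 42]   # made-up 24-bit values
    identical = list(master)                            # a bit-perfect capture
    altered = [0, 1200, -3399, 8388607, -8388608, 42]   # one sample off by one

    print(peak(null_test(master, identical)))  # 0: complete null
    print(peak(null_test(master, altered)))    # 1: one step of residue
```

If the recording is identical to the master, every sample cancels and the export is pure silence. Anything that is left over is, by definition, the difference between the two.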

The sampling

So we used a few software solutions for this “quick and dirty” test:

ROON – ASIO – File from ROON server (RAAT) (same folder as used by the others)
JRiver 29 – WASAPI, exclusive – File from NAS
Foobar – WASAPI, exclusive – File via SMB from NAS
Windows Media Player – WASAPI, shared – File via SMB from NAS

In addition, we had ROON reduce the sampling rate from 96 kHz to 48 kHz to see if that had any impact. (Spoiler: yes.)

Download the samples

Check all samples here. Note this is 1.9 GB!

Conclusion

We see a small amount of residue in all ‘normal’ recordings. We suspect this is a side effect of recording in 32-bit floating point: when we lay those recordings over the 24-bit master, a small difference remains. This suspicion is strengthened when we lay the mixdowns over each other and put one out of phase: nothing remains. In other words, the “residue” is the same in every recording. That is too consistent to be a coincidence.

There is, however, quite a difference with the file that ROON downsampled from 96 kHz to 48 kHz. See the screenshot below. This shows that sample rate conversion is anything but lossless. ROON also indicates this nicely in its signal path. So pay attention to this.
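Why conversion cannot be lossless is easy to show with a toy example. A 48 kHz file holds half as many samples as a 96 kHz file and cannot contain anything above 24 kHz, so a round trip from 96 to 48 and back never returns exactly the original. The Python sketch below uses a deliberately crude converter (averaging pairs of samples, then repeating each sample). This is an assumption made for illustration only and certainly not ROON's algorithm; real converters use far better filters, but they still cannot bring back what was removed. All function names are made up for this example.

```python
import math


def tone(freq_hz, rate_hz, n):
    """Generate n samples of a sine wave at the given sample rate."""
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]


def downsample_by_2(samples):
    """Crude 2:1 conversion: average each pair of samples into one value."""
    return [(samples[i] + samples[i + 1]) / 2 for i in range(0, len(samples) - 1, 2)]


def upsample_by_2(samples):
    """Crude 1:2 conversion: repeat every sample once."""
    out = []
    for s in samples:
        out.extend([s, s])
    return out


def peak_residue(original, round_trip):
    """Null test: largest absolute difference between the two signals."""
    return max(abs(a - b) for a, b in zip(original, round_trip))


if __name__ == "__main__":
    rate = 96000
    for freq in (1000, 30000):  # one tone inside, one outside the 48 kHz file's range
        original = tone(freq, rate, 960)
        round_trip = upsample_by_2(downsample_by_2(original))
        print(freq, "Hz, peak residue:", round(peak_residue(original, round_trip), 4))
```

Both tones leave a residue after the round trip, and the 30 kHz tone, which lies above what a 48 kHz file can hold, leaves by far the largest one.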

What next?

What we can conclude is that, as long as no processing is applied, there really is no difference within the digital domain. Whether you use ROON, JRiver, Foobar or WMP, the output in this test was bit perfect. Once you start processing, of course there are differences, and some packages do it better than others. The question is whether processing is desirable at all.

The fact that we have not observed any differences does not rule out that a player responds differently to a particular media server. That depends on how a media server delivers the data. How constant is the stream? How much processing power does the player need to handle it?

ROON, for example, uses RAAT. That’s different from a standard UPnP server. Aurender also has its own protocol. In short: there may be differences. To be continued!