I’ve been hearing some pops in a sample I put into an X3A file that I was unable to reproduce via playback by any other means.
The top waveform is the sample played back after manually loading the WAV into the element. The bottom is the same sample, but loaded by first packing it into an X3A file and then loading that file into the Yamaha MONTAGE.
In case the error was not immediately obvious I highlighted it here:
You can’t unsee it once it has been pointed out, as it is pretty nasty, but hopefully there is a way to unhear it.
In that image, all pixels are 100% identical except for the large mistake in the center of the white square and 1 pixel along the outside of the left border of the white square, indicating that there was a tiny moment of instability just before the giant artifact got created.
To repeat, it works fine if the 16-bit .WAV file is loaded manually into the element, but not when the sample is loaded from inside an X3A file. I am guessing that a lot of the X3A loading time is dedicated to converting to AWM2, and that this is likely where the corruption happens. If it is not specifically the AWM2 algorithm, it is at least related (playback of AWM2 data?).
Is there a known work-around for this? Maybe a flag I can set inside the X3A file to have it not perform the compression to AWM2 during the load (the X3A file contains the original WAV data with no compression, so it is known that if this compression is taking place, it is done during the loading onto the Yamaha MONTAGE)?
Or do I have to adjust the WAV data in some way to avoid triggering this bug during compression?
Thank you,
Anne
What's interesting is the "boundary", in time, at which this anomaly occurs. If t=0 is the first sample and t=n is the last sample, then the anomaly is centered exactly around t=n/2.
Not sure if this is a one-off observation or if every single time you see this, it's always centered at t=n/2.
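To check whether the anomaly really sits at t=n/2 every time, something like the following could locate it. This is only a sketch: it assumes the two recordings have already been aligned and decoded into equal-rate lists of sample values, and `locate_largest_difference` is a made-up helper name, not part of any Yamaha or Audacity tool.

```python
def locate_largest_difference(reference, suspect):
    """Return (index, fraction) of the largest absolute sample difference.

    'fraction' is the position as a share of the compared length:
    0.0 is the first sample, 1.0 is the last, 0.5 is t = n/2.
    Both inputs are assumed to be time-aligned sequences of numbers.
    """
    n = min(len(reference), len(suspect))
    if n == 0:
        raise ValueError("both recordings must contain samples")
    worst = max(range(n), key=lambda i: abs(reference[i] - suspect[i]))
    fraction = worst / (n - 1) if n > 1 else 0.0
    return worst, fraction


# Tiny illustration with a spike planted in the middle.
good = [0, 10, 20, 30, 40, 30, 20, 10, 0]
bad = list(good)
bad[4] = 400
print(locate_largest_difference(good, bad))
```

Running it on the full-length recordings (not a zoomed-in excerpt) would show whether the position is really tied to the sample length.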
Also not sure if different encoding schemes within X3A perform differently or not (at least ones supported by the conversion process). Really only "raw" is easy to decode (and pull out samples from the X3A). Not even JM tools has the IP for dealing with other encoding methods - so you'd almost have to be a content provider with the special tools in order to compare an original WAV vs. an X3A encoded with different methods.
BTW - when you import an X3A that isn't encoded - you should be able to make an X7U user file that would repack this as a Montage library with sample included. Back up user memory first (so you can load it back later) - then initialize user memory and import the single Performance using the problematic sample into user memory - then create a user file of this.
Then follow the same steps again (initializing user memory first) - but this time load the WAV into an element. (I guess you could [STORE] the Performance from the Library where the X3A was loaded into first - so you don't have to redo the entire Performance.) Go to all of the elements that have samples and load the WAV(s) into them. This would be easier if the problematic Performance from the X3A has only one element to deal with in the first place. Then save off an X7U of this.
Instead of comparing audio output - you could then compare the saved files and find the encoded differences in the sample binary data within the two X7Us. I'm not sure you're doing this already or not (or are looking at the recorded audio output).
I don't think Montage would be doing decoding "on the fly". I think the translation code would be goofing the t=n/2 area and stuffing the wrong data there, so you'd see this spill out of the X7U file you saved off that started as an X3A (with intact sample, not manually added).
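The file comparison could be sketched roughly like this. It is a plain byte-for-byte diff and knows nothing about the X7U layout, so headers, names, or timestamps may also show up as differences, and a length change early in the file would shift everything after it. The function names and file paths are placeholders.

```python
def differing_ranges(a: bytes, b: bytes):
    """Return (start, end) byte ranges, end exclusive, where a and b differ."""
    ranges = []
    start = None
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] != b[i]:
            if start is None:
                start = i          # a new run of differences begins
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, n))
    if len(a) != len(b):
        # Trailing bytes present in only one of the files.
        ranges.append((n, max(len(a), len(b))))
    return ranges


def compare_files(path_a, path_b):
    """Read two files fully and report where their bytes differ."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return differing_ranges(fa.read(), fb.read())


# Hypothetical usage (file names are placeholders):
# for start, end in compare_files("from_x3a.X7U", "from_wav.X7U"):
#     print(f"bytes {start:#x}..{end:#x} differ ({end - start} bytes)")
```

If the hypothesis holds, one of the reported ranges should fall inside the sample data at the spot where the pop is heard.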
Current Yamaha Synthesizers: Montage Classic 7, Motif XF6, S90XS, MO6, EX5R
It appears at roughly T/2 just because I zoomed in on it in Audacity. The sample keeps going after the image, so this pop actually happens relatively soon within the sample.
The X3A file is one that I created via my own custom tools, and it just copies the raw WAV data over into it, so I know that its contents exactly match the original sample data.
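For anyone who wants to double-check that claim independently, a sketch along these lines would do. It assumes the X3A stores the WAV's PCM frames as one contiguous, unmodified run of bytes (which is what a straight copy would produce); it does not parse the X3A format at all. The helper names and paths are made up for illustration.

```python
import wave


def read_pcm_bytes(wav_path):
    """Return the raw PCM frame bytes of a WAV file (header stripped)."""
    with wave.open(wav_path, "rb") as w:
        return w.readframes(w.getnframes())


def find_payload(container: bytes, payload: bytes) -> int:
    """Return the byte offset of payload inside container, or -1 if absent."""
    return container.find(payload)


# Hypothetical usage (paths are placeholders):
# pcm = read_pcm_bytes("sample.wav")
# with open("library.X3A", "rb") as f:
#     offset = find_payload(f.read(), pcm)
# print("PCM found at offset", offset if offset >= 0 else "nowhere")
```

A result of -1 would only mean the bytes are not stored contiguously and unchanged; it would not by itself prove the container is wrong.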
I made the above comparisons by playing back the sample on the device, one pass by manually loading the sample as RAW .WAV data and the other pass by loading the X3A file.
I confirmed that there are no settings on the device that can fix this; it happens no matter the filter or any other setting. It’s just not played back correctly when loaded from the X3A file.
So the solution I created was to disrupt the AWM2 encoding, or the playback system, whichever is causing this. (If your sample has a long period of silence, the sampler will stop playing it back, so the playback system evidently does some problematic runtime analysis.)
From the original sample I generated a pair of complementary samples:
When these 2 samples are played back, they recombine back into the original sample.
Unfortunately this happens after filtering, so what gets combined is the results of each sample filtered separately, but a quick analysis shows that this issue is not a real problem.
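The post does not say how the pair was generated, so the following is only one possible way to build two samples that sum back to the original. The sine-shaped offset, its amplitude and period, and the name `split_complementary` are all assumptions made for this sketch.

```python
import math


def split_complementary(samples, offset_amplitude=1000, offset_period=64):
    """Split 16-bit samples into two lists whose element-wise sum is the input.

    Each half carries roughly half of the original plus or minus a
    low-level sine offset, so neither half is silent where the original is.
    offset_amplitude must stay well below 16384 to remain in 16-bit range.
    """
    first, second = [], []
    for i, x in enumerate(samples):
        d = round(offset_amplitude * math.sin(2 * math.pi * i / offset_period))
        a = x // 2 + d
        first.append(a)
        second.append(x - a)   # guarantees a + b == x exactly
    return first, second


original = [0, 0, 0, 1200, -32768, 32767, 5]
half_a, half_b = split_complementary(original)
print([a + b for a, b in zip(half_a, half_b)] == original)
```

Because the second half is computed as the original minus the first, the sum is exact in integer arithmetic; any difference heard on the device would come from the playback chain (such as the per-sample filtering), not from the split.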
The top is the “complementary samples” played back together on the device, the middle is the original sample played back on the device (inverted), and the bottom is the difference boosted by 50dB.
The difference is mainly within the “settling period” of the filter (which exists even if you set the filter to THRU) at the beginning, but it is a viable solution because:
#1: The difference here has been boosted heavily; it is quite tiny.
#2: The difference is in the shape of the original sound, so our ears still hear the original sound. This is significantly better than hearing a pop.
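To put the 50dB boost in perspective: a gain of 50 dB multiplies amplitude by 10^(50/20), which is roughly 316 times. A sketch of the same comparison in code follows, assuming both recordings are aligned lists of normalized float samples; the function names are illustrative only.

```python
def db_to_gain(db):
    """Convert a decibel value to a linear amplitude factor."""
    return 10 ** (db / 20)


def boosted_difference(a, b, boost_db=50.0):
    """Subtract b from a sample by sample, then amplify the residue.

    Subtracting b is the same as adding an inverted copy of b, which is
    how the comparison in the image was made.
    """
    gain = db_to_gain(boost_db)
    return [(x - y) * gain for x, y in zip(a, b)]


print(round(db_to_gain(50), 2))
```

So a residue that fills the display after the boost is about 316 times smaller in the actual output.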
I am unfortunately not able to make a copy into an X7A or X7U file, as the library files must have the exact correct contents in the exact correct order or else my MIDI files cannot select the correct performance parts, which can be spread across multiple libraries. The system I have created to allow the MIDI file to select performance parts requires everything to be “just so” and the tables are generated beforehand.
The “complementary sample” short-circuiting of this issue will also be applicable when I have samples that are lyrics with minor pauses in them, as it keeps the sampler busy so it can’t stop playback during silences, so it’s okay for now.
I really wish Yamaha would just not have these issues. Why did I have to come up with this really exotic solution to bypass “quirks” in the system (whether it is all on the sampler, or part of it is related to AWM2, etc., doesn’t matter)? Maybe the issue presented above is an actual bug, but why does the sampler stop playback during silences in the sample? Who asked for that behavior? If I want to “waste” sample space by having short silences in my samples, how is that any business but my own?
I am expecting a reply suggesting that Yamaha does this to “urge” me into a better practice, and if that were the case, the above would be my reply. But I know how the envelope state machine works and exactly why it does this, so I know that this is not actually the case. Hint: it’s another bug. The envelope is supposed to stop the sound when it stays below a certain threshold in order for “release” to work correctly. Since there are many ways a sound can decay down to 0, the device generalizes its 0-checking with a broader approach that can’t recognize the reason it is getting a series of 0’s, and this triggers the sample-stop. They allow it to be triggered more easily than it should be in order to save on polyphony, so it’s a bug based on design.
Yes, I’m bitter. It feels nonsensical that I had to come up with this exotic solution, and if I weren’t capable, I would have just been blocked.
Thank you,
Anne
The suggestion for X7U was an experimental step in order to "pull out" what's in the Waveform buffers after conversion. The hypothesis is that the popping Performance that started out as an X3A will spill out with different contents in the resulting X7U than the X7U that has element WAV files loaded manually.
I'm not sure much can be done about the silence detection stuff. That's an annoying "gotcha" for user generated waveforms requiring "fake silence" to trick the engine if "silence" is something you want as part of the waveform.
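As an illustration of the "fake silence" idea, here is a sketch that replaces long runs of exact zeros with an alternating +1/-1 pattern at the lowest 16-bit level. Whether that is enough to get past the Montage's detection is untested here; the detection threshold is not known, so `min_run` and `level` are guesses and the function name is made up.

```python
def fill_exact_silence(samples, min_run=32, level=1):
    """Replace runs of at least min_run zero samples with +level/-level.

    Shorter zero runs (e.g. ordinary zero crossings) are left untouched.
    Returns a new list; the input is not modified.
    """
    out = list(samples)
    n = len(out)
    i = 0
    while i < n:
        if out[i] != 0:
            i += 1
            continue
        j = i
        while j < n and out[j] == 0:
            j += 1                      # j ends one past the zero run
        if j - i >= min_run:
            for k in range(i, j):
                out[k] = level if (k - i) % 2 == 0 else -level
        i = j
    return out


demo = [5] + [0] * 4 + [7] + [0] * 2
print(fill_exact_silence(demo, min_run=3))
```

If one least-significant bit turns out to be below whatever threshold the engine uses, `level` would need to be raised until playback stops cutting off.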
None of what I suggested was in an attempt to "prove" there's a problem. Just trying to better characterize it to better "silver platter" the behavior for the powers that be.
One item Yamaha may want to follow up on is the encoding stage: checking that the PCM stream in the X3A wasn't somehow goofed on encoding. That'd be their angle likely. If I were Yamaha, I'd probably want the source X3A and WAV file to reproduce the problem and also inspect that the X3A matches the WAV. Then the rest is pretty straightforward to track down with the synth's source and debug hooks.
Even my work-around only reduced the popping issue rather than eliminating it (it does fully eliminate the silence cut-off). I could generate different complementary files with more disturbance to the original and maybe get rid of it, but I don’t have the time to spend on trial-and-error like that; I am on the clock here. That is part of why I am not as friendly as normal.
So I am currently working around it by letting the MIDI file select all the parts and set them up. Then I drop the tempo to about 10 so that I have time to go into the problematic samples (of which there are 3), swap them out for the RAW file, and manually re-apply the keybank settings before the actual song starts playing.
An example X3A file and the original sample can be provided; I am just waiting for Yamaha to show any interest. I had envisioned this going along the lines of, “Oh, you found a bug? Would it be possible for us to get the X3A and .WAV files to see it for ourselves?”, to which I would reply, “Yes.” And then an optional follow-up by Yamaha might be, “Thank you; here is $1,000,000.” But that last part is optional. Would be nice, but optional.
Regards,
Anne