H.264 Encoding Guide

Revision Log

  1. 6/11/2011: Initial draft
  2. 6/17/2011: First edit
  3. 7/19/2011: Second edit. Modified the Subpixel Motion Estimation section
  4. 7/23/2011: Added subpixel motion estimation mode 11
  5. 12/26/2011: Fixed --bitrate and --vbv-XXX settings descriptions. The bitrate is specified in kilobits per second, not bits per second.

Introduction – What the Hell are we Doing?

OK, so the big question is: “what the fuck are we trying to accomplish?” What is H.264? What is MPEG-4 Part 10 AVC? What is a CODEC? What is Avisynth? What is a media container?

In this tutorial, we’re going to go over how to encode a video using x264 and avisynth into an H.264 compliant bit-stream.

Background – So, What the Hell are we Really Doing?

Terminology

OK, let’s just start with some terminology so that we’re all on the right page.

Picture/Frame

A snapshot of one moment in time. A still image. We will be using the term “frame” or “video frame” from now on.

Video/Motion Picture

An ordered list of frames. When displaying these frames one after the other, you get the illusion of motion. We will be using the term “video” from now on.

Bit-stream

An ordered list of bits (ones and zeroes).

Video bit-stream/Audio bit-stream

A bit-stream that corresponds directly to some video or audio.

Encode

(verb) To transform data from one format to another. Trivial example: 3 equals 011. I’ve encoded 3 as 011.

Decode

(verb) To transform data from its encoded format to its original format or a close approximation of its original format. Trivial example: 011 equals 3. I’ve decoded the binary representation of 3.

CODEC

An acronym for “encoder/decoder”. A CODEC is a system or framework that allows you to encode and decode a signal.

MPEG-4 Part 10 AVC/H.264

MPEG-4 Part 10 AVC/H.264 is a published standard that documents the syntax of an encoded video bit-stream. It is not a CODEC. The MPEG-4 Part 10 AVC/H.264 standard gives developers information on how to decode a compliant bit-stream. BTW, if you’re wondering why the name is so long, it’s because two separate working groups collaborated on this standard: (1) the Moving Picture Experts Group (MPEG) and (2) the Video Coding Experts Group of the International Telecommunication Union (ITU-T). Hence, MPEG decided to call this MPEG-4 Part 10 AVC and the ITU decided to call this H.264. They are synonymous.

We will be using “H.264” to refer to this standard from now on.

Media File

A “media file” is simply a file that contains an encoded video bit-stream and, usually, an encoded audio bit-stream. Basically, it’s a package that has video and audio.

Now, there are several different ways to “package” our media. I’m sure you’ve probably seen several different types of media files: MKV, MP4, AVI, WMV, RM, etc. What’s the difference between these media files? The difference is: they all package the video bit-stream and the audio bit-stream differently. That’s it. That’s why we call them “media containers.”

Media Container

A media container is a file specification that documents how to organize a video bit-stream and an audio bit-stream. It’s possible that the specification has the ability to organize multiple video bit-streams and multiple audio bit-streams.

Multiplexer/Demultiplexer

A multiplexer is a piece of software that takes a video bit-stream and an audio bit-stream and organizes it according to the specifications of a particular media container.

A demultiplexer is just the opposite; it takes a media file and, following the specifications of a particular media container, extracts the video bit-stream and audio bit-stream.

Filter/Video Filter

A piece of software or code that transforms a video frame or an ordered list of video frames.

(verb) “To filter” means to apply a video filter to a video frame or an ordered list of video frames.

We will be using the terms “filter” and “video filter” interchangeably.

Macroblock

A square chunk of a video frame. In H.264, a macroblock is 16×16 pixels; for prediction and transform purposes it can be partitioned into smaller blocks (16×8, 8×16, 8×8, and on down to 4×4).

GOP

Group of pictures. Basically, a group of ordered frames.

Rate-Distortion

Measure of CODEC performance (distortion over a range of coded bit-rates).

Peak Signal-to-Noise Ratio (PSNR)

An objective measurement of the similarity between two images, based on the mean squared error between them.
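
For reference, PSNR is computed directly from the mean squared error (MSE) between the original frame and the encoded frame:

PSNR = 10 * log10( MAX^2 / MSE )

where MAX is the largest possible pixel value (255 for 8-bit video). Higher PSNR means the two images are more similar.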

Structural Similarity (SSIM)

An objective measurement of the similarity between two images based on local comparisons of luminance, contrast, and structure.

Big Picture

So basically, we have a media file. We want to encode it—somehow. So, what we do is this:

  1. Extract the video bit-stream from the media file using a demultiplexer
  2. Take the video bit-stream and decode it to get the video frames
  3. (Optional) Filter our video frames
  4. Send these filtered video frames to x264 and encode these video frames into an H.264 compliant video bit-stream

OK, great. That’s all good, but what do we physically have to do in order to accomplish this? That’s a good question. I’m going to elaborate on each step and tell you what you have to do. But before I do any of this, I need to tell you what software tools you’ll need.

Software Tools You’ll Need

Here’s a list of software tools you’ll need (with links):

  1. Avisynth
  2. FFMPEGSource
  3. VirtualDub (optional)
  4. x264

Installing Avisynth 2.58

OK, so click the link above and download Avisynth v2.58. The developers of Avisynth haven’t come out with a new version for quite some time, so I’m pretty confident that this version will be the last one.

After you’ve downloaded the file, double-click it and install it like any other program. The directory you install Avisynth in will be called AVISYNTH_HOME from now on.

After you’ve installed Avisynth, download this file. It’s an update for Avisynth’s DirectShowSource plugin—just take my word for it. After you’ve downloaded the file, unzip it and copy the files to your AVISYNTH_HOME\plugins folder. Overwrite any existing files with the ones in the zip file.

Installing FFMPEGSource

This is actually really easy. Just download the 7zip file, unpack it with 7-Zip, and copy the DLL, the executable, and the docs folder to your AVISYNTH_HOME\plugins folder. That’s it!

Installing x264

OK, so you actually don’t install x264. It’s a static binary that you can execute from the command line. So for now, just put it right next to the video files you want to encode.

I recommend that you place your x264 binary in some separate folder and place the path of that folder on your system path. If you don’t know what that means, just keep reading.
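
If you do know what that means, here’s a minimal sketch (the folder path is hypothetical; use wherever you actually put the binary):

# Assuming you dropped x264.exe into C:\tools\x264 (hypothetical path).
# In a BASH-style shell (e.g. MSYS or Git Bash):
export PATH="$PATH:/c/tools/x264"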

Installing VirtualDub (Optional)

This should be pretty simple too. You just have to download the installer, execute it, and you should be good to go.

Step 1: Avisynth

OK, now we’re ready to actually do shit. Up above, we mentioned that we need to actually extract the video bit-stream and decode it into video frames. Once we do that, we can actually manipulate the frames any way we want. Now, how do we do that?

Open up your favorite text editor—mine is Notepad++—and begin by writing the following:

x = "EXAMPLE.mkv"
ffindex(x)
audiodub(ffvideosource(x), ffaudiosource(x))

Save your text file as “EXAMPLE.avs”—without the quotes of course.

Let’s go line by line:

  1. We set a variable x equal to the string “EXAMPLE.mkv”
  2. We pass the variable x to the function “ffindex.” This just indexes the file
  3. We call a function called “audiodub” and pass in two arguments. Those two arguments are in-lined functions which are evaluated before being passed to the function “audiodub.” I’ll explain what “audiodub” does later
    1. “ffvideosource” is FFMPEGSource’s video demultiplexer and decoder. It takes a string which represents the path to a media file, opens up the media file, demultiplexes it, decodes the video bit-stream, and returns an Avisynth data structure called a “Clip”
    2. “ffaudiosource” does the same thing as “ffvideosource,” except it decodes the audio track
    3. Each of these functions is capable of demultiplexing and decoding a specific track, but we didn’t specify a particular track in our example (see the sketch right after this list)
  4. Lastly, we (implicitly) return a Clip back to whoever called this script. I’ll explain this in a moment
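
As an aside on that third sub-point, FFMPEGSource’s source functions accept an optional “track” argument for files with more than one video or audio track. Here’s a quick sketch (the track number is hypothetical; it depends on your file):

x = "EXAMPLE.mkv"
ffindex(x)
# track = -1 (the default) means "first suitable track";
# here we explicitly request audio track number 2
audiodub(ffvideosource(x, track=-1), ffaudiosource(x, track=2))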

OK, so that was quite a bit of information, but if you’re familiar with any C-like programming languages, this will seem very familiar. What the hell does this do anyways? Well nothing at the moment, but you’ve just written an Avisynth script. An Avisynth script is basically a set of instructions that tells Avisynth what media file to process and how to process it. The above example simply opens up a media file called “EXAMPLE.mkv.”

Even though the above example may seem pointless, this is the first step towards encoding a media file. We’re using FFMPEGSource to take a media file and demultiplex it and decode it.

The Clip Data Structure

In Avisynth, there’s a data structure called a “Clip” that represents a media file. It has several attributes, all of which are documented on the Avisynth wiki site. The most important attribute of the Clip data structure is that a Clip can have an audio track and/or a video track—it must have at least one of those two and it cannot have more than one of each.

So real quickly, the “audiodub” function in the above example takes the video stream from the first argument and the audio stream from the second argument and combines them into a single Clip.

Applying Filters

OK, so here’s where the magic really begins. In the above example, all we did was open up a media file. Here, we’re going to apply video filters to our media file.

x = "EXAMPLE.mkv"
ffindex(x)
main_video = audiodub(ffvideosource(x), ffaudiosource(x))
subbed_video = textsub(main_video, "SUBTITLE.ass")
return subbed_video

OK, so this Avisynth script has a little more shit in it, so let me explain what it does.

  1. Set x equal to the string “EXAMPLE.mkv”
  2. Call ffindex and pass the argument x
  3. Set main_video equal to the Clip that’s returned by the audiodub function
  4. Set subbed_video equal to the Clip that is returned from the function “textsub”
  5. Lastly, return (explicitly) the Clip subbed_video back to whoever called this script

You probably noticed quite a few changes in this example. The most important one is that we explicitly returned a Clip. Who we actually return the value to is irrelevant. We don’t care who calls this Avisynth script; we only care about the fact that we returned a specific variable.

Furthermore, we added another function called “textsub” into our script. All this does is overlay subtitles onto the video frame using the subtitle file “SUBTITLE.ass.”

I wonder if I could use this to hardsub anime? *wink* *wink*

Although, I should mention that textsub is actually an external function that’s stored in the VSFilter library. You’ll have to grab a copy of the VSFilter.dll file and copy it to your AVISYNTH_HOME\plugins folder. The latest copy of VSFilter comes with the CCCP. Alternatively, you can grab an older copy online.

One thing that you should be asking yourself, especially if you’re a computer scientist, is this:

“How is the state of a variable modified in Avisynth?”

Put another way, if I call the function example_function(example_clip), does anything about “example_clip” change? The answer is NO. Avisynth is, effectively, pass-by-value.

So it is actually impossible to apply transformations to Clips unless you re-assign the same variable to the returned value of a function. Observe the following:

x = ffvideosource("EXAMPLE.mkv")
x = example_filter(x)
return x

The above example shows how you can keep your namespace tidy while working with Avisynth’s pass-by-value evaluation strategy.
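
And to show the flip-side, here’s the mistake that pass-by-value makes possible (example_filter is a stand-in for any filter):

x = ffvideosource("EXAMPLE.mkv")
# WRONG: the filtered Clip is produced but never assigned back to x,
# so x is still the unfiltered Clip when we return it
example_filter(x)
return x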

Scratching the Surface

Now that you know how to apply filters to Clips, I should tell you that there are several other filters out there. You can find the complete list of internal filters (functions that come built-in) and an incomplete list of external filters (functions that require you to download a separate set of files) on Avisynth’s wiki page.

As an aside, you can use VirtualDub to preview and losslessly encode your Avisynth script. I won’t get into that though.

Step 2: x264

By now, you must be wondering how we actually encode stuff. All we’ve done so far is post-process our video using Avisynth, but how do we actually take that data and compress it? Excellent question.

For the sake of brevity, I’m going to assume you know how to use the system console. Furthermore, I’m going to use BASH syntax for my examples, so Windows CMD and Powershell users will just have to cope.

So fire up the system console and navigate to the folder with your Avisynth script and your video. If you didn’t put the x264 binary on your system path, then make sure that the binary is in the same folder as your video and Avisynth script.

Observe the following example:

x264 --output EXAMPLE.264 EXAMPLE.avs

The above command is the bare minimum that’s required to encode the video that’s post-processed by the Avisynth script “EXAMPLE.avs”. But what does this command actually do?

Let’s look at the first argument pair in the above example. We’ve told x264 to output a file called “EXAMPLE.264”. Now, it’s very important that you pay attention to the file-extension that we pass to x264. We’ve told x264 to output a file of type “.264,” which means that x264 is going to spit out a raw video bit-stream. The reason this is so important is that x264 also has the ability to multiplex the raw video bit-stream into an MP4, MKV, or FLV media container—with no audio stream, of course.

Here are the different types of files that x264 can output:

  1. “264” – Raw video bit-stream
  2. “MP4” – MP4 media container
  3. “MKV” – MKV media container
  4. “FLV” – Flash video media container
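
For example, if you’d rather have x264 hand you an MP4 right away, just change the extension:

x264 --output EXAMPLE.mp4 EXAMPLE.avs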

The last argument, “EXAMPLE.avs,” is taken as the input file. x264 has the capability to take an Avisynth script, evaluate it using Avisynth (which is a library sitting somewhere in your Windows system folder), and get back decoded and filtered frames. Once x264 has actual video frames, the encoding process can begin.

Of course, the above command won’t give you much, since x264 is going to be using default values for everything which may or may not be what you want.

You can print out a comprehensive list of all of x264’s settings by typing the command:

x264 --fullhelp

Or going to the MeGUI wiki.

I’m not going to talk about all the possible x264 program settings, but I will cover the major ones. I’ll try to be as non-technical as possible, but that means many of my explanations will be somewhat incorrect.

If you seek further knowledge and insight on the actual functionality of x264’s program settings, I suggest you do two things:

  1. Read the textbook I’ve posted on the Reference page, H.264 and MPEG-4 Video Compression
  2. Read the “x264 Settings” page on the MeGUI wiki

Rate-Control

Rate-control refers to the bit-rate constraints that you can impose on x264. There are three major rate-control modes in x264, all of which are mutually exclusive (you can only use one of them at a time).

The other rate-control options directly affect the pictures or the macroblocks themselves. Some of them are mutually exclusive.

Constant Bit-Rate

Constant bit-rate is the simplest rate-control method to understand. You can invoke this rate-control method using the following command:

x264 --bitrate DESIRED_BIT_RATE

Replace “DESIRED_BIT_RATE” with an integer that represents how many kilobits-per-second you’d like your video to approach. (Strictly speaking, x264 treats this as a target average bit-rate rather than a truly constant one.) The higher the bit-rate, the larger your output file will be. In general, higher bit-rate means higher quality, but it’s not that simple. As you continue reading, you’ll understand why.

Using constant bit-rate affords you the ability to run several passes over your video. Here’s how it works:

  1. 1st Pass: x264 gathers information about the video. Nothing is encoded; the video is only analyzed. x264 will spit out a stats file and an MB-tree file. Do not delete these files
  2. 2nd Pass: x264 uses the information in the stats file to accurately calculate the optimal quantizer for each frame and encodes your video

It’s actually possible to run 2 analysis passes and then encode the video in the third pass, but I think that’s overkill.

Here’s how you run x264 in two pass mode:

x264 --bitrate DESIRED_BIT_RATE --pass 1 --stats STATS_FILE_NAME --output NUL EXAMPLE.avs
x264 --bitrate DESIRED_BIT_RATE --pass 2 --stats STATS_FILE_NAME --output EXAMPLE.264 EXAMPLE.avs

You don’t actually have to declare the stats program setting; the default value for that is “x264_2pass.log”. The one thing you should really pay attention to is the “output NUL” program setting—use that verbatim if you’re using the 2-pass encoding method and running x264 on Windows. “NUL” specifies the null device file for Windows and it basically tells x264 to output nothing. (On Linux or OS X, use “--output /dev/null” instead.)

Constant Quantizer

Constant quantizer loosely refers to constant “preservation” of detail. This basically attempts to preserve the same amount of detail in every single frame, regardless of any other factors.

Constant quantizer is run in a single pass and can be set using the following command:

x264 --qp DESIRED_QUANTIZER

Lower values = higher quality; 0 = lossless

Constant Rate-Factor

Constant rate-factor (CRF) attempts to preserve the apparent detail in a given frame by analyzing the scene. This is loosely how it works:

  1. Short or high-motion scenes don’t require much preservation because you can’t even see the detail anyways. So it’s better to lower the quality (increase the quantizer) of those scenes in order to save bits
  2. Long or low-motion scenes require a lot of preservation because the audience can see everything. Therefore, we should increase the quality (lower the quantizer) of those scenes

The CRF value is an arbitrary number that scales the quality of your video. I personally recommend this rate-control method over all the others, unless you are targeting bit-rate specific mediums (like video-conferencing).

Here’s how you use CRF:

x264 --crf DESIRED_CRF

The default value is 23.

Lower values = higher quality. 0 = lossless.

Higher resolution content can withstand higher CRF values without loss of quality.

Here are some general guidelines for CRF values:

  1. CRF 20 – Good quality. Use this for TV releases that already have some visual artifacts
  2. CRF 18 – High quality. Use this for higher quality content and pristine video sources
  3. CRF 16 – Extremely High Quality. Use this for Blu-ray releases and on sources that are prone to banding
  4. CRF <= 14 – Insane Quality. Use this whenever gradients or grain must be preserved. Chances are, you’ll never have to set a CRF less than 14
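
Putting it together, a typical CRF encode of our Avisynth script would look like this (CRF 18 is just an example value pulled from the guidelines above):

x264 --crf 18 --output EXAMPLE.264 EXAMPLE.avs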

AQ-Strength

AQ-Strength (“Adaptive Quantization Strength”) basically tells x264 how much bias you want towards details. You specify the AQ-Strength with the following command:

x264 --aq-strength DESIRED_AQ_STRENGTH

where DESIRED_AQ_STRENGTH is a positive floating point number. The default is 1.0, and it is highly advised that you use a number that is less than or equal to 2.0.

Higher numbers equate to stronger bias towards the preservation of details. That means x264 will try to not blur details on flat surfaces or textures.

If you decide to modify this number, I highly advise you increment this value 0.1 units at a time. There is a substantial increase in bit-rate for every tenth of a unit.

If you’re having issues with gradients, try cranking up AQ-Strength to something like 1.5.
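
As a hypothetical example, if you’re fighting banding in dark gradients, you might try:

x264 --crf 16 --aq-strength 1.5 --output EXAMPLE.264 EXAMPLE.avs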

Video Buffer Verifier

The video buffer verifier (VBV) system forces the H.264 encoder to consider the playback device’s memory limitations when encoding frames. It constrains the output bit-rate of the encoded file so it’s suitable for online streaming, device playback, etc. There are three different settings to consider:

  • Maximum Rate
  • Buffer Size
  • Initial Size

Maximum Rate

The VBV maximum rate dictates how quickly the video buffer will fill up (in kilobits per second). You can specify the maximum rate with the following command:

x264 --vbv-maxrate DESIRED_RATE

DESIRED_RATE has to be an integer greater than zero and specifies kilobits/second.

Buffer Size

As you can probably surmise, this is the assumed maximum size of the video buffer. You can specify this attribute with the following command:

x264 --vbv-bufsize DESIRED_SIZE

DESIRED_SIZE must be an integer greater than zero and specifies kilobits. And no, that is not a typo. You really do type “bufsize”.

Initial Size

This is the portion of the buffer that must be filled before playback can begin. You can specify this attribute with the following command:

x264 --vbv-init DESIRED_SIZE

Unlike the other VBV settings, DESIRED_SIZE is a float: values between 0 and 1 are interpreted as a fraction of the buffer size (the default is 0.9), while values greater than 1 are interpreted as kilobits.
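
Here’s a sketch of how the VBV settings typically get combined for something like a streaming target (the numbers are hypothetical; pick them based on your delivery medium):

x264 --bitrate 1500 --vbv-maxrate 2000 --vbv-bufsize 3000 --vbv-init 0.9 --output EXAMPLE.264 EXAMPLE.avs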

Frame-type Options

These options affect the type of frames used by x264 for prediction, what type of frames are encoded, B-frame placement, the amount of B-frames that can be used in a given GOP, and, most importantly, the strength of the in-loop deblocking filter.

Deblock

So, before I can actually explain what this means, I need to explain a little bit about how the H.264 encoder is expected to work.

The H.264 specification allows encoders to use other frames, called “reference frames,” to predict what the current frame will be. The encoder predicts the current frame from a reference frame and only encodes the difference between the two (the residual). Now, what’s that have to do with deblocking?

Well here’s the thing: the encoder has to use the same reference frames that the decoder is going to use. Otherwise, the encoder will be making frame-predictions based off data that the decoder won’t have. This leads to what’s called “drift.” In order to prevent this, the encoder uses decoded frames to ensure that it’s using the same reference frames as the decoder.

Basically, the encoder encodes a frame. It then decodes that frame and uses this decoded frame as a reference frame for future frames.

Did ya get all that? Yes? No? Well, I don’t care. I’m moving on.

So, how the fuck does deblocking play into this? Well, H.264 allows the encoder to use an in-loop deblocking filter. What the hell does that mean? It means this: when the encoder encodes a frame and then decodes it, it is allowed to deblock the decoded frame before storing it for future reference. That’s all that means.

Now, as awesome as all this shit is, how do we actuate this setting?

x264 --deblock ALPHA:BETA

ALPHA and BETA are integers that can be negative or positive. Let me explain what they mean.

ALPHA represents the deblocking strength. Any macroblock that’s not considered a detail has its edges smoothed so that neighboring macroblocks aren’t so noticeable. Macroblocks that are considered details undergo smoothing as well, but to a lesser degree. The higher ALPHA is, the stronger the smoothing.

BETA represents the “threshold” at which a macroblock is deemed a “detail.” Lower numbers mean that flatter macroblocks will be seen as details and won’t be smoothed as strongly. Basically, lower values mean it takes less to be seen as a detail.

Default values are 0:0. It is highly recommended you stay within the range of [-3, 3] for both ALPHA and BETA.
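
For instance, a slightly weakened deblocking filter is a popular choice when you want to hold on to fine detail (the values here are just an illustration):

x264 --crf 18 --deblock -1:-1 --output EXAMPLE.264 EXAMPLE.avs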

B-Frames

Bi-directional predictive frames (B-frames) are prediction frames that use future frames and past frames to predict the current frame. And when I say “future” frames, I mean frames that are displayed later in the video sequence, not frames that haven’t been decoded yet.

You can specify the maximum amount of B-frames per GOP by using the following command:

x264 --bframes DESIRED_AMOUNT

DESIRED_AMOUNT has to be a non-negative integer (0 disables B-frames entirely).

The reason you’d ever want to modify this value has to do with the type of video you’re working with. For anime, there are several scenes where almost nothing changes, with the exception of someone’s mouth. Because of this, the compression of anime video benefits tremendously from B-frames.

The default value for the amount of B-frames per GOP is 3. If you decide not to modify this, consider modifying the decision algorithm used to decide between placing B-frames or P-frames:

x264 --b-adapt MODE

MODE is the integer 0, 1, or 2.

Setting MODE to 0 disables adaptive placement; x264 will simply use as many B-frames as you’ve allowed, wherever possible.

Setting MODE to 1 is a “fast” algorithm that quickly decides whether or not to place a B-frame instead of a P-frame. Use this if you plan to use more than 3 B-frames per GOP.

Setting MODE to 2 is the “optimal” algorithm. The performance hit caused by this mode is significant, so it should only be used if you’re keeping the number of B-frames per GOP small (around 3).

Just like any other type of frame, B-frames can be used as reference frames to predict future frames. In order to set this, use the following command:

x264 --b-pyramid MODE

MODE can equal the following strings:

  • “none”: Disable this feature. Do not use B-frames as reference frames. You should do this for anime
  • “strict”: Use only one B-frame per mini-GOP. Also enforces several strict Blu-ray standards
  • “normal”: Use B-frames freely
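
Putting the B-frame settings together, a hypothetical anime-oriented invocation (following the advice above) might look like this:

x264 --bframes 8 --b-adapt 1 --b-pyramid none --output EXAMPLE.264 EXAMPLE.avs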

Analysis

The H.264 standard is a decoding specification, so the encoding techniques are left up to the developers. Luckily, x264 has several methods to analyze video frames and gives you the ability to tweak pretty much anything.

Motion Estimation

Macroblock motion estimation is how x264 attempts to compress temporal data. It takes a given macroblock, and looks in other reference frames to see if there are any other macroblocks that “look” like it. H.264 provides the means to communicate motion estimated macroblocks, but it doesn’t care how you go about getting these estimates.

You can specify the motion estimation method using the following command:

x264 --me METHOD

METHOD can be one of the five choices below:

  1. Diamond [METHOD=”dia”]: This is the simplest and quickest method x264 has for motion estimation. x264 takes a given macroblock and checks the point immediately above, below, to the right, and to the left of the macroblock. It then takes whatever point matches the macroblock the closest. This method has a search radius of 1 because it only checks the points directly next to the current macroblock
  2. Hexagon [METHOD=”hex”]: This is the default method for motion estimation and is similar to diamond, with some improvements. This method searches in 6 directions around the selected macroblock and has a search radius of 2
  3. Uneven Multi-Hexagon [METHOD=”umh”]: A significantly slower, but highly effective alternative to the hexagon search method. The uneven multi-hexagon algorithm gives the user the ability to specify a distance (see below) and tries to avoid missing hard-to-find motion vectors
  4. Exhaustive [METHOD=”esa”]: A brute-force method to finding the best motion vector within a given search area that the user specifies
  5. Transform Exhaustive [METHOD=”tesa”]: Exactly like the exhaustive search algorithm, except it runs a Hadamard transformation on all candidate motion vectors. Not sure how this helps, but it basically adds an extra layer to the decision process of choosing a motion vector

Motion Estimation Distance

The motion estimation methods uneven multi-hexagon, exhaustive, and transform exhaustive allow the user to specify a maximum search radius. Use the following command to specify a maximum search radius:

x264 --merange SEARCH_RADIUS

SEARCH_RADIUS must be an integer greater than 0. The default search radius is 16 points.

I would advise that you keep this within “sane” limits. The interval of [16, 24] would be considered “sane.” Anything larger just unnecessarily expands the search area, because motion estimation is done on the residuals of a prediction frame. The auto-correlation and correlation between residual frames drop off sharply after a few points, so you won’t gain anything by setting the search radius equal to the size of the frame or something like that.
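
For example, a common step up from the defaults, staying inside the “sane” interval above, looks like this:

x264 --me umh --merange 24 --output EXAMPLE.264 EXAMPLE.avs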

Subpixel Motion Estimation Complexity

The H.264 standard allows us to refine our motion estimation results by interpolating a reference frame’s subpixels—as in fractional pixels. Since there’s no way to “display” or store a fractional pixel, the H.264 encoder has to “guess” as to what those subpixels would actually look like.

So how does the encoder “guess” what these subpixels would look like? OK, so there are two different types of subpixels that the H.264 encoder can use: Half-Pixels and Quarter-Pixels (QPel).

If the encoder wants to guess what the Half-Pixels of a frame look like, it basically doubles the amount of pixels in the frame (upscales the image to twice the resolution) and interpolates what those pixels would look like using a 6-tap filter.

If the encoder wants to guess what the Quarter-Pixels of a frame look like, it doubles the amount of pixels in the frame twice. The first time it doubles the resolution it uses a 6-tap filter to get the Half-Pixels and then it doubles the resolution again using bilinear interpolation.
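
If you’re curious, the 6-tap filter the standard defines for half-pixel interpolation is a simple weighted average of the six nearest full pixels along the interpolation direction:

half_pixel = clip( (A - 5*B + 20*C + 20*D - 5*E + F + 16) / 32 )

where A through F are those six full pixels and clip() keeps the result within the valid pixel range.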

Once the encoder has made an educated guess as to what the subpixels would look like, it can actually use those interpolated pixels as reference points to perform motion-estimation.

To specify the subpixel motion estimation complexity, use the following command:

x264 --subme MODE

MODE can be an integer from 0 to 11. Each mode adds more processing on top of everything the previous mode does. Visual quality goes up as the numbers get bigger. Here’s what each MODE means:

  0. Full-pixel motion estimation only
  1. Quarter-pixel motion estimation using the sum of absolute differences (1 iteration)
  2. Quarter-pixel motion estimation using the sum of absolute transformed differences (2 iterations)
  3. Half-pixel motion estimation followed by quarter-pixel motion estimation
  4. Always use quarter-pixel motion estimation
  5. Multi-quarter-pixel + bi-directional motion estimation
  6. Rate-distortion optimization on I/P frames
  7. Rate-distortion optimization on all frames
  8. Rate-distortion refinement on I/P frames
  9. Rate-distortion refinement on all frames
  10. Quarter-pixel rate-distortion
  11. Disable all early terminations and perform full rate-distortion analysis

If you don’t know what any of that means, then just leave it be. The default value is 7.
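
If you do want to crank it up, note that (as far as I know) mode 10 only works when trellis is set to 2 and adaptive quantization is enabled, so a slow-but-thorough invocation would look something like:

x264 --subme 10 --trellis 2 --output EXAMPLE.264 EXAMPLE.avs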

Psychovisual Rate-Distortion Optimization

The human eye is sensitive to the composition of an image as well as its similarity to the original. According to Dark Shikari (Jason Garrett-Glaser), the human eye would rather see a distorted, detailed image than a clean, blurred image. I’m not sure what he bases this assertion on, but whatever.

The important part is, we can bias x264 to prefer details over blurred images. You can set the psychovisual rate-distortion optimization factor using the following command:

x264 --psy-rd ALPHA:BETA

ALPHA is the rate-distortion strength and must be a positive floating point number. Higher values mean more bits are used to combat blurriness caused by dropping insignificant DCT coefficients on macroblocks. Higher does not mean better. In fact, there seems to be a “sweet-spot” for this value depending on the media you’re encoding; for anime, the sweet-spot is approximately 0.3. The default value is 1.0.

BETA is the psychovisual-trellis strength. I’m not sure what this actually does to the DCT coefficients; I assume it’s a coefficient that’s used somewhere when trellis is applied to them. Higher values increase bit-rate, but, just like the ALPHA setting, there seems to be a “sweet-spot” for this value. This implies that higher values do not mean higher quality. The default value is 0.0 and I would highly advise that you do not change this value for anime.
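
Given the sweet-spots above, a reasonable anime-friendly setting would be:

x264 --crf 16 --psy-rd 0.3:0.0 --output EXAMPLE.264 EXAMPLE.avs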

Trellis

Trellis is an algorithm that takes DCT coefficients and finds the optimal quantization for each macroblock. By “optimal” I mean that trellis maximizes PSNR relative to the bit-rate.

Trellis is used instead of adaptive deadzone because the distribution of DCT coefficients usually doesn’t exactly follow a Laplacian distribution.

You can set the trellis mode using the following command:

x264 --trellis MODE

MODE can be 0, 1, or 2.

If MODE is equal to 0, then trellis is disabled entirely.

If MODE is equal to 1, then trellis is applied to the final encode on macroblocks.

If MODE is equal to 2, then trellis is applied to macroblocks on all mode-decisions.

According to Loren Merritt, one of the original developers of x264, MODE 1 increases the encoding time by 3.14% and MODE 2 increases the encoding time by 31% compared to MODE 0.

The default MODE is 1.

Metrics

Even though the human visual system is subjective and biased, it’s still important that we get a rough idea of the quality of our encode. Luckily, x264 can spit out the PSNR and SSIM of your encode.

I would like to caution that PSNR and SSIM are objective metrics for measuring the similarity between two images, but they are not representative of what a human would deem “higher quality.” For instance, most people will believe a particular video looks better if they’re in a comfortable setting.

Use the following command to have x264 measure the PSNR of an encode:

x264 --psnr

Use the following command to have x264 measure the SSIM of an encode:

x264 --ssim
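
You can tack either flag (or both) onto any encode; x264 will print the average values when the encode finishes. For example:

x264 --crf 18 --psnr --ssim --output EXAMPLE.264 EXAMPLE.avs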

Step 3: Multiplexing

Once you’ve actually encoded your video, you won’t be able to use it until you’ve multiplexed it into a proper media file. In order to do this, you will need to grab the appropriate programs for the corresponding media container of your choice.

Matroska Media Container

Better known by its file-extension, “MKV,” the Matroska Media Container is the most popular media container among internet distributors and fansubbers. Its popularity is largely due to its support for nearly any video and audio codec, strong error-resilience, fast seeking, chapter support, ordered chapters, editions, selectable tracks, and menus.

In order to multiplex your video and audio streams into an MKV media file, you’ll need to download MKVToolNix.

Download the release that corresponds to your system. You don’t actually install anything; you simply run the program you want. For our purposes, we’ll be running the program “mmg.exe.”

A graphical user interface will appear.

Now, drag and drop your .264 file, your audio file, and any other files you think are important onto the northern-most input box.

A warning will appear, telling you that your raw .264 file doesn’t have an fps element associated with it.

This means mmg can’t figure out the fps of your .264 file. You’ll have to set it manually.

Then, do the following:

  1. Click the video track that corresponds to your .264 file
  2. Select the “Format Specific Options” tab
  3. Select the appropriate fps from the “FPS” drop-down menu
  4. Once you’re finished with that, click “Start Muxing”

A progress window will pop up.

You now have an MKV file. Errors that occur during multiplexing will be displayed in the progress window. I don’t really care to explain all the things that could go wrong.
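
If you’d rather stay in the console, mmg is just a front-end for the command-line tool mkvmerge, and the same job can be done in one line. A sketch (the fps value and audio file name are hypothetical; --default-duration is how you supply the fps that’s missing from a raw .264 stream):

mkvmerge -o EXAMPLE.mkv --default-duration 0:23.976fps EXAMPLE.264 AUDIO.aac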

