Monday, June 28, 2010

video demystified(summary on mpeg2/mpeg4/h264)

  • MPEG-2
MPEG-2 uses the YCbCr color space, supporting 4:2:0, 4:2:2 and 4:4:4 sampling. The 4:2:2 and 4:4:4 sampling options increase the chroma resolution over 4:2:0, resulting in better picture quality.

There are three types of coded pictures.
I (intra) pictures are fields or frames coded as a stand-alone still image.
P (predicted) pictures are fields or frames coded relative to the nearest previous I or P picture, resulting in forward prediction processing.
B (bidirectional) pictures are fields or frames that use the closest past and future I or P picture as a reference, resulting in bidirectional prediction.

A group of pictures (GOP) is a series of one or more coded pictures intended to assist in random accessing and editing. The GOP value is configurable during the encoding process. The smaller the GOP value, the better the response to movement (since the I pictures are closer together), but the lower the compression. In the coded bitstream, a GOP must start with an I picture and may be followed by any number of I, P, or B pictures in any order. In display order, a GOP must start with an I or B picture and end with an I or P picture.
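The difference between display order and coded order can be sketched in a few lines. This is only an illustration, assuming that every B picture needs the next I or P picture as its future reference; the function name and the numbered labels are made up for the example.

```python
def display_to_coded_order(display):
    """Reorder picture types from display order to coded order.

    Each B picture needs the next I or P picture as its future
    reference, so that reference is emitted first and the held
    B pictures follow it.
    """
    coded = []
    pending_b = []
    for index, ptype in enumerate(display):
        label = f"{ptype}{index}"  # e.g. "B0": a B picture shown first
        if ptype == "B":
            pending_b.append(label)
        else:  # I or P: a reference picture
            coded.append(label)
            coded.extend(pending_b)
            pending_b = []
    # Trailing B pictures would need a reference from the next GOP.
    coded.extend(pending_b)
    return coded


print(display_to_coded_order("BBIBBPBBP"))
```

The display sequence starts with B pictures and ends with a P picture, while the reordered sequence starts with the I picture, which matches the two ordering rules above.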

In an open GOP (closed_gop flag = 0), the first B pictures (if any) immediately following the first I picture after the GOP header are predicted from a reference picture in the previous GOP. If editing has removed that reference picture, the broken_link flag is set to indicate that these B pictures may not be decoded correctly (and thus should not be displayed).



Macroblocks
Three types of macroblocks are available in MPEG-2.
The 4:2:0 macroblock consists of four Y blocks, one Cb block, and one Cr block.
The 4:2:2 macroblock consists of four Y blocks, two Cb blocks, and two Cr blocks.
The 4:4:4 macroblock consists of four Y blocks, four Cb blocks, and four Cr blocks.
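As a quick check of the three layouts, the sketch below counts blocks and samples per macroblock. It assumes 8x8 blocks; the table and function names are invented for the example.

```python
# Number of 8x8 blocks per macroblock for each chroma format,
# taken from the three macroblock descriptions above.
BLOCKS_PER_MACROBLOCK = {
    "4:2:0": {"Y": 4, "Cb": 1, "Cr": 1},
    "4:2:2": {"Y": 4, "Cb": 2, "Cr": 2},
    "4:4:4": {"Y": 4, "Cb": 4, "Cr": 4},
}

BLOCK_SAMPLES = 8 * 8  # assumption: every block is 8x8 samples


def macroblock_samples(chroma_format):
    """Return the total number of samples carried by one macroblock."""
    blocks = BLOCKS_PER_MACROBLOCK[chroma_format]
    return sum(blocks.values()) * BLOCK_SAMPLES


for fmt, blocks in BLOCKS_PER_MACROBLOCK.items():
    print(fmt, sum(blocks.values()), "blocks,", macroblock_samples(fmt), "samples")
```

A 4:4:4 macroblock carries twice as many samples as a 4:2:0 one, which is the cost of the higher chroma resolution.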

Macroblocks in P pictures are coded using the closest previous I or P picture as a reference, resulting in two possible codings:
 - intra coding: no motion compensation
 - forward prediction: closest previous I or P picture is the reference
Macroblocks in B pictures are coded using the closest previous and/or future I or P picture as a reference, resulting in four possible codings:
 - intra coding: no motion compensation
 - forward prediction: closest previous I or P picture is the reference
 - backward prediction: closest future I or P picture is the reference
 - bi-directional prediction: two pictures are used as the reference:
   --- the closest previous I or P picture and
   --- the closest future I or P picture

Block size: 8x8 samples for MPEG-2

Video Bitstream
The video bitstream is a hierarchical structure with seven layers.
From top to bottom the layers are:
 - Video Sequence
 - Sequence Header
 - Group of Pictures (GOP)
 - Picture
 - Slice
 - Macroblock (MB)
 - Block

Sequence Header
A sequence header should occur about every one-half second.
 - Sequence_header_code
 - Horizontal_size_value
 - Vertical_size_value
 - Aspect_ratio_information
 - Frame_rate_code
 - Bit_rate_value
 - Vbv_buffer_size_value
 - Constrained_parameters_flag
 - Load_intra_quantizer_matrix
 - Intra_quantizer_matrix
 - Load_non_intra_quantizer_matrix
 - Non_intra_quantizer_matrix
 - Sequence Extension
 - Extension_start_code
 - User Data
   --- User_data_start_code
   --- User Data

Data for each group of pictures consists of a GOP header followed by picture data. A GOP header should occur about every two seconds.
Data for each picture consists of a picture header followed by slice data. If a sequence extension is present, each picture header is followed by a picture coding extension.
Data for each slice layer consists of a slice header followed by macroblock data.
Data for each macroblock layer consists of a macroblock header followed by motion vector and block data.
Data for each block layer consists of coefficient data.


The program stream, used by the DVD and SVCD standards, is designed for use in relatively error-free environments. It consists of one or more PES packets multiplexed together and coded with data that allows them to be decoded in synchronization. Program stream packets may be of variable and relatively great length.

Data for each pack consists of a pack header followed by an optional system header and one or more PES packets.
The program stream map (PSM) provides a description of the bitstreams in the program stream, and their relationship to one another. It is present as PES packet data if stream_ID = program stream map.

A transport stream combines one or more programs, with one or more independent time bases, into a single stream. Each program in a transport stream may have its own time base. The time bases of different programs within a transport stream may be different.
The transport stream consists of one or more 188-byte packets. The data for each packet is from PES packets, PSI (Program Specific Information) sections, stuffing bytes, or private data.
The header of each packet starts with a sync byte and carries a 13-bit Packet IDentifier (PID) that enables the decoder to determine what to do with the packet.
Data for each packet consists of a 4-byte packet header followed by an optional adaptation field and/or payload data.
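A minimal sketch of reading the packet header follows. The field layout (sync byte 0x47, 13-bit PID, adaptation field control bits, 4-bit continuity counter) comes from the MPEG-2 systems specification rather than from these notes, and the function name and example packet are hypothetical.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def parse_ts_header(packet):
    """Decode the fixed 4-byte header of one transport stream packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("transport stream packets are 188 bytes long")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing sync byte 0x47")
    return {
        "payload_unit_start": bool(packet[1] & 0x40),
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],  # 13-bit PID
        "has_adaptation_field": bool(packet[3] & 0x20),
        "has_payload": bool(packet[3] & 0x10),
        "continuity_counter": packet[3] & 0x0F,
    }


# Hypothetical packet: PID 0x0100, payload only, continuity counter 5.
example = bytes([0x47, 0x41, 0x00, 0x15]) + bytes(184)
print(parse_ts_header(example))
```

A demultiplexer would use the PID to route each packet to the right PES or PSI handler.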


  • MPEG-4
MPEG-4 visual is divided into two sections.
MPEG-4 Part 2 includes the original MPEG-4 video codecs discussed in this section. MPEG-4 Part 10 specifies the "advanced video codec", also known as H.264, which is covered in the H.264 section below.
Like H.263 and MPEG-2, the MPEG-4 Part 2 video codecs are also macroblock, block and DCT-based.

Instead of the video "frames" or "pictures" used in earlier MPEG specifications, MPEG-4 uses natural and synthetic visual objects.
Instances of video objects at a given time are called visual object planes (VOPs).

MPEG-4 Part 2 supports many visual profiles and levels. Currently, the natural visual profiles are of the most interest in the marketplace.

Visual layers: (from top to bottom)
VS -> VO -> VOL -> GOV -> VOP
An MPEG-4 visual scene consists of one or more video objects.
Each video object may have one or more layers to support temporal or spatial scalable coding.

Each video object can be encoded in scalable (multi-layer) or nonscalable form (single layer), depending on the application, represented by the video object layer (VOL).

Video object planes can be grouped together to form a group of video object planes.

  • H.264
Rather than a single major advancement, H.264 employs many new tools designed to improve performance. These include:
- Support for 8-, 10- and 12-bit 4:2:2 and 4:4:4 YCbCr
- Integer transform
- UVLC, CAVLC and CABAC entropy coding
- Multiple reference frames
- Intra prediction
- In-loop de-blocking filter
- SP and SI slices
- Many new error resilience tools

H.264 originally supported three profiles: baseline, main and extended.
Baseline profile is designed for progressive video. Its tools include:
- I and P slice types
- 1/4-pixel motion compensation
- UVLC and CAVLC entropy coding
- Arbitrary slice ordering
- Flexible macroblock ordering
- Redundant slices
- 4:2:0 YCbCr format
Main profile is designed for a wide range of broadcast applications. Additional tools over baseline profile include:
- Interlaced pictures
- B slice type
- CABAC entropy coding
- Weighted prediction
- 4:2:2 and 4:4:4 YCbCr, 10- and 12-bit formats (moved to the separate High profiles in the final standard; Main profile itself is 8-bit 4:2:0)
- Arbitrary slice ordering not supported
- Flexible macroblock ordering not supported
- Redundant slices not supported
Extended profile is designed for mobile and Internet streaming applications. Additional tools over baseline profile include:
- B, SP and SI slice types
- Slice data partitioning
- Weighted prediction

H.264 uses the YCbCr color space, supporting 4:2:0, 4:2:2 and 4:4:4 sampling.
With H.264, the partitioning of the 16x16 macroblocks has been extended, down to blocks as small as 4x4. Such fine granularity leads to a potentially large number of motion vectors per macroblock (up to 32) and of blocks that must be interpolated (up to 96).

H.264 adds an in-loop de-blocking filter. It removes artifacts resulting from adjacent macroblocks having different estimation types and/or different quantizer scales.

The slice has greater importance in H.264 since it is now the basic independent spatial element. This prevents an error in one slice from affecting other slices.

When motion estimation is not efficient, intra prediction can be used to eliminate spatial redundancies. This technique attempts to predict the current block based on adjacent blocks. The difference between the predicted block and the actual block is then coded. This tool is very useful in flat backgrounds where spatial redundancies often exist.
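The idea can be illustrated with the simplest intra mode, DC prediction of a 4x4 block. This sketch assumes the block is predicted as the rounded mean of the four samples above it and the four samples to its left; the function names and sample values are made up.

```python
def dc_predict_4x4(above, left):
    """Predict a 4x4 block as the rounded mean of its 8 neighbouring samples."""
    dc = (sum(above) + sum(left) + 4) >> 3
    return [[dc] * 4 for _ in range(4)]


def residual(actual, predicted):
    """Difference between the actual and the predicted block; this is what gets coded."""
    return [[a - p for a, p in zip(row_a, row_p)]
            for row_a, row_p in zip(actual, predicted)]


# Hypothetical flat background: neighbours and block are all close to 101.
above = [100, 100, 102, 102]
left = [100, 101, 101, 102]
actual = [[101, 101, 102, 102],
          [101, 101, 101, 102],
          [100, 101, 101, 101],
          [100, 100, 101, 101]]

predicted = dc_predict_4x4(above, left)
print(residual(actual, predicted))
```

On a flat background the residual values stay close to zero, so very few bits are needed after the transform and quantization.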

H.264 adds support for multiple reference frames. This increases compression by improving the prediction process and increases error resilience by being able to use another reference frame in the event that one was lost.

H.264 uses a simple 4x4 integer transform. An additional 2x2 transform is applied to the four CbCr DC coefficients. Intra-16x16 macroblocks have an additional 4x4 transform performed for the sixteen Y DC coefficients.
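The core of the 4x4 transform needs only integer additions, subtractions and doublings. The sketch below applies the forward core matrix to one block; it leaves out the scaling that H.264 folds into quantization, and the helper names are chosen for the example.

```python
# Forward core matrix of the H.264 4x4 integer transform.
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def forward_core_transform(block):
    """Compute CF * block * CF^T using integer arithmetic only."""
    return matmul(matmul(CF, block), transpose(CF))


flat = [[10] * 4 for _ in range(4)]  # a flat block: only a DC term is expected
for row in forward_core_transform(flat):
    print(row)
```

For a flat block all the energy lands in the top-left (DC) coefficient, which is why the DC coefficients are worth transforming a second time as described above.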

For everything but the transform coefficients, H.264 uses a single Universal VLC (UVLC) table that uses an infinite-extent codeword set (Exponential Golomb).
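The unsigned Exponential Golomb code is simple enough to sketch directly. Bits are modelled as a string of "0" and "1" characters for readability, and the function names are illustrative rather than taken from any library.

```python
def ue_encode(value):
    """Return the unsigned Exp-Golomb codeword for value as a bit string."""
    if value < 0:
        raise ValueError("only non-negative integers can be coded")
    binary = bin(value + 1)[2:]               # value + 1 written in binary
    return "0" * (len(binary) - 1) + binary   # prefixed by len - 1 zeros


def ue_decode(bits, pos=0):
    """Decode one codeword starting at bits[pos]; return (value, next_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2) - 1, end


for n in range(5):
    print(n, ue_encode(n))
```

Small values get short codewords ("1" for 0, "010" for 1), and the set never runs out because the zero prefix can grow without limit.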

For transform coefficients, which consume most of the bandwidth, H.264 uses Context Adaptive Variable Length Coding (CAVLC).  Based upon previously processed data, the best VLC table is selected.

Additional efficiency (5-10%) may be achieved by using Context Adaptive Binary Arithmetic Coding (CABAC). CABAC continually updates the statistics of the incoming data and adapts the algorithm in real time using a process called context modeling.


NAL
The NAL facilitates mapping H.264 data to a variety of transport layers including:
- RTP/IP for wired and wireless Internet services
- File formats such as MP4
- H.32X for conferencing
- MPEG-2 systems
The data is organized into NAL units, packets that contain an integer number of bytes.
The first byte of each NAL unit indicates the payload data type and the remaining bytes contain the payload data. The payload data may be interleaved with additional data to prevent a start code prefix from being accidentally generated.
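Both points can be sketched briefly. The example assumes the usual layout of the first byte (a 1-bit forbidden zero bit, a 2-bit nal_ref_idc and a 5-bit nal_unit_type) and assumes that the additional data is a 0x03 byte inserted after two consecutive zero bytes; the function names are made up.

```python
def parse_nal_header(first_byte):
    """Split the first byte of a NAL unit into its three fields."""
    return {
        "forbidden_zero_bit": first_byte >> 7,
        "nal_ref_idc": (first_byte >> 5) & 0x03,
        "nal_unit_type": first_byte & 0x1F,
    }


def strip_emulation_prevention(payload):
    """Remove each 0x03 byte that follows two consecutive 0x00 bytes."""
    out = bytearray()
    zeros = 0
    for byte in payload:
        if zeros >= 2 and byte == 0x03:
            zeros = 0  # drop the inserted byte and restart the zero count
            continue
        out.append(byte)
        zeros = zeros + 1 if byte == 0x00 else 0
    return bytes(out)


print(parse_nal_header(0x67))
print(strip_emulation_prevention(bytes([0x00, 0x00, 0x03, 0x01])).hex())
```

A decoder strips the inserted bytes before parsing the payload, so the payload can contain any bit pattern without being mistaken for a start code.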

Monday, June 21, 2010

mix of hash & array (perl)

  • array of hashes
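@AoH = (
    {
       husband  => "barney",
       wife     => "betty",
       son      => "bamm bamm",
    },
);
# append another hash to the array
push @AoH, { husband => "fred", wife => "wilma", daughter => "pebbles" };
# set one field of the first hash
$AoH[0]{husband} = "fred";
# capitalize the first letter of the second husband's name
$AoH[1]{husband} =~ s/(\w)/\u$1/;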

  • array of arrays
@AoA = (
         [ "fred", "barney" ],
         [ "george", "jane", "elroy" ],
         [ "homer", "marge", "bart" ],
);
print $AoA[2][2];            # third row, third element: "bart"
print $ref_to_AoA->[2][2];   # same access through a reference to the array


  • hash of arrays
%HoA = (
  flintstones => [ "fred","barney" ],
…);
$HoA{teletubbies} = [ "tinky winky", "dipsy", "laa-laa", "po" ];
$HoA{flintstones}[0] = "Fred";


  • hash of hashes
%HoH = (
    flintstones => {
        husband   => "fred",
        pal       => "barney",
    },
...
);
$HoH{ mash } = {
    captain  => "pierce",
    major    => "burns",
    corporal => "radar",
};
$person = $HoH{$family}{$role};   # e.g. $HoH{mash}{captain} is "pierce"

Sunday, June 20, 2010

ovm phase summary

Components execute their behavior in strictly ordered, pre-defined phases.  Each phase is defined by its own method, which derived components can override to incorporate component-specific behavior. 

During simulation, the phases are executed one by one, where one phase must complete before the next phase begins.  The following briefly describes each phase:
1.1     new

Also known as the constructor, the component does basic initialization of any members not subject to configuration.
1.2     build

The component constructs its children.  It uses the get_config interface to obtain any configuration for itself, the set_config interface to set any configuration for its own children, and the factory interface for actually creating the children and other objects it might need.
1.3     connect

The component now makes connections (binds TLM ports and exports) from child-to-child or from child-to-self (i.e. to promote a child port or export up the hierarchy for external access).  Afterward, all connections are checked via resolve_bindings before entering the end_of_elaboration phase.
1.4     end_of_elaboration

At this point, the entire testbench environment has been built and connected.  No new components and connections may be created from this point forward.  Components can do final checks for proper connectivity, and they can initiate communication with other tools that require a stable, quasi-static component structure.
1.5     start_of_simulation

The simulation is about to begin, and this phase can be used to perform any pre-run activity such as displaying banners and printing the final testbench topology and configuration information.
1.6     run

  This is where verification takes place.  It is the only predefined, time-consuming phase.  A component’s primary function is implemented in the run task.  Other processes may be forked if desired.  When a component returns from its run task, it does not signify completion of its run phase.  Any processes that it may have forked continue to run.  The run phase terminates in one of four ways:
1.6.1     stop

  When a component’s enable_stop_interrupt bit is set and global_stop_request is called, the component’s stop task is called.  Components can implement stop to allow completion of in-progress transactions, queues, etc.  Upon return from stop() by all enabled components, a do_kill_all is issued.  If the ovm_test_done_objection is being used, this stopping procedure is deferred until all outstanding objections on ovm_test_done have been dropped.
1.6.2     objections dropped

  The ovm_test_done_objection will implicitly call global_stop_request when all objections to ending the phase are dropped.  The stop procedure described above is then allowed to proceed normally.
1.6.3     kill

  When called, all components’ run processes are killed immediately.  While kill can be called directly, it is recommended that components use the stopping mechanism, which affords a more ordered and safe shut-down.
1.6.4     timeout

If a timeout was set, then the phase ends if it expires before any of the above occurs.  Without a stop, kill, or timeout, simulation can continue “forever”, or the simulator may end simulation prematurely if it determines that all processes are waiting.
1.7     extract

This phase can be used to extract simulation results from coverage collectors and scoreboards, collect status/error counts, statistics, and other information from components in bottom-up order.  Being a separate phase, extract ensures all relevant data from potentially independent sources (i.e. other components) are collected before being checked in the next phase. Following are some examples of what you can do in this phase.

■ Collect assertion-error count.

■ Extract coverage information.

■ Extract the internal signals and register values of the DUT.

■ Extract internal variable values from components.

■ Extract statistics or other information from components.
1.8     check

Having extracted vital simulation results in the previous phase, the check phase can be used to validate such data and determine the overall simulation outcome.  It too executes bottom-up.
1.9     report

Finally, the report phase is used to output results to files and/or the screen.

It is called in bottom-up order.

 

Thursday, June 10, 2010

virtual things

Virtual class
If a base class is not intended to be instantiated, it can be made abstract by specifying the class to be virtual.
An abstract class cannot be instantiated; it can only be derived.
Abstract classes can also have virtual methods.

Virtual method(function)
 Virtual methods are a basic polymorphic construct. A virtual method overrides a method in all the base classes, whereas a normal method only overrides a method in that class and its descendants. (Only a virtual method is overridden when it is called through a base class handle.) One way to view this is that there is only one implementation of a virtual method per class hierarchy, and it is always the one in the latest derived class. When subclasses override virtual methods, they must follow the prototype exactly.

example:
virtual class BasePacket;
  virtual function integer send(bit[31:0] data);
  endfunction
endclass
class EtherPacket extends BasePacket;
  function integer send(bit[31:0] data);
    // body of the function
    ...
  endfunction
endclass


Virtual interface
Virtual interfaces provide a mechanism for separating abstract models and test programs from the actual signals that make up the design. A virtual interface allows the same subprogram to operate on different portions of a design and to dynamically control the set of signals associated with the subprogram. Instead of referring to the actual set of signals directly, users are able to manipulate a set of virtual signals. Changes to the underlying design do not require the code using virtual interfaces to be rewritten. By abstracting the connectivity and functionality of a set of blocks, virtual interfaces promote code reuse.
A virtual interface is a variable that represents an interface instance.

Virtual interface variables can be passed as arguments to tasks, functions, or methods. A single virtual interface variable can thus represent different interface instances at different times throughout the simulation. A virtual interface must be initialized before it can be used; it has the value null before it is initialized.


Once a virtual interface has been initialized, all the components of the underlying interface instance are directly available to the virtual interface via the dot notation.
Virtual interfaces can be declared as class properties, which can be initialized procedurally or by an argument to new().

example:
interface SBus; // A Simple bus interface
logic req, grant;
logic [7:0] addr, data;
endinterface

class SBusTransactor; // SBus transactor class
  virtual SBus bus; // virtual interface of type SBus
  function new( virtual SBus s );
    bus = s;  // initialize the virtual interface
  endfunction
endclass

module devA( SBus s ) ... endmodule   // devices that use SBus

module top;
  SBus s[1:4] (); // instantiate 4 interfaces
  devA a1( s[1] ); // instantiate 4 devices
  ...
  initial begin
    SBusTransactor t[1:4];          // create 4 bus-transactors and bind
    t[1] = new( s[1] );
    ...
  end
endmodule

In the preceding example, the transaction class SBusTransactor is a simple reusable component. It is written without any global or hierarchical references and is unaware of the particular device with which it will interact. Nevertheless, the class can interact with any number of devices (four in the example) that adhere to the interface’s protocol.

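(Imagine that SBus s in the transactor class were declared without the virtual qualifier: s would then be an actual interface, and every operation on s would be confined to the transactor class and go no further. With the virtual qualifier, when the transactor is instantiated the virtual interface is connected to an actual external interface, so the operations reach the intended real DUT. This is why virtual interfaces are commonly used in OVM driver design.)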

semaphore, mailbox and event


semaphore

A semaphore is a bucket that stores one or more keys. Any process using a semaphore must procure a key from the bucket before it can continue to execute.
To declare a semaphore:
semaphore smTx;
Semaphore is a built-in class that provides the following methods:
Create a semaphore with a specified number of keys:
function new(int keyCount = 0 );
Obtain one or more keys from the bucket. If the specified number of keys is not available, the process blocks until the keys become available.
task get(int keyCount = 1);
Return one or more keys to the bucket.
task put(int keyCount = 1);
Try to obtain one or more keys without blocking. If the specified number of keys is available, the method returns a positive integer and execution continues; otherwise it returns 0.
function int try_get(int keyCount = 1);

mailbox
A mailbox is a communication mechanism that allows messages to be exchanged between processes. Data can be sent to a mailbox by one process and retrieved by another.
Conceptually, mailboxes behave like real mailboxes.
  When a letter is delivered and put into the mailbox, one can retrieve the letter (and any data stored within). However, if the letter has not been delivered when one checks the mailbox, one must choose whether to wait for the letter or to retrieve the letter on a subsequent trip to the mailbox. Similarly, SystemVerilog's mailboxes provide processes to transfer and retrieve data in a controlled manner.
size: Mailboxes are created as having either a bounded or unbounded queue size.
      A bounded mailbox becomes full when it contains the bounded number of messages. A process that attempts to place a message into a full mailbox shall be suspended until enough room becomes available in the mailbox queue.
      Unbounded mailboxes never suspend a thread in a send operation.

An example of creating a mailbox is as follows:
mailbox mbxRcv;

Mailbox is a built-in class that provides the following methods:
Create a new mailbox
  function new(int bound = 0);
If the bound argument is 0, then the mailbox is unbounded; otherwise it gives the size of the mailbox queue.

The number of messages in a mailbox can be obtained via the num() method.
  function int num();

The put() method places a message in a mailbox.
  task put( singular message);
The message is any singular expression, including object handles.
If the mailbox was created with a bounded queue, the process shall be suspended until there is enough room in the queue.

The try_put() method attempts to place a message in a mailbox.
  function int try_put( singular message);
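The try_put() method stores a message in the mailbox in strict FIFO order without blocking. It is meaningful only for bounded mailboxes: if the mailbox is not full, the message is placed in the mailbox and the method returns a positive integer; if the mailbox is full, the method returns 0.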

The get() method retrieves a message from a mailbox.
task get( ref singular message );
The get() method retrieves one message from the mailbox, that is, removes one message from the mailbox queue. If the mailbox is empty, then the current process blocks until a message is placed in the mailbox.

The try_get() method attempts to retrieve a message from a mailbox without blocking.
  function int try_get( ref singular message );

The peek() method copies a message from a mailbox without removing the message from the queue.
  task peek( ref singular message );
If the mailbox is empty, then the current process blocks until a message is placed in the mailbox.

The try_peek() method attempts to copy a message from a mailbox without blocking.
  function int try_peek( ref singular message );



Event
 Nonblocking event triggers are supported in SystemVerilog using the ->> operator.
 The basic mechanism to wait for an event to be triggered is via the event control operator, @.
@ hierarchical_event_identifier;

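SystemVerilog can distinguish the event trigger itself, which is instantaneous, from the event's triggered state, which persists throughout the time step in which the event was triggered. The triggered property is invoked using a method-like syntax:
hierarchical_event_identifier.triggered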
The triggered event property is most useful when used in the context of a wait construct:
wait ( hierarchical_event_identifier.triggered )