Programmer's Guide

Table of Contents

GETTING STARTED

OBJECTS
    xio_context
    xio_session
    xio_connection
    xio_server

CONTROL PATH
    Connect Flow
    Disconnect Flow
    Basic Connect Flow Summary
    Multi Connection Flow

DATA PATH
    XIO_MSG_TYPE_ONE_WAY
    XIO_MSG_TYPE_REQ and XIO_MSG_TYPE_RSP
    XIO_MSG_TYPE_RDMA
    xio_msg.in and xio_msg.out
    Msg flush
    Sgl Types
        In User Space
        In Kernel
    Serial Number
    Flow Control

Memory management
    Memory Region
    Memory Ownership
    xio_mempool
    Memory Foot Print

Performance Considerations

Threads

Debugging and Logging
    Set Debug Level
    Debug Function Example
    Accelio Debug Method
    XIO_THREAD_SAFE_DEBUG

Keep Alive

Reconnect

Leading Connection

Writing Your First Application over Accelio

Examples
    hello_world Example
        Server Side
        Client Side
        Client Side: Control Path
        Client Side: Data Path
        Server Side: Control Path
        Server Side: Data Path
    hello_world_mt Example
        Server Side
        Client Side
        Client Side: Control Path
        Client Side: Data Path
        Server Side: Control Path
        Server Side: Data Path
    hello world iov Example
        Client Side: Data Path
        Server Side: Data Path

Statistics

FIO and RAIO
    FIO
    R-AIO
    Running FIO/RAIO Example

 


 

GETTING STARTED

 

Prerequisites

1.      Install the latest MLNX_OFED package

2.      Install extra development packages:

On Ubuntu:

apt-get install numactl libnuma-dev libaio-dev libevent-dev

On Redhat:

yum install numactl numactl-devel libaio-devel libevent-devel libaio

 

For User Space

# make distclean

# ./autogen.sh

# ./configure --enable-stat-counters=no --enable-extra-checks=no

# make

# make install

 

For Kernel

# make distclean

# ./autogen.sh

# ./configure --enable-stat-counters=no --enable-extra-checks=no --enable-kernel-module

# make

# make install

 


 

Flags For configure

enable-extra-checks  -    This flag is enabled by default. It enables extra checks of message validity and other checks performed in the fast path. We recommend enabling this flag while developing (in an error flow, a clear error message is produced that helps fix the improper use of the library) and disabling it in production (the extra checks in the fast path can decrease performance).

enable-stat-counters -   This flag is enabled by default. It allows collecting Accelio statistics (possible only when running as root). If the application does not use statistics, the flag should be disabled in order to increase performance.

 

Kernel modules

To load the Accelio kernel modules:

# sudo modprobe xio_core

# sudo modprobe xio_rdma

And/Or

# sudo modprobe xio_tcp

 

To remove the Accelio kernel modules:

# sudo modprobe -r xio_tcp

# sudo modprobe -r xio_rdma

# sudo modprobe -r xio_core

 


 

OBJECTS

 

xio_context

xio_context is Accelio's event dispatcher. A context should be created on each thread that handles messaging. Each time an event, such as a new connection or a new message, arrives from the network, an appropriate callback is invoked. The user must define these callbacks in a struct xio_session_ops and pass it when creating a session/server.

On the client side, connections are created per session and with the corresponding context.

On the server side, each server thread is bound with one context.

Once the context is created using the xio_context_create method, call the blocking method xio_context_run_loop to run the dispatcher. It will run until xio_context_stop_loop is called or until the timeout (if provided) expires. xio_context_stop_loop can be called from one of the callbacks or from a different thread.

The context can be used to add or remove private file descriptors, which are then handled by Accelio's internal dispatcher. File descriptors can be added using xio_context_add_ev_handler and deleted using xio_context_del_ev_handler. To use Accelio with an external dispatcher, call xio_context_get_poll_fd to get a single Accelio file descriptor.

 

There are several parameters related to context that can influence the performance:

1.      cpu_hint: This parameter enables binding the context to a specific CPU. For optimum performance, it is suggested that the CPU be one of the CPUs located on the same NUMA node as the Mellanox HCA. This can be discovered by running accelio/src/tools/usr/xio_if_numa_cpus. The CPU hint is also used to allocate memory from the NUMA node ID that the CPU is bound to.

2.      polling_timeout_us: Defines how long to do receive-side polling before yielding the CPU and entering the wait/sleep mode. When the application favors latency over IOPs and is willing to poll on the CPU, setting the polling timeout to ~15-70 us will decrease the latency substantially, but will increase CPU usage and power consumption (watts).

3.      prealloc_pools: Instructs Accelio to preallocate and register its internal task pool memory up front, so that no allocation or registration of task pools takes place later. Use this when the application can have bursts of multiple connections trying to connect simultaneously. This happens more frequently on the server side.

 

To destroy the context, call the xio_context_destroy method. At this point, all resources associated with this context (sessions, connections, etc.) should already be destroyed.
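A minimal sketch of the context lifecycle (the polling timeout and CPU hint correspond to the parameters above; the values are illustrative):

struct xio_context *ctx;

/* bind the context to CPU 2 and poll for up to 25 us before sleeping */
ctx = xio_context_create(NULL, 25 /* polling_timeout_us */, 2 /* cpu_hint */);

xio_context_run_loop(ctx, XIO_INFINITE);  /* blocks until xio_context_stop_loop */

xio_context_destroy(ctx);  /* only after all sessions/connections are destroyed */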

 


 

xio_session

xio_session is a central concept in Accelio. A session is a semi-permanent interactive information interchange, also known as a dialogue, a conversation or a meeting, between two or more communicating devices. A session is set up or established at a certain point in time, and torn down later. Once the session is established, messages are sent to the remote peer.

Note that requests and responses are bi-directional: client/server initiates a request and server/client responds to the request.

An established session is the basic requirement for connection-oriented communication. A single xio_session may be associated with multiple xio_connections. A session is addressed by a URL of the form: "<scheme>://<address>:<port>/resource"

A session may represent a remote resource that the client requests from the remote server. Upon opening a session, the user provides a structure of callbacks that is triggered upon network events, such as session events, arriving messages, errors etc.

xio_session is created using the xio_session_create method. Only the client side can actively create the session. On the server side, the application receives a newly created session in an on_new_session callback.

 

xio_connection

xio_connection is handled on the requester side and enables a peer to send messages to a remote peer. The application typically opens one connection per session per thread. For example, a session served by 4 worker threads should have one connection per thread.

xio_connection is created using the xio_connect method. Only the client side can actively create the connection. On the server side, the application receives a newly created connection in an on_session_event (event type new_connection) callback.
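A minimal client-side sketch of creating a session and a connection (field names follow the xio_session_params/xio_connection_params structures used by the hello_world example; ses_ops and cb_context are the application's callback table and private data):

struct xio_session_params params;
struct xio_connection_params cparams;
struct xio_session *session;
struct xio_connection *conn;

memset(&params, 0, sizeof(params));
params.type = XIO_SESSION_CLIENT;
params.ses_ops = &ses_ops;
params.user_context = &cb_context;
params.uri = "rdma://192.168.1.2:1234";
session = xio_session_create(&params);

memset(&cparams, 0, sizeof(cparams));
cparams.session = session;
cparams.ctx = ctx;
cparams.conn_user_context = &cb_context;
conn = xio_connect(&cparams);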

 

xio_server

The server object represents the passive side. The server is created by calling xio_bind.
The following are the types of xio servers:

o   The acceptor: Accepts incoming connections and handles them. It may forward an incoming connection to a back-end worker thread or redirect it to another server, and it may also reject or accept the connection.

o   The worker thread: Works in the background and parallelizes the workload across all cores, scaling out the application.

A server can be a mixture of the acceptor and the worker, meaning that the same server accepts the connection and processes the message requests.

The server can receive several types of URI:

o   "rdma://192.168.1.2:1234" or "tcp://192.168.1.2:1234"

o   "rdma://192.168.1.2:0" or "tcp://192.168.1.2:0"

o   "rdma://*:1234" or "tcp://*:1234"

Upon receiving an on_new_session callback, there are several possible reactions:                

1.      Rejecting the session

2.      Accepting the session.

In this case, the server that received the callback will also send responses to the client. Do not pass a portal to the xio_accept method; call it with the following parameters: xio_accept(xio_session, NULL, 0, NULL, 0)

3.      Redirecting to another server on the same host (“light redirect”)

In this case, there is a server which listens on a known port and receives session requests, then redirects them to worker servers on other threads. When the user receives an on_new_session callback, they pass the URL of the worker they want to redirect the session to (xio_accept(xio_session, &url_worker, 1, NULL, 0);). The server listening on url_worker should already be bound. It is recommended that the worker be on a different thread (and context) than the listener.

In the case of a worker server, you can pass port 0 in the URI in xio_bind, for example: "rdma://1.1.1.1:0". In this case, the server will bind to a free port. The actual port that the server listens on is returned by reference in "src_port". You will need to save this port and pass the updated URI in xio_accept. It is also possible to pass an explicit port number in xio_bind.

To destroy the server, call the xio_unbind method.
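A minimal server-side sketch (a single server that both accepts the session and serves it, as in the hello_world example):

struct xio_server *server;

server = xio_bind(ctx, &ses_ops, "rdma://*:1234",
                  NULL /* src_port */, 0 /* flags */, &cb_context);

/* inside the on_new_session callback: accept on this same server */
xio_accept(session, NULL, 0, NULL, 0);

/* ... run the event loop ... */

xio_unbind(server);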


 

CONTROL PATH

The user is notified about changes in xio_connection/xio_session state by receiving on_session_event callbacks.

The user receives an enum xio_session_event describing the particular event and an enum xio_status describing the reason for the event.

 

Connect Flow

1.      xio_session is created using the xio_session_create method. Only the client side can actively create the session. On the server side, the application receives a newly created session in an on_new_session callback. The "session user context" will be passed as cb_user_context each time the on_session_event callback is called for this session.

The user context can be modified using the xio_modify_session method and queried using the xio_query_session method.

 

2.      xio_connection is created using the xio_connect method. Only the client side can actively create the connection. On the server side, the application receives a newly created connection in an on_session_event (event type new_connection) callback. At this point, the application can call the xio_send methods. The messages will be kept in an internal Accelio queue and will be sent only once the connection is established. The event loop must be run by the user in order for the session to be established.

"conn_user_context" will be passed as conn_user_context to every connection callback. Connection callbacks are message-related callbacks, such as on_msg, on_msg_send_complete, on_msg_delivered, on_msg_error, assign_data_in_buf, etc.

TOS, user context, and several additional parameters can be modified using the xio_modify_connection method and queried using the xio_query_connection method.

 

3.      The server side receives an on_new_session callback, immediately followed by an on_session_event callback with xio_session_event event of type XIO_SESSION_NEW_CONNECTION_EVENT. This connection will be known as the “leading connection”.

o   Server rejected the session:

i.   On the server side: the connection will be closed. XIO_SESSION_CONNECTION_CLOSED_EVENT and XIO_SESSION_CONNECTION_TEARDOWN_EVENT events will be received. After they are received, the user must call xio_connection_destroy; then the XIO_SESSION_TEARDOWN_EVENT event will be received, and the user should call xio_session_destroy.

ii.  On the client side: the XIO_SESSION_CONNECTION_ESTABLISHED_EVENT event will be received, followed by XIO_SESSION_CONNECTION_CLOSED_EVENT and XIO_SESSION_CONNECTION_TEARDOWN_EVENT with the reason "Session rejected". Once they are received, the user must call xio_connection_destroy; then the XIO_SESSION_TEARDOWN_EVENT event will be received, and the user should call xio_session_destroy.

o   Server accepted the session: on_session_established event will be received on the client side followed by XIO_SESSION_CONNECTION_ESTABLISHED_EVENT.

 

Illustration 1: xio connection accept

 

o   Server forwarded the session to another thread(s):

i.   On the forwarder thread: the leading connection will be closed. XIO_SESSION_CONNECTION_CLOSED_EVENT and XIO_SESSION_CONNECTION_TEARDOWN_EVENT events will be received. Once they are received, the user must call xio_connection_destroy.

ii.  On the forwardee thread: XIO_SESSION_NEW_CONNECTION_EVENT will be received.

Note that there is a race condition between those threads, and Accelio does not guarantee which happens first.

iii. On the client side: the on_session_established callback will be received, followed by the XIO_SESSION_CONNECTION_ESTABLISHED_EVENT event.

 

 

Disconnect Flow

Both the client side and the server side can initiate a disconnection of xio_connection/xio_session by calling xio_disconnect. This results in the following events (same on both sides):

1.      The on_session_event callback is called with an xio_session_event event of type XIO_SESSION_CONNECTION_CLOSED_EVENT. For the initiating side, the reason for the event will be "session closed", and for the passive side, the reason will be "session disconnected".

 

2.      The on_session_event callback is called with an xio_session_event event of type XIO_SESSION_CONNECTION_TEARDOWN_EVENT. For the initiating side, the reason will be "session closed", and for the passive side, "session disconnected".

 

3.      The user must call xio_connection_destroy method (from the receiving thread).

 

4.      Once connection teardown has been received for all connections on this session, the on_session_event callback is called with an xio_session_event event of type XIO_SESSION_TEARDOWN_EVENT.

 

5.      The user must call xio_session_destroy (from the receiving thread)

Illustration 2: xio connection close

o   Both sides must run an event loop in order to receive all events. If one of the sides fails to run the event loop, a 5-minute timeout will be triggered.

o   In case of multiple connections per session: the session teardown event can be received on any one of the xio_connection threads. Therefore, connection threads may exit only after the session teardown event is received.

In case the connection was closed unexpectedly (for example, if the other side has crashed), the XIO_SESSION_CONNECTION_DISCONNECTED_EVENT event will be received instead of the XIO_SESSION_CONNECTION_CLOSED_EVENT event. The rest of the flow remains the same.
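A sketch of an on_session_event handler implementing this flow (modeled on the hello_world example; session_data is the application's private structure holding the context):

static int on_session_event(struct xio_session *session,
                            struct xio_session_event_data *event_data,
                            void *cb_user_context)
{
        struct session_data *session_data =
                        (struct session_data *)cb_user_context;

        switch (event_data->event) {
        case XIO_SESSION_CONNECTION_TEARDOWN_EVENT:
                xio_connection_destroy(event_data->conn);
                break;
        case XIO_SESSION_TEARDOWN_EVENT:
                xio_session_destroy(session);
                xio_context_stop_loop(session_data->ctx);  /* exit the loop */
                break;
        default:
                break;
        }

        return 0;
}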

 

Basic Connect Flow Summary

1.      Server side: call xio_bind

2.      Client side: call xio_session_create

3.      Client side: call xio_connect

4.      Server side: on_new_session_callback, followed by XIO_SESSION_NEW_CONNECTION_EVENT.

5.      Server side: call xio_accept(xio_session, NULL, 0, NULL, 0).

6.      Client side: on_session_established, followed by XIO_SESSION_CONNECTION_ESTABLISHED_EVENT

7.      Client/Server side: send/receive msgs

8.      Client side: xio_disconnect

9.      Client/Server side: XIO_SESSION_CONNECTION_CLOSED_EVENT followed by XIO_SESSION_CONNECTION_TEARDOWN_EVENT.

10.  Client/Server side: call xio_connection_destroy

11.  Client/Server side: XIO_SESSION_TEARDOWN_EVENT

12.  Client/Server side: call xio_session_destroy

13.  Server side: xio_unbind

 

Multi Connection Flow

This flow takes advantage of multiple threads. For this flow, the client should open several connections for the same session, each connection created on a different thread (with its own context). On the server side, there should be several worker portals, each on another thread. By passing a list of workers to the xio_accept method, each worker will work with another connection on the client side.

In case the client has the same number of connections/threads as the number of worker portals passed in xio_accept, each thread on the client side will be sending to one thread on the server side.
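A sketch of redirecting to worker portals from the listener's on_new_session callback (the portal URLs are illustrative; each worker must already be bound on its own thread and context):

/* inside on_new_session on the listener thread */
const char *portals[] = {
        "rdma://192.168.1.2:1235",
        "rdma://192.168.1.2:1236",
};

xio_accept(session, portals, 2, NULL, 0);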

Illustration 3: multi connection

 


 

DATA PATH

There are several possible modes for the message to be sent:      

 

From requester to responder:

1.      Send – the data is sent on Accelio’s buffers

2.      RDMA READ (send the keys)

 

From responder to requester:

1.      Send - the data is sent on Accelio’s buffers

2.      RDMA WRITE

3.      RDMA READ (send the keys to the requester and the requester will perform the operation)

 

Accelio allows sending several message types: XIO_MSG_TYPE_REQ, XIO_MSG_TYPE_RSP, XIO_MSG_TYPE_ONE_WAY, and XIO_MSG_TYPE_RDMA.

In Accelio terminology, client and server refer only to connection establishment. Once the connection is established, the client can serve as the requester, the responder, or both, and the same applies to the server.

 

XIO_MSG_TYPE_ONE_WAY

The requester sends a message and does not expect a response from the other side. The sender can receive one of two callbacks that indicate that the message has been delivered to the receiver:

1.      on_ow_msg_send_complete (default option): indicates that the message has reached the Accelio level of the receiver.

2.      on_msg_delivered (can be requested with the XIO_MSG_FLAG_REQUEST_READ_RECEIPT flag): indicates that the message has reached the application level of the receiver.

Once the receiver has finished processing the message, xio_release_msg must be called. If the message goes through asynchronous processing, the method must be called from the thread on which the message was received.

For more details, see accelio/tests/usr/hello_test_ow/ or accelio/tests/kernel/hello_test_ow/.
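A minimal sketch of the one-way flow: the sender calls xio_send_msg, and the receiver releases the message from its on_msg callback (the on_msg signature shown follows the current user-space headers):

/* sender side */
xio_send_msg(conn, &msg);

/* receiver side */
static int on_msg(struct xio_session *session, struct xio_msg *msg,
                  int last_in_rxq, void *cb_user_context)
{
        /* ... process the message ... */
        xio_release_msg(msg);  /* must be called from this thread */
        return 0;
}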

 

Illustration 4: one way msg

 

XIO_MSG_TYPE_REQ and XIO_MSG_TYPE_RSP:

The requester sends a request. The send has to be performed from the thread that runs the event loop, which will later receive the response. By enabling the XIO_MSG_FLAG_REQUEST_READ_RECEIPT flag, the requester can receive an indication that the responder's application has received the request. Release of resources related to this transaction (such as buffers) may happen only after the transaction is complete (meaning that the response has been received from the responder). The call to xio_release_response has to be performed from the sending thread.

After receiving the request, the responder can send the response directly from the on_msg callback or pass the request to asynchronous processing. The sending of the response must be performed from the receiving thread. The responder can release resources associated with this transaction only after receiving the on_msg_send_complete callback.

For more details see accelio/examples/usr/hello_world/ or accelio/examples/kernel/hello_world/
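A minimal request/response sketch (note that the response must point back at the received request):

/* requester side */
xio_send_request(conn, &req);
/* the response later arrives in the on_msg callback;
   call xio_release_response(rsp) when done with it */

/* responder side, inside on_msg (msg holds the request) */
rsp->request = msg;   /* MUST point to the received request */
xio_send_response(rsp);
/* release transaction resources only after on_msg_send_complete */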

 

MSG DISCARD

In case the connection was closed/disconnected before the responder's application sent all responses, the messages must be discarded. To do so, the application sends the responses as usual (using the xio_send_response method, with rsp->request pointing to the request; see section xio_msg.in and xio_msg.out for more details). Message discard must happen before calling xio_connection_destroy (after the XIO_SESSION_CONNECTION_TEARDOWN_EVENT event is received). For each discarded message, the on_msg_error callback will be received with a "discarded" status error.

 

Illustration 5: req rsp

 

XIO_MSG_TYPE_RDMA

This type of message can be useful for databases or for writing logs. In this case, the server has multiple clients and wishes to divide its memory between the clients so that each client writes to/reads from a designated memory region.

Prior to sending a message of type XIO_MSG_TYPE_RDMA, there should be a step of subscribing and publishing rkeys. Rkeys depend on the xio_connection on which they are sent. Extracting the raw rkey is done using the following functions:

1.      xio_lookup_rkey_by_request

2.      xio_lookup_rkey_by_response

Both methods should be used on the server side. In case you need to look up the key on the client side, use the xio_query_connection method. Note that on_msg_send_complete is usually called in batches of 16 in order to improve performance. If you want to receive the callback for every response, turn on the XIO_MSG_FLAG_IMM_SEND_COMP flag for the response.

In case you need to match an IP address to an xio_connection, you may use the xio_query_connection method and discover the peer_addr and local_addr.

Once the keys are published, it is possible to perform either RDMA read or RDMA write. Once the RDMA operation is done, on_rdma_direct_complete callback will be received.

The subscription/publication step needs to be performed once per xio_connection lifetime. Once the rkeys are published, all messages that are sent on this xio_connection can use the same rkeys.

 

For more details see accelio/tests/portable/direct_rdma_test/xio_rdma_common.c

 

Illustration 6.1: RDMA write using xio_msg_type_rdma

Illustration 6.2: RDMA read using xio_msg_type_rdma

 

 

xio_msg.in and xio_msg.out

The requester should fill msg.out when sending a request. On the responder side, the request will appear in msg.in. The responder should fill msg.out with the response and MUST set rsp.request to point to the received request. The requester will receive the response in the msg.in side of the same xio_msg that it sent to the responder.

 

 

Illustration 7: msg in out

 

Msg flush

In case the connection was closed/disconnected before a response was received, the requester side will receive on_msg_error for each msg with the reason “flushed”.

 

Sgl Types

In User Space

XIO_SGL_TYPE_IOV - a fixed vector, pre-allocated by Accelio. The maximum nents size is XIO_IOVLEN (4). All elements in the vector are laid out contiguously in memory.

XIO_SGL_TYPE_IOV_PTR - a pointer-based vector, for cases where the vector is bigger than 4. For the in/out iov, sglist memory of size max_nents * sizeof(struct xio_iovec_ex) should be allocated by the user. All elements in the pointer list should be contiguous in memory.

When the sglist is initialized, max_nents is set. This number represents the maximum number of elements this sglist will ever hold. The actual number of elements, nents (for example, how many elements of the in msg the application should actually read), should be set each time the application writes/reads the msg. nents <= max_nents.

See accelio/examples/usr/hello_world_iov.
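A sketch of filling an out sgl in both modes (field names follow struct xio_vmsg in the user-space headers; MAX_NENTS and the buffers are the application's):

/* XIO_SGL_TYPE_IOV: fixed, pre-allocated vector */
msg.out.sgl_type = XIO_SGL_TYPE_IOV;
msg.out.data_iov.max_nents = XIO_IOVLEN;
msg.out.data_iov.sglist[0].iov_base = buf;
msg.out.data_iov.sglist[0].iov_len = buf_len;
msg.out.data_iov.nents = 1;

/* XIO_SGL_TYPE_IOV_PTR: user-allocated vector */
msg.out.sgl_type = XIO_SGL_TYPE_IOV_PTR;
msg.out.pdata_iov.max_nents = MAX_NENTS;
msg.out.pdata_iov.sglist = calloc(MAX_NENTS, sizeof(struct xio_iovec_ex));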

In Kernel

XIO_SGL_TYPE_SCATTERLIST is the only supported type. This is a scatterlist of pointers (same as linux/scatterlist.h).

In case there is more than one element and the actual data length is smaller than the element length, the data will be packed together. For example, if the user fills:

Element 1: length = 1000

Element 2: length = 500

the receiving side will receive:

Element 1: length = 1500

 

Serial Number

Each message that Accelio sends is assigned a serial number. This number is unique per session. xio_msg.sn is assigned immediately after the xio_send_msg or xio_send_request method returns. Once the response is received, the sn can be retrieved via rsp.request.sn.

 

Flow Control

Application-driven flow control is based on the rate at which the application releases each message. It is especially suitable for one-way messages. It also works on the initiator side: in case the initiator does not release messages, responses will no longer be received. When the client sends multiple messages to the server and the server takes a while to process them, the server side can run out of memory. This case is more common with one-way messages. The user can configure flow control so that messages stay queued on the client side. Both sides need to configure the following queue parameters to the same values (a configuration sketch follows the list):

1.      XIO_OPTNAME_SND_QUEUE_DEPTH_MSGS - Maximum tx queued msgs. Default value is 1024.

2.      XIO_OPTNAME_RCV_QUEUE_DEPTH_MSGS - Maximum rx queued msgs. Default value is 1024.

3.      XIO_OPTNAME_SND_QUEUE_DEPTH_BYTES - Maximum tx queued bytes. Default value is 64M.

4.      XIO_OPTNAME_RCV_QUEUE_DEPTH_BYTES - Maximum rx queued bytes. Default value is 64M.
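For example, a sketch of shrinking the send queue depth (the value is illustrative and must match on both peers):

int snd_depth = 512;  /* example value */

xio_set_opt(NULL, XIO_OPTLEVEL_ACCELIO,
            XIO_OPTNAME_SND_QUEUE_DEPTH_MSGS,
            &snd_depth, sizeof(snd_depth));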

 


 

Memory management

 

Memory Region

A memory region (MR) must be registered in order for the RDMA device to read and write data from/to it. Performing this registration takes some time; therefore, memory registration is not recommended in the data path when a fast response is required. The MR's starting address is addr and its size is length. The maximum size of a block that can be registered is limited to device_attr.max_mr_size. Every memory address in the virtual space of the calling process can be registered.

 

Memory Ownership

The user can choose between several memory ownership modes (see the tables below).

Please note that zero copy is possible only when working with RDMA (and not with TCP).

Sender side, msg->in.data (response reception):

o   Fully provided by the application (addr, len, mr). Memory ownership: Application/Accelio.

-   data > 8K: zero copy to the application buffer.

-   data < 8K: zero copy into an accelio buffer, then copy of the data from the accelio buffer to the application buffer.

o   Partially provided by the application (length only). Memory ownership: Accelio; allocated from the internal pool according to length.

-   data > 8K: zero copy into an accelio buffer.

-   data < 8K: copy into an accelio buffer.

o   Not provided by the application (NULL, 0, NULL). Memory ownership: Application/Accelio.

-   data > 8K: the receiver sends an iovec to the sender via "send", and the sender issues an "rdma read" operation. If assign_data_in_buf exists, it is called to assign a buffer for the response; if assign_data_in_buf == NULL, buffers are taken from the Accelio internal pool.

-   data < 8K: zero copy into an accelio buffer.

 


 

Sender side, msg->out.data (request delivery):

o   Fully provided by the application (addr, len, mr). Memory ownership: Application.

-   data > 8K: an iovec is sent to the receiver via the "send" operation; the receiver issues an "rdma read" to read the data.

-   data < 8K: zero copy.

o   Partially provided by the application (addr, len). Memory ownership: Accelio; allocated from the internal pool according to length, with a copy.

-   data > 8K: same as above.

-   data < 8K: the data is copied into an accelio internal buffer.

 

Receiver side, msg->in.data (request reception):

o   assign_data_in_buf is implemented - the buffer is filled in the callback by the application. Memory ownership: Application.

-   data > 8K: the receiver calls assign_data_in_buf and issues an "rdma read" to read the data.

-   data < 8K: memory is taken from Accelio; the data is presented in msg->in.data.

o   assign_data_in_buf is not implemented - memory is taken from the Accelio internal pool. Memory ownership: Accelio; allocated from the internal pool (zero copy into the buffer).

-   data > 8K: no call to assign_data_in_buf; buffers are taken from the internal pool and then an "rdma read" is issued.

-   data < 8K: same as above; the data is presented in msg->in.data.

 

Receiver side, msg->out.data (response delivery):

o   Fully provided by the application (addr, len, mr). Memory ownership: Application.

-   data > 8K, if the sender did not fill the "in" part: a "send" op with an iovec goes to the requester, and the requester issues an "rdma read".

-   data > 8K, if the sender filled the "in" part: the response is delivered with an "rdma write".

-   data < 8K: the response is delivered with a "send".

o   Partially provided by the application (addr, len). Memory ownership: Accelio; allocated from the internal pool according to length, with a copy. Delivery is the same as above for each case.

 

Please note that in kernel there are no internal buffers; therefore, the user has to fully provide the memory or not provide it at all. There is no option to partially provide the memory.

xio_mempool

xio_mempool exists in user space only. The user can create several memory pools, for example, of 16K, 64K, and 1M buffer sizes. Initially, a memory pool contains a small number of buffers, but it grows when more buffers are needed, up to a limit defined by the user. In case the XIO_MEMPOOL_FLAG_USE_SMALLEST_SLAB flag is enabled and the 16K slab is empty, Accelio will take a buffer from the smallest slab that can still serve the request (64K in this example). This flag is enabled by default.

Example of creating a pool and adding a slab with xio_mempool_add_slab (the xio_mempool_create arguments shown, NUMA node -1 and no flags, are illustrative):

struct xio_mempool *mempool = xio_mempool_create(-1 /* any NUMA node */, 0 /* flags */);

xio_mempool_add_slab(mempool, 1024, 10, 50, 20, 0);

This means that a slab containing 10 initial buffers of size 1024 will be created. Each time the buffers run out, the slab will grow by 20 (alloc_quantum_nr) until there are 50 buffers (the maximum number of buffers to allocate).

The buffers can be extracted from the mempool using the xio_mempool_alloc method and returned there using xio_mempool_free.

xio_mempool_destroy(mempool);
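A sketch of allocating a buffer from the pool and returning it (assuming the user-space xio_reg_mem structure with addr/mr fields):

struct xio_reg_mem reg_mem;

if (xio_mempool_alloc(mempool, 1024, &reg_mem) == 0) {
        /* use reg_mem.addr (and reg_mem.mr when sending over RDMA) */
        xio_mempool_free(&reg_mem);
}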

 

Memory Foot Print

In order to reduce Accelio's memory footprint, the following Accelio parameters can be configured (a configuration sketch follows the list):

1.       XIO_OPTNAME_MAX_IN_IOVLEN - Sets the maximum in iovec length for an xio_msg. In case the application's in iovec is smaller than the default, it is best to configure this option in order to save memory.

2.      XIO_OPTNAME_MAX_OUT_IOVLEN - Sets the maximum out iovec length for an xio_msg. It is best to configure it to the out iovec size that the application uses in order to save memory.

3.       XIO_OPTNAME_MAX_INLINE_XIO_HEADER - Sets/gets the maximum inline xio header. If the application sends a small header, this option can be configured in order to save memory. Default value is 256.

4.      XIO_OPTNAME_MAX_INLINE_XIO_DATA - Sets/gets the maximum inline xio data. If the application sends small data, this option can be configured in order to save memory. Default value is 8K.

5.       XIO_OPTNAME_ENABLE_MEM_POOL - This option is enabled by default. Accelio provides its own memory pool. In case the user knows that the memory is always registered when sending large data (via RDMA read/write), this pool can be disabled in order to save memory. This requires the user to implement "assign_data_in_buf" and take full ownership of memory registration. In case the user sends a message without filling "mr", an error is expected to occur.
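For example, a sketch of reducing the maximum inline data (the value is illustrative):

int inline_data = 2048;  /* example: 2K instead of the default 8K */

xio_set_opt(NULL, XIO_OPTLEVEL_ACCELIO,
            XIO_OPTNAME_MAX_INLINE_XIO_DATA,
            &inline_data, sizeof(inline_data));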

 

 


 

Performance Considerations

 

1.      Please tune the performance according to: http://www.mellanox.com/related-docs/prod_software/Performance_Tuning_Guide_for_Mellanox_Network_Adapters.pdf

 

2.      Accelio is event driven. Each time an event is received, an interrupt is armed. Interrupts are expensive and may reduce performance substantially. In order to scale out, there must be a balance between polling and interrupts. Accelio enables polling by setting the context's polling timeout (see polling_timeout_us above), which controls receive-side polling of the hardware completion queues.

 

3.      xio_context_poll_wait & xio_context_poll_completions. The first goes to the completion queue and enables the application to poll directly, instead of receiving interrupts and then polling. The second goes through epoll and checks whether there are any events waiting; in case there are no events, the function returns immediately.

They can replace run_event_loop (better performance, but more CPU cycles).

 

4.      Accelio provides several memory allocation modes: huge pages are the default. In case huge pages are disabled, there is a fallback to contiguous pages. The fallback for contiguous pages is regular pages.

 


 

Threads

Please note that the application should be thread-safe; Accelio is lockless. The suggested mode is to have one thread per xio_context, where the same thread is responsible for sending messages and for running the event loop. There are two possible ways of combining sending and receiving messages in the single thread that runs the event loop:

1.       Sending the msg from an Accelio callback (such as on_msg, on_session_established, etc.).

2.       Stopping the loop, sending msgs, and running the loop again:

Sending thread:

while (true) {
        xio_context_run_loop(ctx, XIO_INFINITE);
        /* reached once the processing thread calls xio_context_stop_loop */
        while (!queue_msgs_to_send.empty()) {
                msg = queue_msgs_to_send.pop();
                xio_send_msg(conn, msg);
        }
}

Processing thread:

/* do processing */
queue_msgs_to_send.put(msg);
xio_context_stop_loop(ctx);

 

The sending thread will receive on_msg callbacks and pass the messages to the processing thread.

 

In addition, the response to Accelio events (such as calling xio_connection_destroy, sending a response, releasing a response, etc.) should always be invoked from the same thread (xio_context) on which the Accelio event was received.

 


 

Debugging and Logging

 

Set Debug Level

Accelio has several log levels:  XIO_LOG_LEVEL_FATAL, XIO_LOG_LEVEL_ERROR, XIO_LOG_LEVEL_WARN, XIO_LOG_LEVEL_INFO, XIO_LOG_LEVEL_DEBUG, XIO_LOG_LEVEL_TRACE.

Default log level is XIO_LOG_LEVEL_ERROR.

In order to set an Accelio log level, you can do one of the following:

o   Run time: set the environment variable XIO_TRACE. For example, to set the log to debug level: export XIO_TRACE=4.

o   From the application code: using xio_set_opt:

int level = 4;

xio_set_opt(NULL, XIO_OPTLEVEL_ACCELIO, XIO_OPTNAME_LOG_LEVEL, &level, sizeof(level));

 

Debug Function Example

In order to incorporate Accelio's logs into the application's logs, do the following:

int optlen = sizeof(xio_log_fn);

const void *optval = (const void *)logs_from_xio_callback; /* the application's callback */

xio_set_opt(NULL, XIO_OPTLEVEL_ACCELIO, XIO_OPTNAME_LOG_FN, optval, optlen);
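A sketch of a matching callback; the parameter list follows the xio_log_fn typedef in the user-space headers (verify against your libxio.h):

static void logs_from_xio_callback(const char *file, unsigned line,
                                   const char *function, unsigned level,
                                   const char *fmt, ...)
{
        /* format the message and forward it, together with
           file/line/level, to the application's logger */
}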

 

Accelio Debug Method

One of the most common user mistakes while developing over Accelio is filling the xio_msg incorrectly. In this case, one can call the xio_msg_dump method inside Accelio. This method goes over all message fields and prints them, which can often help locate the error.

 

XIO_THREAD_SAFE_DEBUG

Accelio requires the application to be thread-safe, but it does not enforce it. Compiling Accelio with the XIO_THREAD_SAFE_DEBUG flag (can be configured in ./configure) allows testing whether the user application is indeed thread-safe. The flag is disabled by default and hurts performance significantly; therefore, it should be used only in the development stage. This feature is experimental and might not catch all bad flows, or alternatively may report false positives. The information retrieved from using the flag can be used only as a hint to a possible bad flow.

Output example:

[2015/11/08-14:43:23.24813] xio_context.c:819            [ERROR] - trying to lock an already locked lock for ctx 0x21fd440

[2015/11/08-14:43:23.24822] xio_context.c:804            [ERROR] -      stack trace is

[2015/11/08-14:43:23.24924] xio_context.c:807            [ERROR] - /.autodirect/mtrswgwork/katyak/tmp/another_accelio_regression/accelio-regression/accelio/src/usr/.libs/libxio.so.0(+0x9a81) [0x7f8e846cfa81]

[2015/11/08-14:43:23.24934] xio_context.c:807            [ERROR] - /.autodirect/mtrswgwork/katyak/tmp/another_accelio_regression/accelio-regression/accelio/src/usr/.libs/libxio.so.0(+0x2dec8) [0x7f8e846f3ec8]

[2015/11/08-14:43:23.24941] xio_context.c:807            [ERROR] - /.autodirect/mtrswgwork/katyak/tmp/another_accelio_regression/accelio-regression/accelio/src/usr/.libs/libxio.so.0(+0x40725) [0x7f8e84706725]

 [2015/11/08-14:43:23.25033] xio_context.c:807            [ERROR] - /.autodirect/mtrswgwork/katyak/tmp/another_accelio_regression/accelio-regression/accelio/src/usr/.libs/libxio.so.0(xio_context_run_loop+0x1c) [0x7f8e846ce95c]

[2015/11/08-14:43:23.25040] xio_context.c:807            [ERROR] - /.autodirect/mtrswgwork/katyak/tmp/another_accelio_regression/accelio-regression/tests/random/src/usr/.libs/lt-xio_random_test() [0x40a131]

[2015/11/08-14:43:23.25047] xio_context.c:807            [ERROR] - /.autodirect/mtrswgwork/katyak/tmp/another_accelio_regression/accelio-regression/tests/random/src/usr/.libs/lt-xio_random_test() [0x40a2d9]

[2015/11/08-14:43:23.25053] xio_context.c:807            [ERROR] - /lib64/libpthread.so.0(+0x7df3) [0x7f8e83c73df3]

[2015/11/08-14:43:23.25060] xio_context.c:807            [ERROR] - /lib64/libc.so.6(clone+0x6d) [0x7f8e837953dd]

[2015/11/08-14:43:23.25085] xio_context.c:804            [ERROR] -      stack trace is

[2015/11/08-14:43:23.25157] xio_context.c:807            [ERROR] - /.autodirect/mtrswgwork/katyak/tmp/another_accelio_regression/accelio-regression/accelio/src/usr/.libs/libxio.so.0(+0x9a49) [0x7f8e846cfa49]

[2015/11/08-14:43:23.25166] xio_context.c:807            [ERROR] - /.autodirect/mtrswgwork/katyak/tmp/another_accelio_regression/accelio-regression/accelio/src/usr/.libs/libxio.so.0(xio_send_request+0x4f) [0x7f8e8470371f]

[2015/11/08-14:43:23.25173] xio_context.c:807            [ERROR] - /.autodirect/mtrswgwork/katyak/tmp/another_accelio_regression/accelio-regression/tests/random/src/usr/.libs/lt-xio_random_test() [0x408de6]

[2015/11/08-14:43:23.25179] xio_context.c:807            [ERROR] - /.autodirect/mtrswgwork/katyak/tmp/another_accelio_regression/accelio-regression/tests/random/src/usr/.libs/lt-xio_random_test() [0x40a1cf]

[2015/11/08-14:43:23.25186] xio_context.c:807            [ERROR] - /.autodirect/mtrswgwork/katyak/tmp/another_accelio_regression/accelio-regression/tests/random/src/usr/.libs/lt-xio_random_test() [0x40a2d9]

[2015/11/08-14:43:23.25193] xio_context.c:807            [ERROR] - /lib64/libpthread.so.0(+0x7df3) [0x7f8e83c73df3]

[2015/11/08-14:43:23.25199] xio_context.c:807            [ERROR] - /lib64/libc.so.6(clone+0x6d) [0x7f8e837953dd]

 

Analyzing:

gdb /.autodirect/mtrswgwork/katyak/tmp/another_accelio_regression/accelio-regression/accelio/src/usr/.libs/libxio.so.0

(gdb) list *0x2dec8

0x2dec8 is in xio_session_notify_connection_established (../common/xio_session.c:354).

349                                     session->cb_user_context);

350     #ifdef XIO_THREAD_SAFE_DEBUG

351             xio_ctx_debug_thread_lock(connection->ctx);

352     #endif

353             }

354     }

 


 

Keep Alive

Keep-alive can be configured using Accelio options (XIO_OPTNAME_ENABLE_KEEPALIVE). It is enabled by default. The keep-alive variables can be configured using the XIO_OPTNAME_CONFIG_KEEPALIVE option.

The application is not notified when keep-alive messages are received by Accelio. In case one of the peers does not answer the keep-alive within struct xio_options_keepalive.time, the application will receive the session event of type XIO_SESSION_CONNECTION_CLOSED_EVENT with the reason "Timeout".

 

You can enable/disable keep-alive by calling:

int opt = 0;  /* 0 = disable, 1 = enable */

xio_set_opt(NULL, XIO_OPTLEVEL_ACCELIO, XIO_OPTNAME_ENABLE_KEEPALIVE, &opt, sizeof(int));

 

To disable keep-alive, set it on both peers.

 

To configure keep alive:

struct xio_options_keepalive ka = { .probes = 5, .time = 60, .intvl = 5 };  /* example values */

xio_set_opt(NULL, XIO_OPTLEVEL_ACCELIO, XIO_OPTNAME_CONFIG_KEEPALIVE, &ka, sizeof(ka));

struct xio_options_keepalive {

/** the number of unacknowledged probes to send before considering  the connection dead and notifying the application layer. */

    int probes;

 

    /** the heartbeat interval in seconds between two keepalive probes. */

    int time;

 

/** the interval in seconds between subsequent keepalive probes, regardless of what the connection has exchanged in the meantime. */

    int intvl;

};

If a connection's keep-alive probes reach the limit (the ka.probes option), it is considered a connection keep-alive timeout, and the application will receive a CONNECTION_ERROR event with XIO_E_TIMEOUT as the reason, after which the connection will be disconnected.

Please note that if the reconnect feature is enabled and the connection goes down, xio_disconnect will not be called even if a keep-alive timeout occurs, since the connection disruption may affect the keep-alive probes.

Reconnect

The reconnect feature ensures reliability by re-establishing a failed connection which became disabled due to a link failure and resuming all transactions.

Reconnect can be configured using Accelio options (XIO_OPTNAME_ENABLE_RECONNECT). It is disabled by default.

 

You can enable reconnect by calling:

int reconnect = 1;

xio_set_opt(NULL, XIO_OPTLEVEL_ACCELIO, XIO_OPTNAME_ENABLE_RECONNECT, &reconnect, sizeof(reconnect));

To enable reconnect, set it on both peers.

Please note that enabling reconnect will disable keep-alive, as their actions contradict one another in case of a connection failure.

 

When a connection is lost and reconnect is enabled, instead of disconnecting due to the link failure, Accelio will try to re-establish the connection. A WC (work completion) error will be printed, with the message currently being sent dumped, in addition to an error print with the CQ error. An XIO_SESSION_CONNECTION_RECONNECTING_EVENT will be raised to the user, signaling that the reconnect process is commencing. If the reconnection is successful and the connection is back online, an XIO_SESSION_CONNECTION_RECONNECTED_EVENT will be raised to the user, signaling that the reconnect process has ended; afterwards, messages will continue to be transmitted and received. In case reconnect fails or a reconnect timeout occurs, all remaining messages will be flushed and the user will get an XIO_SESSION_CONNECTION_DISCONNECTED_EVENT.

 

Please note that reconnection only applies to connections that are online; it does not handle failures during connection establishment.

 

Please note that in order to create a setup that supports reconnection, you will probably have to configure a bonded interface in active-backup mode. For example, configuring a Linux bond (LAG) interface over InfiniBand is shown here: https://community.mellanox.com/docs/DOC-2160

Leading Connection

Immediately after the on_new_session callback is received, the XIO_SESSION_NEW_CONNECTION_EVENT event will be received on the server side. In case the application chooses to forward this session to other portal(s), this connection will be closed. This connection is known as the leading connection, and the XIO_SESSION_CONNECTION_CLOSED_EVENT and XIO_SESSION_CONNECTION_TEARDOWN_EVENT events will be received on the forwarding portal. The XIO_SESSION_NEW_CONNECTION_EVENT event will be received on the portal that the session is forwarded to.

In case the user wants to differentiate between a leading connection and a regular connection, they can use the xio_connection_ioctl method:

int leading, optlen;

if (xio_connection_ioctl(connection, XIO_CONNECTION_LEADING_CONN,
                         &leading, &optlen) || (optlen > 4)) {
        printf("unable to get XIO_CONNECTION_LEADING_CONN\n");
} else {
        if (leading)
                printf("this connection is a leading connection\n");
        else
                printf("this connection is not a leading connection\n");
}


 

Writing Your First Application over Accelio

 

When writing an application over Accelio in user space, the "main" method must include a call to xio_init() before any other Accelio method is called.
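A minimal skeleton of such a main (xio_shutdown is the matching cleanup call; session/connection setup is omitted):

int main(int argc, char *argv[])
{
        struct xio_context *ctx;

        xio_init();  /* must precede any other Accelio call */
        ctx = xio_context_create(NULL, 0, -1);

        /* ... create a session or server, connect, queue messages ... */
        xio_context_run_loop(ctx, XIO_INFINITE);

        xio_context_destroy(ctx);
        xio_shutdown();
        return 0;
}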

 

Examples

 

hello_world Example

 

Server Side

[katyak@r-dcs47-006 hello_world](git::for_next)$ ./xio_server 192.168.6.47 1234 rdma 0

listen to rdma://192.168.6.47:1234

new session event. session:0x192b550

session event: new connection. session:0x192b550, connection:0x192b7b0, reason: Success

message header : [4000000] - hello world header request

message data: [4000000][0][25] - hello world data request

message header : [8000000] - hello world header request

message data: [8000000][0][25] - hello world data request

session event: connection closed. session:0x192b550, connection:0x192b7b0, reason: Session disconnected

session event: connection teardown. session:0x192b550, connection:0x192b7b0, reason: Session disconnected

session event: session teardown. session:0x192b550, connection:(nil), reason: Session disconnected

exit signaled

 

Client Side

[katyak@r-dcs46-006 hello_world](git::for_next)$ ./xio_client 192.168.6.47 1234 rdma 1

session event: connection established. reason: Success

message: [4000000] - hello world header response

message: [4000000] - hello world data response

message: [8000000] - hello world header response

message: [8000000] - hello world data response

session event: connection closed. reason: Session closed

session event: connection teardown. reason: Session closed

session event: session teardown. reason: Session closed

exit signaled

good bye

 

Client Side: Control Path

The on_session_established event is not received since the application did not provide a callback for this method. Since on_session_established's sole purpose is to notify the user, and no action is required on the user's side, it may be left undefined.

Session event of type connection established is received.

After receiving enough responses (DISCONNECT_NR), the client calls xio_disconnect (when running with the finite_run=1 option), triggering the disconnect flow. When the final event for the session is received, the user stops the context's loop, releases all allocated resources, and main exits.

 

Client Side: Data Path

In this example, the client has a queue of xio_msgs. All xio_msgs are initialized in the same way:

Both in and out are of type XIO_SGL_TYPE_IOV and of length XIO_IOVLEN. In the example, only the first element of the vector is filled. The out msg is prepared by copying a "hello world header request" string into the header and a "hello world data request" string into the data. All messages in the queue are sent in a row. Once a response is received for each of them, the request is sent again. Every PRINT_COUNTER-th response received is read and printed to the screen. Note that when the response is received, the response header is nullified and the number of in vector elements is set to 0.

 

Server Side: Control Path

The server receives the on_new_session callback and accepts the session. Afterwards, the server receives a "new connection" event and begins receiving requests from the client. Once the client disconnects, the connection enters the disconnect flow. Once the session teardown event is received, the server stops the event loop, frees all resources, and the main thread exits.

 

Server Side: Data Path

The server receives requests and answers them. Note that the request is attached to the response. Every PRINT_COUNTER-th request received, the server prints the header and goes over all the elements in the data vector, printing them. Note that when the request is received, the request header is nullified and the number of in vector elements is set to 0.

 

hello_world_mt Example

 

Server Side

[katyak@r-dcs47-006 hello_world_mt](git::for_next)$ ./xio_mt_server 192.168.6.47 1234 rdma 0

thread [4] - listen:rdma://192.168.6.47:1238

thread [3] - listen:rdma://192.168.6.47:1237

thread [2] - listen:rdma://192.168.6.47:1236

thread [1] - listen:rdma://192.168.6.47:1235

session event: new connection. session:0x21b69f0, connection:0x21b6c30, reason: Success

session event: connection closed. session:0x21b69f0, connection:0x21b6c30, reason: Success

session event: connection teardown. session:0x21b69f0, connection:0x21b6c30, reason: Session closed

session event: new connection. session:0x21b69f0, connection:0x7fe2dc01ab80, reason: Success

session event: new connection. session:0x21b69f0, connection:0x7fe2e401ab80, reason: Success

session event: new connection. session:0x21b69f0, connection:0x7fe2e001ab80, reason: Success

session event: new connection. session:0x21b69f0, connection:0x7fe2d401ac70, reason: Success

thread [1] tid:0x7fe2ed9ad700 - message: [3970069] - hello world header request from thread 4

thread [3] tid:0x7fe2ec9ab700 - message: [3992990] - hello world header request from thread 2

thread [2] tid:0x7fe2ed1ac700 - message: [4002699] - hello world header request from thread 1

thread [4] tid:0x7fe2ec1aa700 - message: [4034842] - hello world header request from thread 3

thread [1] tid:0x7fe2ed9ad700 - message: [7943421] - hello world header request from thread 4

thread [3] tid:0x7fe2ec9ab700 - message: [7987987] - hello world header request from thread 2

thread [2] tid:0x7fe2ed1ac700 - message: [7998794] - hello world header request from thread 1

thread [4] tid:0x7fe2ec1aa700 - message: [8070682] - hello world header request from thread 3

thread [1] tid:0x7fe2ed9ad700 - message: [11914071] - hello world header request from thread 4

session event: connection closed. session:0x21b69f0, connection:0x7fe2d401ac70, reason: Session disconnected

session event: connection teardown. session:0x21b69f0, connection:0x7fe2d401ac70, reason: Session disconnected

thread [3] tid:0x7fe2ec9ab700 - message: [11965624] - hello world header request from thread 2

session event: connection closed. session:0x21b69f0, connection:0x7fe2dc01ab80, reason: Session disconnected

session event: connection teardown. session:0x21b69f0, connection:0x7fe2dc01ab80, reason: Session disconnected

thread [2] tid:0x7fe2ed1ac700 - message: [11972650] - hello world header request from thread 1

session event: connection closed. session:0x21b69f0, connection:0x7fe2e001ab80, reason: Session disconnected

session event: connection teardown. session:0x21b69f0, connection:0x7fe2e001ab80, reason: Session disconnected

thread [4] tid:0x7fe2ec1aa700 - message: [12000000] - hello world header request from thread 3

session event: connection closed. session:0x21b69f0, connection:0x7fe2e401ab80, reason: Session disconnected

session event: connection teardown. session:0x21b69f0, connection:0x7fe2e401ab80, reason: Session disconnected

session event: session teardown. session:0x21b69f0, connection:(nil), reason: Session disconnected

exit signaled

exit signaled

exit signaled

exit signaled

exit signaled

 

Client Side

[katyak@r-dcs46-006 hello_world_mt](git::for_next)$ ./xio_mt_client 192.168.6.47 1234 rdma 1

connection established. reason: Success

connection established. reason: Success

connection established. reason: Success

connection established. reason: Success

thread [4] - tid:0x7fcb50368700  - message: [1587853] - hello world header response from thread 1

thread [2] - tid:0x7fcb5136a700  - message: [1597512] - hello world header response from thread 3

thread [1] - tid:0x7fcb51b6b700  - message: [1599762] - hello world header response from thread 2

thread [3] - tid:0x7fcb50b69700  - message: [1615309] - hello world header response from thread 4

thread [4] - tid:0x7fcb50368700  - message: [3175112] - hello world header response from thread 1

thread [2] - tid:0x7fcb5136a700  - message: [3194676] - hello world header response from thread 3

thread [1] - tid:0x7fcb51b6b700  - message: [3201995] - hello world header response from thread 2

thread [3] - tid:0x7fcb50b69700  - message: [3228346] - hello world header response from thread 4

thread [4] - tid:0x7fcb50368700  - message: [4764801] - hello world header response from thread 1

thread [2] - tid:0x7fcb5136a700  - message: [4791882] - hello world header response from thread 3

thread [1] - tid:0x7fcb51b6b700  - message: [4802186] - hello world header response from thread 2

thread [3] - tid:0x7fcb50b69700  - message: [4841512] - hello world header response from thread 4

thread [4] - tid:0x7fcb50368700  - message: [6355063] - hello world header response from thread 1

thread [2] - tid:0x7fcb5136a700  - message: [6390035] - hello world header response from thread 3

thread [1] - tid:0x7fcb51b6b700  - message: [6400952] - hello world header response from thread 2

thread [3] - tid:0x7fcb50b69700  - message: [6455153] - hello world header response from thread 4

thread [4] - tid:0x7fcb50368700  - message: [7943421] - hello world header response from thread 1

thread [2] - tid:0x7fcb5136a700  - message: [7987987] - hello world header response from thread 3

thread [1] - tid:0x7fcb51b6b700  - message: [7998794] - hello world header response from thread 2

thread [3] - tid:0x7fcb50b69700  - message: [8070682] - hello world header response from thread 4

thread [4] - tid:0x7fcb50368700  - message: [9532364] - hello world header response from thread 1

thread [2] - tid:0x7fcb5136a700  - message: [9584322] - hello world header response from thread 3

thread [1] - tid:0x7fcb51b6b700  - message: [9599745] - hello world header response from thread 2

thread [3] - tid:0x7fcb50b69700  - message: [9685069] - hello world header response from thread 4

thread [4] - tid:0x7fcb50368700  - message: [11120528] - hello world header response from thread 1

thread [2] - tid:0x7fcb5136a700  - message: [11183819] - hello world header response from thread 3

thread [1] - tid:0x7fcb51b6b700  - message: [11198544] - hello world header response from thread 2

thread [3] - tid:0x7fcb50b69700  - message: [11299399] - hello world header response from thread 4

connection closed. reason: Session closed

connection teardown. reason: Session closed

connection closed. reason: Session closed

connection teardown. reason: Session closed

connection closed. reason: Session closed

connection teardown. reason: Session closed

connection closed. reason: Session closed

connection teardown. reason: Session closed

session teardown. reason: Session closed

exit signaled

exit signaled

exit signaled

exit signaled

thread exit

thread exit

thread exit

thread exit

 

Client Side: Control Path

The session is created on the main thread. The client spawns MAX_THREADS threads, each creating an xio_context and an xio_connection. Note that each thread is attached to a CPU. Each thread receives a connection established event (the on_session_established event is not received since the application did not provide a callback for this method). Once an individual connection receives DISCONNECT_NR responses, it calls xio_disconnect, triggering the disconnect flow for this connection. Once connection teardown events are received for all connections, the session teardown event will be received. When the final event for the session is received, the user stops the context's loop, releases all allocated resources, and the program exits.

 

Client Side: Data Path

In this example, each connection initializes a single xio_msg and sends it. Once the response is received, the request is sent again. For each thread, every PRINT_COUNTER-th response received is read and printed to the screen. Note that the sn that is printed is assigned per session; therefore, there will be MAX_THREADS prints with sn around MAX_THREADS * PRINT_COUNTER. Note that when the response is received, the response header is nullified and the number of in vector elements is set to 0.

 

Server Side: Control Path

The server sets up a listener portal on the main thread. It spawns MAX_THREADS threads, each creating an xio_context and a portal listening on the same IP but a different port. Note that each thread is attached to a CPU. Each one prepares a message to be sent as a response and runs the event loop. Once the client disconnects each connection, the corresponding connection on the server side enters the disconnect flow. Once the session teardown event is received, the event loop is stopped, all resources are freed, and the main thread exits.

 

Server Side: Data Path

Each worker portal receives requests and answers them. Before sending the response, the request must be attached. Every PRINT_COUNTER-th request received, the portal prints the request. Note that there are MAX_THREADS threads receiving the requests. When the request is received, the request header is nullified and the number of in vector elements is set to 0.

 

hello world iov Example

 

Client Side: Data Path

In this example, the client has a queue of xio_msgs. All xio_msgs are initialized in the same way:

Both msg.in and msg.out are of type XIO_SGL_TYPE_IOV_PTR, max_nents is set (in this example to MAX_NENTS), and memory for the sglist is allocated. The xio_msg is prepared by copying a "hello world header request" string into the header. All elements of the out vector of all messages point to the same buffer of size MSG_DATA_LEN holding a "hello world request data" string. All elements of the in vector of all messages point to the same buffer of size MSG_DATA_LEN. (This is, of course, solely for the purpose of the example; normally, each xio_msg and all its elements should point to a different buffer.) The number of nents for both in and out messages is set to max_nents. All messages in the queue are sent in a row. Once a response is received for each of them, the request is sent again. Every PRINT_COUNTER-th response received is read and printed to the screen. Note that when the response is received, the response header is nullified and the number of in vector elements is set to 0.

 

Server Side: Data Path

In this example, the server has a list of xio_msgs. All xio_msgs are initialized in the same way:

Both in and out are of type XIO_SGL_TYPE_IOV_PTR, max_nents is set (in this example to MAX_NENTS), and memory for the sglist is allocated. The xio_msg is prepared by copying a "hello world header response" string into the header. All elements of the out vector of all messages point to the same buffer of size MSG_DATA_LEN holding a "hello world response data" string. All elements of the in vector of all messages point to the same buffer of size MSG_DATA_LEN. (This is, of course, solely for the purpose of the example; normally, each xio_msg and all its elements should point to a different buffer.) The number of nents for both in and out messages is set to max_nents. Once a request is received, a response is sent. Every PRINT_COUNTER-th request received is read and printed to the screen.

 

Statistics

 

statistics.py

This script (located in accelio/management/statistics/) gives statistics for Accelio.

Usage

1.      Run ./configure with --enable-stat-counters=yes (see GETTING STARTED) when compiling Accelio.

2.      Run statistics.py in a terminal connected to the machine to be analyzed.

3.      In a different terminal, run your Accelio application. Statistics for the connection will appear in the first terminal (from step 2).

4.      When finished, type exit in the first terminal (from step 2).

                                            

Example output for hello_world (server side):

>        PID        TID     TX_MSG     RX_MSG   TX_BYTES   RX_BYTES      DELAY   APPDELAY

      32646      32646     1.070m     1.070m    55.114M    53.073M     0.000s    23.752n

 

FIO and RAIO

 

FIO

FIO is an open source benchmark application (https://github.com/axboe/fio) that allows multi-threaded/multi-process benchmarking of I/O devices.

Accelio provides a user space engine and a block device (for kernel) that is dynamically loaded by fio. This allows Accelio to be benchmarked by a standard external benchmark application.

 

R-AIO

R-AIO is a remote version of Linux's AIO API, implemented over libxio. R-AIO implements an interface similar to the one defined in /usr/include/libaio.h (providing asynchronous methods for reading/writing from/to a file). The remote AIO interface is located in accelio/examples/raio/usr/libraio/libraio.h; the implementation of the interface is located in raio_api.c in the same folder. For an example of how to use the R-AIO API, see get_clock.c (located in accelio/examples/usr/client/).

A more complicated example on how to use the raio api is the server/client application. The client demonstrates reading of files from the server.

There are several clients that can connect to the raio server:

o   raio client: located in examples/raio/usr/client/raio_client.c

o   fio plugin: works over the raio client; located in examples/raio/usr/fio/

o   nbdx client: runs in kernel and serves as a block device (/dev/nbdx0); located in examples/raio/kernel/nbdx/

 

Illustration 8: R-AIO

 

For more information on how to run nbdx, see: https://community.mellanox.com/docs/DOC-1528

 

Running FIO/RAIO Example

Clone the latest Accelio and use the master branch.

Build and install the library in /opt/xio:

cd ~/accelio/

make distclean

./autogen.sh

./configure --enable-stat-counters=no --enable-extra-checks=no --enable-fio-build --prefix=/opt/xio

make

make install

Run the raio server on one machine:

cd ~/accelio/examples/usr/raio/

./raio_server -a 2.2.102.9 -p 1234 -t rdma


Run the fio client on the other machine:

cd ~/accelio/examples/usr/fio/

./run_f_io_rd_lat.sh raio_rd_lat.fio

The run uses the raio_rd_lat.fio job file.
