How to fix OOM error encountered when performing logical replication through jdbc? - java

I am developing a Java application that uses test_decoding plus the JDBC driver to do PostgreSQL logical replication. The main code comes from the PostgreSQL documentation.
PGReplicationStream stream =
    replConnection.getReplicationAPI()
        .replicationStream()
        .logical()
        .withSlotName("demo_logical_slot")
        .withSlotOption("include-xids", true)
        .withSlotOption("skip-empty-xacts", true)
        .withSlotOption("include-timestamp", "on")
        .withStatusInterval(5, TimeUnit.SECONDS)
        .start();

while (true) {
    // non-blocking receive
    ByteBuffer msg = stream.readPending();

    if (msg == null) {
        TimeUnit.MILLISECONDS.sleep(10L);
        continue;
    }

    int offset = msg.arrayOffset();
    byte[] source = msg.array();
    int length = source.length - offset;
    System.out.println(new String(source, offset, length));

    // feedback
    stream.setAppliedLSN(stream.getLastReceiveLSN());
    stream.setFlushedLSN(stream.getLastReceiveLSN());
}
The above code works fine in most scenarios.
However, stream.readPending() throws an OOM error when a large transaction is committed to PostgreSQL.
In my case I updated 432629 rows in one transaction, and the Java server (2 CPUs + 4 GB memory) threw an OOM error:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.postgresql.core.PGStream.receive(PGStream.java:445)
at org.postgresql.core.v3.QueryExecutorImpl.processCopyResults(QueryExecutorImpl.java:1170)
at org.postgresql.core.v3.QueryExecutorImpl.readFromCopy(QueryExecutorImpl.java:1035)
at org.postgresql.core.v3.CopyDualImpl.readFromCopy(CopyDualImpl.java:41)
at org.postgresql.core.v3.replication.V3PGReplicationStream.receiveNextData(V3PGReplicationStream.java:155)
at org.postgresql.core.v3.replication.V3PGReplicationStream.readInternal(V3PGReplicationStream.java:124)
at org.postgresql.core.v3.replication.V3PGReplicationStream.readPending(V3PGReplicationStream.java:78)
So my question is: how can I fix this OOM error, other than by upgrading the Java server's hardware?
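One mitigation sketch, under the assumption that much of the "GC overhead limit exceeded" pressure comes from the per-row String built in the read loop: consume the buffer's bytes directly and confirm the LSN regularly so the server can discard WAL that has already been consumed. Here processor is a hypothetical sink (for example a BufferedOutputStream); this does not change how much data the server sends for a huge transaction, so it may not be sufficient on its own.

// Sketch only: avoid building a String per change and report progress regularly.
ByteBuffer msg = stream.readPending();
if (msg != null) {
    int offset = msg.arrayOffset() + msg.position();
    int length = msg.remaining();
    processor.write(msg.array(), offset, length);   // hypothetical sink, no intermediate String
    stream.setAppliedLSN(stream.getLastReceiveLSN());
    stream.setFlushedLSN(stream.getLastReceiveLSN());
    stream.forceUpdateStatus();                     // push the feedback to the server
}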

Related

Import data from File to Cassandra Cluster with 5 nodes causes BusyConnectionException

For my thesis, I need to upload data from a file to a Cassandra cluster. With session.execute() it is too slow, so I decided to use session.executeAsync(), but it causes a BusyConnectionException.
Here is my code in Java:
final PoolingOptions poolingOptions = new PoolingOptions();
poolingOptions.setMaxRequestsPerConnection(HostDistance.LOCAL, 32768)
              .setMaxRequestsPerConnection(HostDistance.REMOTE, 32768);

final Cluster cluster = Cluster.builder()
        .withPoolingOptions(poolingOptions)
        .addContactPoint("x.x.x.x")
        .withPort(9042)
        .build();
final Session session = cluster.connect();
System.out.println("session object---" + session.getState());

final String path = "&PathToFile%";
final File dir = new File(path);
session.execute("use products;");
for (final File file : dir.listFiles()) {
    final BufferedReader br = new BufferedReader(new FileReader(file));
    String str;
    final String insert = br.readLine();
    while ((str = br.readLine()) != null) {
        final String query = insert + str.substring(0, str.length() - 1) + "IF NOT EXISTS ;";
        session.executeAsync(query);
    }
}
session.close();
cluster.close();
Here are the exceptions that I got when I executed the code:
Error querying /x.x.x.1:9042 : com.datastax.driver.core.exceptions.BusyPoolException: [/x.x.x.1] Pool is busy (no available connection and the queue has reached its max size 256)
Error querying /x.x.x.2:9042 : com.datastax.driver.core.exceptions.BusyPoolException: [/x.x.x.2] Pool is busy (no available connection and the queue has reached its max size 256)
Error querying /x.x.x.3:9042 : com.datastax.driver.core.exceptions.BusyPoolException: [/x.x.x.3] Pool is busy (no available connection and the queue has reached its max size 256)
Error querying /x.x.x.4:9042 : com.datastax.driver.core.exceptions.BusyPoolException: [/x.x.x.4] Pool is busy (no available connection and the queue has reached its max size 256)
Error querying /x.x.x.5:9042 : com.datastax.driver.core.exceptions.BusyPoolException: [/x.x.x.5] Pool is busy (no available connection and the queue has reached its max size 256)
A busy exception occurs when you put too many requests on one connection. You need to control how many requests are sent. The simplest way is to use a semaphore or something similar. I have a class that wraps the Session and allows controlling the number of in-flight requests, so it behaves asynchronously until you reach the limit, and then blocks until the number of in-flight requests goes back under the limit. You can use my code, or implement something similar.
Update: you're using light-weight transactions (LWT) (the IF NOT EXISTS clause), and this heavily affects the performance of your cluster because every insert needs to be coordinated with the other nodes...
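A minimal sketch of the wrapper idea described above, assuming the DataStax Java driver 3.x API used in the question (the class name ThrottledSession and the permit count are hypothetical): a Semaphore caps the number of in-flight executeAsync calls, and each permit is released when the query completes.

import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import java.util.concurrent.Semaphore;

// Caps the number of in-flight async queries; acquire() blocks once the limit is reached.
class ThrottledSession {
    private final Session session;
    private final Semaphore permits;

    ThrottledSession(Session session, int maxInFlight) {
        this.session = session;
        this.permits = new Semaphore(maxInFlight);
    }

    ResultSetFuture executeAsync(String query) throws InterruptedException {
        permits.acquire();
        ResultSetFuture future = session.executeAsync(query);
        Futures.addCallback(future, new FutureCallback<ResultSet>() {
            @Override public void onSuccess(ResultSet rs) { permits.release(); }
            @Override public void onFailure(Throwable t)  { permits.release(); }
        });
        return future;
    }
}

In the loop above you would then call throttledSession.executeAsync(query) instead of session.executeAsync(query), with maxInFlight tuned to stay well below what the pool can queue.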

Amazon AWS Client Timeout with lots of requests

We have a Spring Boot application that stores multimedia files (up to 100 MB in size) in an S3-compatible cloud storage. The application receives these files via REST calls or an AMQP message broker (RabbitMQ).
Usually the load on the system is moderate, so there is no problem at all. However, we encounter problems accessing S3 when there is heavy load on the system. Currently we are working around this issue by using a pool of 10 AmazonS3Clients that are assigned randomly to the calling process. This actually improves the situation but does not fix the problem. When the load is too high (meaning plenty of write and read operations) we encounter an exception of this sort:
com.amazonaws.AmazonClientException: Unable to execute HTTP request: connect timed out
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:299)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:170)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:2648)
at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1049)
at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:924)
We're using the 1.3.8 version of the aws-java-sdk and cannot easily update to a newer version due to the region settings in the newer versions. The signing algorithm prevents us from accessing our buckets properly in the newest version.
The implementation looks as follows:
Initialization (at constructor level):
ClientConfiguration clientConfiguration = new ClientConfiguration();
clientConfiguration.setConnectionTimeout(AWS_CONNECTION_TIMEOUT);
clientConfiguration.setMaxConnections(AWS_MAX_CONNECTIONS);
AWSCredentials credentials = new BasicAWSCredentials(accessKey, secretKey);
for (int i = 0; i < AWS_MAX_CLIENTS; i++) {
    s3[i] = new AmazonS3Client(credentials, clientConfiguration);
    s3[i].setEndpoint(endpoint);
}
Put:
int i = getRandomClient();
s3[i].putObject(bucketName, key, file);
Get:
ReadableByteChannel channel = null;
try {
    int i = getRandomClient();
    S3Object object = s3[i].getObject(bucketName, addPrefix(fileId, prefix));
    S3ObjectInputStream stream = object.getObjectContent();
    channel = Channels.newChannel(stream);
    File file = File.createTempFile(fileId, "");
    try (WritableByteChannel outChannel = Channels.newChannel(new FileOutputStream(file))) {
        ByteBuffer buffer = ByteBuffer.allocate(8192);
        int read;
        while ((read = channel.read(buffer)) > 0) {
            buffer.rewind();
            buffer.limit(read);
            while (read > 0) {
                read -= outChannel.write(buffer);
            }
            buffer.clear();
        }
        IOUtils.closeQuietly(stream);
        return file;
    }
}
catch (AmazonClientException e) {
    if (!isMissingKey(e)) {
        throw new IOException(e);
    }
}
finally {
    if (channel != null) {
        channel.close();
    }
}
It is pretty clear that the limited number of connections and clients is the bottleneck. There are plenty of ways we could tweak the implementation to make it work. We could of course limit the number of consumers listening to the message broker. We could also increase the timeouts, the number of AWS clients and their connections, or limit the throughput in the service layer. However, we're looking for a more sophisticated approach to handle things here.
Is there any way to tell whether or not a designated client can currently be used or has too many open connections? Is there any way one could let the client wait for the next free connection?
Increasing the number of clients is no different than increasing the connection pool size of a single client, except now you have to worry about pseudo-"load balancing" your array of clients with getRandomClient(). Additionally, there is significant overhead to creating multiple clients and maintaining an unnecessary number of connection pools. You are trying to reinvent the wheel.
One thing you can do is catch the Exception thrown during timeouts like so:
try {
    ... do s3 read/write ...
} catch (AmazonClientException ace) {
    if (ace.getCause() instanceof org.apache.http.conn.ConnectionPoolTimeoutException) {
        log.error("S3 connection pool timeout!");
    }
}
Use this to help tune your connection pool size. Basically just keep making it bigger until this is no longer your bottleneck.
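For example, a sketch of the single-client alternative described above, reusing the ClientConfiguration calls already shown in the question (the pool size here is just a starting point to tune):

// One shared client with one pool sized for the whole load, instead of an array of clients.
ClientConfiguration config = new ClientConfiguration();
config.setConnectionTimeout(AWS_CONNECTION_TIMEOUT);
config.setMaxConnections(AWS_MAX_CLIENTS * AWS_MAX_CONNECTIONS);
AmazonS3Client s3 = new AmazonS3Client(new BasicAWSCredentials(accessKey, secretKey), config);
s3.setEndpoint(endpoint);
// All callers share this client; when the pool is exhausted a request waits for a free
// connection and eventually fails with ConnectionPoolTimeoutException, which you can catch as above.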

Java NIO TCP timeout issue

I am using one SocketChannel in 2 threads, one thread for sending the data and another for receiving the data.
SocketChannel socketChannel = SocketChannel.open(new InetSocketAddress(ip,port));
socketChannel.configureBlocking(false);
Thread 1: uses the above socketchannel to write the data
Thread 2: uses the same socketchannel to read the data
I am not using any selectors with the socketchannel as I need the write and read to be asynchronous (using 2 different threads)
PROBLEM: When the connection is lost, the socketChannel.write() and socketChannel.read() operations do not throw any error. They just block.
I need to detect the connection loss.
I tried using the heartbeat method in Thread 2, but because the read operation just blocks, this method did not work. Is there any other way to detect connection loss without using a heartbeat in a new thread?
Is it possible to throw an error while writing/reading if there is a connection loss?
Thanks in advance.
EDIT:
Thread 1:
public void run() {
    socketChannel = SendAndReceivePacketUtil.createConnection(ip, port);
    socketChannel.configureBlocking(false);
    RecTask task = new RecTask(socketChannel);
    Thread recThread = new Thread(task);
    recThread.start();
    while (true) {
        byte[] data = getDataFromQueue(ip);
        if (data != null) {
            //print(new String(data));
            sendPacket(data, socketChannel);
        }
    }
}
Thread 2: (RecTask)
public void run() {
    while (true) {
        byte[] data = receivePacket(socketChannel);
        //print(new String(data));
    }
}
Both Thread 1 and Thread 2 have try-catch-finally blocks; the finally block closes the SocketChannel.
sendPacket:
int dataSent = 0;
ByteBuffer buf = ByteBuffer.wrap(data);  // wrap the outgoing bytes
while (dataSent < data.length) {
    long n = socketChannel.write(buf);
    if (n < 0) {
        throw new Exception();
    }
    dataSent += (int) n;
}
receivePacket:
int dataRec = 0;
byte[] data = new byte[length];
ByteBuffer buffer = ByteBuffer.wrap(data);
while (dataRec < length) {
    long n = socketChannel.read(buffer);
    if (n < 0) {
        throw new Exception();
    }
    dataRec += (int) n;
}
return data;
I send and receive data continuously, but as soon as the connection is lost nothing prints and the code just gets stuck. It's an Android Wi-Fi Direct application; for the connection-loss scenario I just switch off the Wi-Fi module.
I am not using any selectors with the socketchannel as I need the write and read to be asynchronous (using 2 different threads)
That's not a reason to avoid a Selector. In fact it's rather difficult to write correct non-blocking NIO code without a Selector (a minimal sketch follows after this answer).
PROBLEM: When the connection is lost, the socketChannel.write() and socketChannel.read() operations do not throw any error. They just block.
No it doesn't. You're in non-blocking mode. It either returns a positive integer, or zero, or throws an exception. Which is it?
I tried using the heartbeat method in Thread 2, but because the read operation just blocks, this method did not work.
The read operation does not block in non-blocking mode.
Is there any other way to detect connection loss without using a heartbeat in a new thread?
The only reliable way to detect connection loss in TCP is to write to the connection. Eventually this will throw IOException: connection reset. But it won't happen the first time after the connection loss, due to buffering, retries, etc.
Is it possible to throw an error while writing/reading if there is a connection loss?
That's what happens.
There is something seriously wrong with this question. Either the code you posted isn't the real code or it isn't behaving as you described. You need to post more of it, e.g. your read and write code.
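To illustrate the Selector point above, a minimal sketch (not the poster's code) of a single-threaded loop that services a non-blocking SocketChannel. OP_WRITE would only be registered while there is queued outgoing data, otherwise select() spins because the socket is almost always writable:

Selector selector = Selector.open();
socketChannel.configureBlocking(false);
socketChannel.register(selector, SelectionKey.OP_READ);

ByteBuffer readBuffer = ByteBuffer.allocate(8192);
while (selector.isOpen()) {
    selector.select();                                    // blocks until a channel is ready
    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
    while (it.hasNext()) {
        SelectionKey key = it.next();
        it.remove();
        if (key.isReadable()) {
            int n = ((SocketChannel) key.channel()).read(readBuffer);
            if (n < 0) {                                  // orderly close by the peer is detected here
                key.cancel();
                key.channel().close();
                continue;
            }
            readBuffer.flip();
            // ... consume readBuffer ...
            readBuffer.clear();
        }
    }
}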
You can look at enabling the TCP keep-alive option on the socket. On an idle connection, keep-alive messages are sent and an ACK is expected for them at the TCP layer.
If TCP keep-alive fails, your next read/write operation will result in an error (ECONNRESET), which can be used as a sign of connection loss.
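A minimal sketch of enabling it on the channel from the question (note that the keep-alive probe interval is controlled by the OS and typically defaults to a couple of hours, so it mainly helps for long-idle connections):

SocketChannel socketChannel = SocketChannel.open(new InetSocketAddress(ip, port));
socketChannel.setOption(StandardSocketOptions.SO_KEEPALIVE, true);  // NIO way, Java 7+
// equivalent on the underlying socket:
// socketChannel.socket().setKeepAlive(true);
socketChannel.configureBlocking(false);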

Weird JNI crash when using FileChannel.transferTo

I am using the FileChannel.transferTo method to transfer bytes from my local file to a network socket. When the other end of the socket closes the connection, either using .close on the channel or due to some error, I get a very long JNI error stack trace in my Android application.
The parts that I could make sense of are:
The crash happens at this line:
fileChannel.transferTo(offset, bytesSize, channel);
The error:
JNI DETECTED ERROR IN APPLICATION: JNI SetLongField called with
pending exception 'android.system.ErrnoException' thrown in long
libcore.io.Posix.sendfile(java.io.FileDescriptor,
java.io.FileDescriptor, android.util.MutableLong, long):-2
This is the Java code that gets executed after my transferTo call:
at libcore.io.Posix.sendfile(Native method) at
libcore.io.BlockGuardOs.sendfile(BlockGuardOs.java:265) at
java.nio.FileChannelImpl.transferTo(FileChannelImpl.java:431)
From the Java source code, this is line 431 in FileChannelImpl:
try {
    MutableLong offset = new MutableLong(position);
    long rc = Libcore.os.sendfile(outFd, fd, offset, count); // line 431
    completed = true;
    return rc;
} catch (ErrnoException errnoException) {
    // If the OS doesn't support what we asked for, we want to fall through and
    // try a different approach. If it does support it, but it failed, we're done.
    if (errnoException.errno != ENOSYS && errnoException.errno != EINVAL) {
        throw errnoException.rethrowAsIOException();
    }
}
From what I understand, an IOException should have been thrown, because the ErrnoException gets "rethrown" as an IOException. Apparently the MutableLong is being set while there is a pending exception. Although I doubt it, is there somehow a way to prevent the app crash after this "bug"?
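Since the abort happens inside the runtime's native sendfile path, it cannot be caught from Java once it has been raised. A hedged workaround sketch (an assumption, not a confirmed fix for this runtime bug): avoid transferTo for these transfers and copy through a plain ByteBuffer, which keeps a closed peer on the ordinary, catchable IOException path.

// Fallback copy loop: same effect as transferTo, but failures surface as IOException.
long copyManually(FileChannel in, long offset, long count, WritableByteChannel out) throws IOException {
    ByteBuffer buffer = ByteBuffer.allocate(64 * 1024);
    long transferred = 0;
    while (transferred < count) {
        buffer.clear();
        buffer.limit((int) Math.min(buffer.capacity(), count - transferred));
        int read = in.read(buffer, offset + transferred);
        if (read <= 0) {
            break;                        // end of file or nothing more to read
        }
        buffer.flip();
        while (buffer.hasRemaining()) {
            out.write(buffer);            // a closed socket throws IOException here
        }
        transferred += read;
    }
    return transferred;
}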

native errors on DatagramChannel send

Basic
I have an app that sends packets using DatagramChannel.send in multiple threads, each to its own IP address/port and each keeping a constant bit rate/bandwidth. Every now and then I get this error:
java.net.SocketException: Invalid argument: no further information
at sun.nio.ch.DatagramChannelImpl.send0(Native Method)
at sun.nio.ch.DatagramChannelImpl.sendFromNativeBuffer(Unknown Source)
at sun.nio.ch.DatagramChannelImpl.send(Unknown Source)
at sun.nio.ch.DatagramChannelImpl.send(Unknown Source)
...
It happens at random - sometimes 5 minutes after start, sometimes after a day - so I really have problems reproducing it for testing. And on my home machine I can't reproduce it at all.
Environments
Windows 7, 8 and Server 2012 (all 64bit)
64bit Java 7 update 45
More information
The app is sending SI/EIT data to a DVB-C network. I'm creating a list of 188-byte arrays for each of 80-120 threads and giving it to them to use. The thread takes the list and loops over it until a new list is provided.
The error usually happens on multiple channels at once, but it can also happen on just one.
The error never happened until we had 40+ threads.
The error happens while looping over the list, not when I'm binding a new list to a thread.
The app is not running out of memory. It's usually using up to 70% of the memory given to the JVM.
Strange part: if I run multiple instances of the app, each handling ~10 threads, the problems are the same.
Simplified code sample
for (int i = 0; i < 100; ++i) {
    final int id = i;
    new Thread(new Runnable() {
        @Override
        public void run() {
            final Random r = new Random();
            final List<byte[]> buffer = Lists.newArrayList();
            for (int i = 0; i < 200; ++i) {
                final byte[] temp = new byte[188];
                r.nextBytes(temp);
                buffer.add(temp);
            }
            final SocketAddress target = new InetSocketAddress("230.0.0.18", 1000 + id);
            try (final DatagramChannel channel = DatagramChannel.open(StandardProtocolFamily.INET)) {
                channel.configureBlocking(false);
                channel.setOption(StandardSocketOptions.IP_MULTICAST_IF, NetworkInterface.getByName("eth0"));
                channel.setOption(StandardSocketOptions.IP_MULTICAST_TTL, 8);
                channel.setOption(StandardSocketOptions.SO_REUSEADDR, true);
                channel.setOption(StandardSocketOptions.SO_SNDBUF, 1024 * 64);
                int counter = 0;
                int index = 0;
                while (true) {
                    final byte[] item = buffer.get(index);
                    channel.send(ByteBuffer.wrap(item), target);
                    index = (index + 1) % buffer.size();
                    counter++;
                    Thread.sleep(1);
                }
            }
            catch (Exception e) {
                LOG.error("Fail at " + id, e);
            }
        }
    }).start();
}
Edits:
1) @EJP: I'm setting the multicast properties because the actual app that I use was doing joins (and reading some data). But the problems persisted even after I removed them.
2) Should I be using some other API if I just need to send UDP packets? All the samples I could find use DatagramChannel (or its older alternative).
3) I'm still stuck on this. If anyone has an idea what I can even try, please let me know.
I had exactly the same problem, and it was caused by a zero port in the target InetSocketAddress, when calling the send method.
In your code, the target port is defined as 1000 + i, so it doesn't seem to be the problem. Anyway, I'd log the target parameters that are used when the exception is thrown, just in case.
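For example, a small addition to the send loop in the sample above that records the target when the native error surfaces (LOG is the logger already used in the sample):

try {
    channel.send(ByteBuffer.wrap(item), target);
} catch (IOException e) {
    // Log exactly where and what we were sending when the native send failed.
    LOG.error("send failed: target=" + target + ", payload=" + item.length + " bytes", e);
    throw e;
}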
