Does Spring Kafka Retry pick messages amongst different partitions? - spring-kafka

I would like to know how Spring Kafka handles retries when multiple partitions are assigned to an instance. Does Spring Kafka keep retrying the same message according to the retry policy and backoff policy, or does it interleave messages from other partitions between retries?
Is the behavior:
A) retry message -> retry message -> retry message
B) retry message -> other message -> retry message -> retry message
I've looked at other Stack Overflow questions that seem to confirm that, for a single partition, Spring Kafka will not move to another offset, but there was no information on what the behavior is when multiple partitions are assigned to the instance. I've implemented a factory that has a retry template and stateful retry.
@Bean
public KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<Integer, String>>
        kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<Integer, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    ListenerExceptions listenerExceptions = new ListenerExceptions();
    factory.setConsumerFactory(consumerFactory());
    factory.setConcurrency(KafkaProperties.CONCURRENCY);
    factory.getContainerProperties().setPollTimeout(KafkaProperties.POLL_TIMEOUT_VLAUE);
    factory.setRetryTemplate(retryTemplate());
    factory.setErrorHandler(new SeekToCurrentErrorHandler());
    factory.setStatefulRetry(true);
    factory.setRecoveryCallback((RetryContext context) -> listenerExceptions.recover(context));
    return factory;
}

The retry configuration from the factory above is delegated to the RetryingMessageListenerAdapter, whose logic looks like this:
public void onMessage(final ConsumerRecord<K, V> record, final Acknowledgment acknowledgment,
        final Consumer<?, ?> consumer) {
    RetryState retryState = null;
    if (this.stateful) {
        retryState = new DefaultRetryState(record.topic() + "-" + record.partition() + "-" + record.offset());
    }
    getRetryTemplate().execute(context -> {
            context.setAttribute(CONTEXT_RECORD, record);
            switch (RetryingMessageListenerAdapter.this.delegateType) {
                case ACKNOWLEDGING_CONSUMER_AWARE:
                    context.setAttribute(CONTEXT_ACKNOWLEDGMENT, acknowledgment);
                    context.setAttribute(CONTEXT_CONSUMER, consumer);
                    RetryingMessageListenerAdapter.this.delegate.onMessage(record, acknowledgment, consumer);
                    break;
                case ACKNOWLEDGING:
                    context.setAttribute(CONTEXT_ACKNOWLEDGMENT, acknowledgment);
                    RetryingMessageListenerAdapter.this.delegate.onMessage(record, acknowledgment);
                    break;
                case CONSUMER_AWARE:
                    context.setAttribute(CONTEXT_CONSUMER, consumer);
                    RetryingMessageListenerAdapter.this.delegate.onMessage(record, consumer);
                    break;
                case SIMPLE:
                    RetryingMessageListenerAdapter.this.delegate.onMessage(record);
            }
            return null;
        },
        getRecoveryCallback(), retryState);
}
So, the retry happens per message. Following the Apache Kafka recommendation, one partition is processed by one thread, so the next record in that partition won't be handled until the retries are exhausted or the call succeeds (behavior A within a partition).
Given your multiple-partitions condition and the factory.setConcurrency(KafkaProperties.CONCURRENCY) configuration, different partitions may be processed by different threads. It can therefore happen that records from different partitions are retried at the same time, simply because each retry is tied to its own thread and call stack.
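For completeness, the retryTemplate() wired into the factory above is not shown in the question; a minimal sketch of such a bean (assuming three attempts with a fixed one-second back-off, both values purely illustrative) that would drive the per-record retry just described could look like this:
@Bean
public RetryTemplate retryTemplate() {
    RetryTemplate template = new RetryTemplate();
    // Give up after 3 attempts on the same record.
    template.setRetryPolicy(new SimpleRetryPolicy(3));
    // Wait 1 second between attempts.
    FixedBackOffPolicy backOff = new FixedBackOffPolicy();
    backOff.setBackOffPeriod(1000L);
    template.setBackOffPolicy(backOff);
    return template;
}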

Related

Using a timing based PollingConsumer to a direct endpoint

Functionally I wish to check a URL is active before I consume from a JMS (WMQ) endpoint.
If the URL cannot be reached or returns a server error, then I do not want to pick up from the queue. So I want to keep trying (with unlimited retries) the URL via a polling consumer, and as soon as it is available I can pick up from JMS.
I have a RouteBuilder that is set up with a direct endpoint, which is configured to run a Processor that pings a service.
So:
public class PingRoute extends RouteBuilder {
    @Override
    public void configureCamel() {
        from("direct:pingRoute").routeId(PingRoute.class.getSimpleName())
            .process(new PingProcessor(url))
            .to("log://PingRoute?showAll=true");
    }
}
In another route I am setting up my timer:
@Override
public void configureCamel() {
    from(timerEndpoint).beanRef(PollingConsumerBean.class.getSimpleName(), "checkPingRoute");
    ...
}
And with the PollingConsumerBean I am attempting to receive the body via a consumer:
public void checkPingRoute() {
    // Loop to check the consumer. Check we can carry on with the pick-up from the JMS queue.
    while (true) {
        Boolean pingAvailable = consumer.receiveBody("direct:pingRoute", Boolean.class);
        ...
    }
}
I add the route to the context and use a producer to send:
context.addRoutes(new PingRoute());
context.start();
producer.sendBody(TimerPollingRoute.TIMER_POLLING_ROUTE_ENDPOINT, "a body");
And I get the following IllegalArgumentException:
Cannot add a 2nd consumer to the same endpoint. Endpoint Endpoint[direct://pingRoute] only allows one consumer.
Is there a way to setup the direct route as a polling consumer?
The business logic is not quite clear, unfortunately. As I understand it, you need to wait for a response from the service. IMHO you should use the Content Enricher EIP (http://camel.apache.org/content-enricher.html); pollEnrich is what you need in the timer route.
.pollEnrich("direct:waitForResponce", -1) or
.pollEnrich("seda:waitForResponce", -1)
public class PingRoute extends RouteBuilder {
    @Override
    public void configureCamel() {
        from("direct:pingRoute").routeId(PingRoute.class.getSimpleName())
            .process(new PingProcessor(url))
            .choice().when(body())
                .to("log://PingRoute?showAll=true")
                .to("direct:waitForResponce")
            .otherwise()
                .to("direct:pingRoute")
            .end();
    }
}
timer:
@Override
public void configureCamel() {
    from(timerEndpoint)
        .inOnly("direct:pingRoute")
        .pollEnrich("direct:waitForResponce", -1)
        ...
}
Based on the OP's clarification of their use case, they have several problems to solve:
Consume the message from the JMS queue if, and only if, the ping to the URL is positive.
If the URL is unresponsive, the JMS message should not disappear from the queue and a retry must take place until the URL becomes responsive again, in which case the message will be ultimately consumed.
The OP has not specified if the amount of retries is limited or unlimited.
Based on this problem scenario, I suggest a redesign of their solution that leverages ActiveMQ retries, broker-side redelivery and JMS transactions in Camel to:
Return the message to the queue if the URL ping failed (via a transaction rollback).
Ensure that the message is not lost (by using JMS persistence and broker-side redeliveries, AMQ will durably schedule the retry cycle).
Be able to specify a sophisticated retry cycle per message, e.g. with exponential backoffs, maximum retries, etc.
Optionally, send the message to a Dead Letter Queue if the retry cycle is exhausted without a positive result, so that some other (possibly manual) action can be planned.
Now, implementation-wise:
from("activemq:queue:abc?transacted=true") // (1)
.to("http4://host.endpoint.com/foo?method=GET") // (2) (3)
.process(new HandleSuccess()); // (4)
Comments:
Note the transacted flag.
If the HTTP invocation fails, the HTTP4 endpoint will raise an Exception.
Since there are no configured exception handlers, Camel will propagate the exception to the consumer endpoint (activemq) which will rollback the transaction.
If the invocation succeeded, the flow continues and the exchange body now contains the payload returned by the HTTP server, which you can handle in whichever way you wish. Here I'm using a processor (a sketch follows below).
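For illustration only, a minimal sketch of what the HandleSuccess processor referenced in comment (4) might look like (the actual handling is up to you):
import org.apache.camel.Exchange;
import org.apache.camel.Processor;

public class HandleSuccess implements Processor {
    @Override
    public void process(Exchange exchange) throws Exception {
        // The exchange body now holds the payload returned by the HTTP server.
        String payload = exchange.getIn().getBody(String.class);
        System.out.println("Ping succeeded, response payload: " + payload);
    }
}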
Next, what's important is that you configure the redelivery policy in ActiveMQ, as well as enable broker-side redeliveries. You do that in your activemq.xml configuration file:
<plugins>
    <redeliveryPlugin fallbackToDeadLetter="true" sendToDlqIfMaxRetriesExceeded="true">
        <redeliveryPolicyMap>
            <redeliveryPolicyMap>
                <redeliveryPolicyEntries>
                    <redeliveryPolicy queue="my.queue"
                                      initialRedeliveryDelay="30000"
                                      maximumRedeliveries="17"
                                      maximumRedeliveryDelay="259200000"
                                      redeliveryDelay="30000"
                                      useExponentialBackOff="true"
                                      backOffMultiplier="2" />
                </redeliveryPolicyEntries>
            </redeliveryPolicyMap>
        </redeliveryPolicyMap>
    </redeliveryPlugin>
</plugins>
And make sure that the scheduler support is enabled in the top-level <broker /> element:
<broker xmlns="http://activemq.apache.org/schema/core"
        brokerName="mybroker"
        schedulerSupport="true">
    ...
</broker>
I hope that helps.
EDIT 1: OP is using IBM WebSphere MQ as a broker, I missed that. You could use a JMS QueueBrowser to peek at messages and try their corresponding URLs before actually consuming a message, but it is not possible to selectively consume an individual message – that's not what MOM (messaging-oriented middleware) is about.
So I insist that you should explore JMS transactions, but rather than leaving it up to the broker to redeliver the message, you can start the pinging cycle to the URL within the TX body itself. With regards to Camel, you could implement it as follows:
from("jms:queue:myqueue?transacted=true")
.bean(new UrlPinger());
UrlPinger.java:
public class UrlPinger {

    @EndpointInject
    private ProducerTemplate template;

    private Pattern pattern = Pattern.compile("^(http(?:s)?)\\:");

    @Handler
    public void pingUrl(@Body String url, CamelContext context) throws InterruptedException {
        // Replace http(s): with http(s)4: to use the Camel HTTP4 endpoint.
        Matcher m = pattern.matcher(url);
        if (m.matches()) {
            url = m.replaceFirst(m.group(1) + "4:");
        }
        // Try forever until the status code is 200.
        while (getStatusCode(url, context) != 200) {
            Thread.sleep(5000);
        }
    }

    private int getStatusCode(String url, CamelContext context) {
        Exchange response = template.request(url + "?method=GET&throwExceptionOnFailure=false", new Processor() {
            @Override public void process(Exchange exchange) throws Exception {
                // No body since this is a GET request.
                exchange.getIn().setBody(null);
            }
        });
        return response.getIn().getHeader(Exchange.HTTP_RESPONSE_CODE, Integer.class);
    }
}
Notes:
Note the throwExceptionOnFailure=false option. An Exception will not be raised, therefore the loop will execute until the condition is true.
Inside the bean, I'm looping forever until the HTTP status is 200. Of course, your logic will be different.
Between attempts, I sleep for 5000 ms.
I'm assuming the URL to ping is in the body of the incoming JMS message. I'm replacing the leading http(s): with http(s)4: in order to use the Camel HTTP4 endpoint.
Performing the pinging inside the TX guarantees that the message will only be consumed once the ping condition is true (in this case HTTP status == 200).
You might want to introduce a desist condition (you don't want to keep trying forever). Maybe introduce some backoff to not overwhelm the other party.
If either Camel or the broker goes down within a retry cycle, the message will be automatically rolled back.
Take into account that JMS transactions are Session-bound, so if you want to start many concurrent consumers (the concurrentConsumers JMS endpoint option), you'll need to set cacheLevelName=CACHE_NONE so that each thread uses a different JMS Session (see the sketch below).
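As an illustration of that last note (the queue name and consumer count are placeholders), the endpoint URI combining these options might look like:
// Transacted consumption with several concurrent consumers,
// each using its own JMS Session (cacheLevelName=CACHE_NONE).
from("jms:queue:myqueue?transacted=true&concurrentConsumers=5&cacheLevelName=CACHE_NONE")
    .bean(new UrlPinger());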
I am having a bit of difficulty figuring out exactly what you want to do, but it appears that you want to consume data from an endpoint on an interval. For that, the best pattern is a polling consumer: http://camel.apache.org/polling-consumer.html
The error you are currently receiving is because you have two consumers both trying to read from "direct://pingRoute". If this was intended, you could change the direct endpoint to seda://pingRoute, so your data sits in an in-memory queue (a sketch follows).
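For illustration, a sketch of that suggestion (only the from endpoint of the original route changes):
// seda:// buffers exchanges in an in-memory queue, whereas direct:// allows only one consumer.
from("seda:pingRoute").routeId(PingRoute.class.getSimpleName())
    .process(new PingProcessor(url))
    .to("log://PingRoute?showAll=true");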
All the answers here pointed me in the right direction, but I finally came up with a solution that fits our code base and framework.
Firstly, I discovered there isn't a need for a bean to act as a polling consumer; a processor can be used instead.
@Override
public void configureCamel() {
    from("timer://fnzPoller?period=2000&delay=2000")
        .processRef(UrlPingProcessor.class.getSimpleName())
        .processRef(StopStartProcessor.class.getSimpleName())
        .to("log://TimerPollingRoute?showAll=true");
}
Then in the UrlPingProcessor there is a CXF service to ping the URL and check the response:
@Override
public void process(Exchange exchange) {
    try {
        // CXF service
        FnzPingServiceImpl fnzPingService = new FnzPingServiceImpl(url);
        fnzPingService.getPing();
    } catch (WebApplicationException e) {
        int responseCode = e.getResponse().getStatus();
        boolean isValidResponseCode = ResponseCodeUtil.isResponseCodeValid(responseCode);
        if (!isValidResponseCode) {
            // Sets a flag to stop for the StopStartProcessor
            stopRoute(exchange);
        }
    }
}
Then the StopStartProcessor uses an ExecutorService to stop or start a route on a new thread:
@Override
public void process(final Exchange exchange) {
    // routeBuilder is set on the constructor.
    final String routeId = routeBuilder.getClass().getSimpleName();
    Boolean stopRoute = ExchangeHeaderUtil.getHeader(exchange, Exchange.ROUTE_STOP, Boolean.class);
    boolean stopRoutePrim = BooleanUtils.isTrue(stopRoute);
    if (stopRoutePrim) {
        StopRouteThread stopRouteThread = new StopRouteThread(exchange, routeId);
        executorService.execute(stopRouteThread);
    } else {
        CamelContext context = exchange.getContext();
        Route route = context.getRoute(routeId);
        if (route == null) {
            try {
                context.addRoutes(routeBuilder);
            } catch (Exception e) {
                String msg = "Unable to add a route: " + routeBuilder;
                LOGGER.warn(msg, e);
            }
        }
    }
}

Spring Kafka Retry Logging

I have a requirement to consume from a Kafka topic, do some work on the records, and produce to another topic with spring-kafka 2.1.7. Other requirements are transactions for exactly-once semantics, retry, and error handling. On failure to commit a record I should do 3 retries, log each retry message to a retry topic, and on failure of all retries send the record to a dead letter topic. I looked at https://github.com/spring-projects/spring-kafka/issues/575 and it has excellent details on solving the problem. The thing I am struggling with is how to log each retry message with details like the consumer offset, the topic it was trying to commit to, etc. Is there a way to get these from the retry callback? The RetryListener snippet below is registered with an org.springframework.kafka.listener.LoggingErrorHandler that is set as a container property on the ConcurrentKafkaListenerContainerFactory.
@Bean
public RetryListener retryListener(KafkaTemplate<String, SpecificRecord> kafkaTemplate) {
    return new RetryListenerSupport() {
        @Override
        public <T, E extends Throwable> void onError(RetryContext context, RetryCallback<T, E> callback, Throwable throwable) {
            int retryCount = context.getRetryCount();
            kafkaTemplate.send(new ProducerRecord<String, SpecificRecord>("topic_name", record));
        }
    };
}
The RetryContext is populated with some useful info in the RetryingMessageListenerAdapter:
context.setAttribute(CONTEXT_RECORD, record);
switch (RetryingMessageListenerAdapter.this.delegateType) {
    case ACKNOWLEDGING_CONSUMER_AWARE:
        context.setAttribute(CONTEXT_ACKNOWLEDGMENT, acknowledgment);
        context.setAttribute(CONTEXT_CONSUMER, consumer);
        RetryingMessageListenerAdapter.this.delegate.onMessage(record, acknowledgment, consumer);
        break;
    case ACKNOWLEDGING:
        context.setAttribute(CONTEXT_ACKNOWLEDGMENT, acknowledgment);
        RetryingMessageListenerAdapter.this.delegate.onMessage(record, acknowledgment);
        break;
    case CONSUMER_AWARE:
        context.setAttribute(CONTEXT_CONSUMER, consumer);
        RetryingMessageListenerAdapter.this.delegate.onMessage(record, consumer);
        break;
    case SIMPLE:
        RetryingMessageListenerAdapter.this.delegate.onMessage(record);
}
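So, from a RetryListener you can pull the failed record back out of the context and log its topic, partition, and offset. A minimal sketch (the CONTEXT_* constants are public on RetryingMessageListenerAdapter; the retry-log topic name and String types are illustrative, not the OP's actual Avro types):
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.listener.adapter.RetryingMessageListenerAdapter;
import org.springframework.retry.RetryCallback;
import org.springframework.retry.RetryContext;
import org.springframework.retry.listener.RetryListenerSupport;

public class RetryTopicLogger extends RetryListenerSupport {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public RetryTopicLogger(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    @Override
    public <T, E extends Throwable> void onError(RetryContext context, RetryCallback<T, E> callback,
            Throwable throwable) {
        // The adapter put the failed ConsumerRecord into the context before invoking the listener.
        ConsumerRecord<?, ?> record =
                (ConsumerRecord<?, ?>) context.getAttribute(RetryingMessageListenerAdapter.CONTEXT_RECORD);
        if (record != null) {
            String details = String.format("retry #%d for %s-%d@%d: %s",
                    context.getRetryCount(), record.topic(), record.partition(), record.offset(),
                    throwable.getMessage());
            kafkaTemplate.send("retry-log-topic", details); // illustrative topic name
        }
    }
}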

Alert if Kafka nodes are down

I have been given some Kafka nodes and a topic name. My web server receives a lot of HTTP request data which I need to process and then push to Kafka. Sometimes, when the Kafka nodes are down, my server still keeps pumping the data, which blows up its memory and eventually brings the server down.
What I want is to stop publishing data when Kafka is down. My Java sample code is as follows:
static Producer<String, String> producer;

Produce() {
    Properties properties = new Properties();
    properties.put("request.required.acks", "1");
    properties.put("bootstrap.servers", "localhost:9092,localhost:9093,localhost:9094");
    properties.put("enabled", "true");
    properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    properties.put("kafka-topic", "pixel-server");
    properties.put("batch.size", "1000");
    properties.put("producer.type", "async");
    properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    producer = new KafkaProducer<String, String>(properties);
}

public static void main(String[] args) {
    Produce produce = new Produce();
    produce.send(producer, "pixel-server", "Some time");
}

// This method is called many times
public void send(Producer<String, String> producer, String topic, String data) {
    ProducerRecord<String, String> producerRecord = new ProducerRecord<>(topic, data);
    Future<RecordMetadata> response = producer.send(producerRecord, (metadata, exception) -> {
        if (null != exception) {
            exception.printStackTrace();
        } else {
            System.out.println("Done");
        }
    });
}
I have just abstracted out some sample code. The send method is called numerous times. I just want to prevent sending any message if Kafka is down. What is an efficient way to tackle this situation?
If I were you I'd try to implement a circuit breaker. When you hit a reasonable number of failures while sending your records, the circuit breaks (opens) and provides some fallback behavior. Once some condition is met (e.g. some time has passed) the circuit closes and you send records again. vertx.io also comes with its own solution.
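A minimal sketch of that idea wrapped around the producer (the thresholds, timings, and class name are illustrative; in practice you may prefer an existing implementation such as the Vert.x circuit breaker mentioned above):
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class GuardedSender {

    private static final int FAILURE_THRESHOLD = 5;        // consecutive failures before opening
    private static final long OPEN_INTERVAL_MS = 30_000L;  // how long the circuit stays open

    private final Producer<String, String> producer;
    private final AtomicInteger consecutiveFailures = new AtomicInteger();
    private final AtomicLong openedAt = new AtomicLong(0L);

    public GuardedSender(Producer<String, String> producer) {
        this.producer = producer;
    }

    public void send(String topic, String data) {
        // Circuit open: fail fast instead of queueing records in memory.
        if (openedAt.get() > 0 && System.currentTimeMillis() - openedAt.get() < OPEN_INTERVAL_MS) {
            return; // or spill to disk / return an error to the caller
        }
        openedAt.set(0L); // half-open: allow a trial send

        producer.send(new ProducerRecord<>(topic, data), (metadata, exception) -> {
            if (exception != null) {
                if (consecutiveFailures.incrementAndGet() >= FAILURE_THRESHOLD) {
                    openedAt.set(System.currentTimeMillis()); // open the circuit
                }
            } else {
                consecutiveFailures.set(0); // a success closes the circuit
            }
        });
    }
}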

Race condition between @SubscribeMapping and ChannelInterceptorAdapter.preSend

In my Spring Boot 1.5.9 application with Spring Websockets, web-socket subscriptions are intercepted with an implementation of ChannelInterceptorAdapter:
@Override
public Message<?> preSend(Message<?> message, MessageChannel channel) {
    StompHeaderAccessor accessor = StompHelper.accessor(message);
    StompCommand command = accessor.getCommand();
    if (null != command) {
        StompHelper.authentication(message, tokenHandler);
        switch (command) {
            case SUBSCRIBE:
                this.stompHandler.subscribe(message);
                break;
            case UNSUBSCRIBE:
                this.stompHandler.unsubscribe(message);
                break;
            // ...
        }
    }
    return super.preSend(message, channel);
}
which captures the subscriber and destination so subscriber-specific messages can be sent. In the web-socket controller, the @SubscribeMapping implementation publishes initialization data to the subscriber. But @SubscribeMapping and ChannelInterceptorAdapter.preSend are invoked on different threads, so occasionally the @SubscribeMapping handler executes before ChannelInterceptorAdapter.preSend has completed, which results in the subscriber not getting the initialization data.
This seems like a Spring Websocket design problem, but is there an (elegant-ish) work-around?
Update: submitted a bug: SPR-16323

Apache Kafka System Error Handling

We are trying to implement Kafka as our message broker solution. We are deploying our Spring Boot microservices in IBM Bluemix, whose internal message broker implementation is Kafka version 0.10. Since my experience is more on the JMS/ActiveMQ end, I was wondering what the ideal way to handle system-level errors in the Java consumers would be.
Here is how we have implemented it currently
Consumer properties
enable.auto.commit=false
auto.offset.reset=latest
We are using the default properties for
max.partition.fetch.bytes
session.timeout.ms
Kafka Consumer
We are spinning up 3 threads per topic, all having the same groupId, i.e. one KafkaConsumer instance per thread. We have only one partition as of now. The consumer code looks like this in the constructor of the thread class:
kafkaConsumer = new KafkaConsumer<String, String>(properties);
final List<String> topicList = new ArrayList<String>();
topicList.add(properties.getTopic());
kafkaConsumer.subscribe(topicList, new ConsumerRebalanceListener() {

    @Override
    public void onPartitionsRevoked(final Collection<TopicPartition> partitions) {
    }

    @Override
    public void onPartitionsAssigned(final Collection<TopicPartition> partitions) {
        try {
            logger.info("Partitions assigned, consumer seeking to end.");
            for (final TopicPartition partition : partitions) {
                final long position = kafkaConsumer.position(partition);
                logger.info("current Position: " + position);
                logger.info("Seeking to end...");
                kafkaConsumer.seekToEnd(Arrays.asList(partition));
                logger.info("Seek from the current position: " + kafkaConsumer.position(partition));
                kafkaConsumer.seek(partition, position);
            }
            logger.info("Consumer can now begin consuming messages.");
        } catch (final Exception e) {
            logger.error("Consumer can now begin consuming messages.");
        }
    }
});
The actual reading happens in the run method of the thread
try {
    // Poll the Kafka consumer every second.
    final ConsumerRecords<String, String> records = kafkaConsumer.poll(1000);
    // Iterate through all the messages received and print their content.
    for (final TopicPartition partition : records.partitions()) {
        final List<ConsumerRecord<String, String>> partitionRecords = records.records(partition);
        logger.info("consumer is alive and is processing " + partitionRecords.size() + " records");
        for (final ConsumerRecord<String, String> record : partitionRecords) {
            logger.info("processing topic " + record.topic() + " for key " + record.key() + " on offset " + record.offset());
            final Class<? extends Event> resourceClass = eventProcessors.getResourceClass();
            final Object obj = converter.convertToObject(record.value(), resourceClass);
            if (obj != null) {
                logger.info("Event: " + obj + " acquired by " + Thread.currentThread().getName());
                final CommsEvent event = resourceClass.cast(converter.convertToObject(record.value(), resourceClass));
                final MessageResults results = eventProcessors.processEvent(event);
                if ("Success".equals(results.getStatus())) {
                    // Commit the processed message, which advances the offset.
                    kafkaConsumer.commitSync();
                    logger.info("Message processed successfully");
                } else {
                    kafkaConsumer.seek(new TopicPartition(record.topic(), record.partition()), record.offset());
                    logger.error("Error processing message : {} with error : {}, resetting offset to {} ", obj, results.getError().getMessage(), record.offset());
                    break;
                }
            }
        }
    }
    // TODO add return
} catch (final Exception e) {
    logger.error("Consumer has failed with exception: " + e, e);
    shutdown();
}
You will notice the EventProcessor, which is a service class that processes each record and, in most cases, commits the record to the database. If the processor throws an error (a system exception or ValidationException) we do not commit but programmatically seek to that offset, so that a subsequent poll will return from that offset for that group id.
The doubt now is: is this the right approach? If we get an error and we set the offset, then no other message is processed until that one is fixed. This might work for system errors like not being able to connect to the DB, but if the problem is only with that one event, we won't be able to process any other record. We thought of the concept of an error topic: when we get an error, the consumer publishes that event to the error topic and in the meantime keeps processing subsequent events. But it looks like we are trying to bring JMS design concepts (due to my previous experience) into Kafka, and there may be a better way to solve error handling in Kafka. Also, reprocessing from the error topic may change the order of messages, which we don't want for some scenarios.
Please let me know how anyone has handled this scenario in their projects following Kafka standards.
-Tatha
if the problem is only with that one event, we won't be able to process any other record
That's correct, and your suggestion to use an error topic seems a reasonable one.
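To make the error-topic idea concrete, here is a minimal sketch (the topic name and String types are illustrative, not your actual classes): instead of seeking back on failure, the record is forwarded to an error topic and the offset committed, so the rest of the partition keeps flowing.
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Sketch: forward a record that failed processing to an error topic and
// commit the offset so subsequent records are still processed.
final class ErrorTopicForwarder {

    private final Producer<String, String> errorProducer;
    private final String errorTopic;

    ErrorTopicForwarder(Producer<String, String> errorProducer, String errorTopic) {
        this.errorProducer = errorProducer;
        this.errorTopic = errorTopic;
    }

    void forwardAndCommit(ConsumerRecord<String, String> record, Consumer<String, String> consumer) {
        // Park the bad event for later (possibly manual) handling ...
        errorProducer.send(new ProducerRecord<>(errorTopic, record.key(), record.value()));
        // ... and commit so the partition is not blocked by it.
        consumer.commitSync();
    }
}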
I also noticed that with your handling of onPartitionsAssigned you essentially do not use the consumer's committed offset, since you always seek to the end.
If you want to restart from the last successfully committed offset, you should not perform a seek.
Finally, I'd like to point out, though it looks like you already know this, that having 3 consumers in the same group subscribed to a single partition means that 2 out of 3 will be idle.
HTH
Edo
