
Soby Chacko

Spring Cloud Stream/Data Flow

Philadelphia, PA

Blog Posts by Soby Chacko

Introducing Experimental Spring Support for Apache Pulsar

We are happy to announce that we are incubating a new experimental Spring project for Apache Pulsar. This project aims to provide Spring-friendly APIs, building blocks, and programming models for writing Java applications that interact with Apache Pulsar.

Apache Pulsar is a popular messaging system with a growing developer ecosystem in the enterprise messaging and streaming space. Here are some of the main features and advantages of using Apache Pulsar for messaging-based applications:

  • Apache Pulsar provides both the traditional queuing semantics of RabbitMQ, ActiveMQ, and others, and the log-based structure of Apache Kafka, through various subscription models.
  • The broker in Apache Pulsar is stateless, and storage is not part of the broker. Instead, Pulsar uses another Apache project, BookKeeper, to separate the storage layer from the broker. Because of this fundamental design, scaling up Apache Pulsar brokers is easy.
  • Apache Pulsar uses distributed logs, called ledgers, managed through BookKeeper. These ledgers are distributed across multiple BookKeeper nodes.
  • Another advantage of separating the storage layer (ledgers) from the serving layer (brokers) is that they can scale separately.
  • Topic partitioning is a feature, not a requirement. You only need to use partitioning if the use case demands it.
  • Built-in geo-replication mechanisms.
  • Built-in multi-tenancy capabilities.
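
As a taste of the programming model, here is a minimal sketch of producing and consuming a message with the new project. Since Spring for Apache Pulsar is experimental at this point, the APIs shown (PulsarTemplate, @PulsarListener) may change, and the topic and subscription names are illustrative:

```java
import org.springframework.boot.ApplicationRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.pulsar.annotation.PulsarListener;
import org.springframework.pulsar.core.PulsarTemplate;

@SpringBootApplication
public class PulsarDemoApplication {

    public static void main(String[] args) {
        SpringApplication.run(PulsarDemoApplication.class, args);
    }

    // Publish a message on startup through the auto-configured PulsarTemplate.
    @Bean
    ApplicationRunner runner(PulsarTemplate<String> pulsarTemplate) {
        return args -> pulsarTemplate.send("hello-pulsar-topic", "Hello Pulsar");
    }

    // Consume messages from the same topic.
    @PulsarListener(subscriptionName = "hello-pulsar-sub", topics = "hello-pulsar-topic")
    void listen(String message) {
        System.out.println("Received: " + message);
    }
}
```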
Read more...

Demystifying Spring Cloud Stream producers with Apache Kafka partitions

In this blog, we are taking a deeper look at writing a Spring Cloud Stream producer with Apache Kafka and how it handles native partitions in Kafka.

Spring Cloud Stream has a middleware-agnostic concept of partitions. Whenever possible, Spring Cloud Stream leverages the native partitioning capabilities of the middleware, as in the case of Apache Kafka. This blog looks at how a Spring Cloud Stream developer handles partitions when writing a producer application that publishes data to Kafka. In a subsequent article, we will look at how consumers handle partitions in a Kafka-based Spring Cloud Stream application.
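
To make this concrete, here is a minimal sketch of a partitioned producer, assuming the Spring Cloud Stream Kafka binder is on the classpath; the binding, destination, and header names are illustrative:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.messaging.Message;
import org.springframework.messaging.support.MessageBuilder;

@SpringBootApplication
public class PartitionedProducerApplication {

    public static void main(String[] args) {
        SpringApplication.run(PartitionedProducerApplication.class, args);
    }

    // Spring Cloud Stream binds this Supplier to the output destination and polls it periodically.
    @Bean
    public Supplier<Message<String>> produceData() {
        AtomicInteger counter = new AtomicInteger();
        return () -> {
            int value = counter.incrementAndGet();
            return MessageBuilder.withPayload("payload-" + value)
                    .setHeader("partitionKey", value % 4) // used by the partition-key-expression below
                    .build();
        };
    }

    // Illustrative producer binding properties (application.properties):
    // spring.cloud.stream.bindings.produceData-out-0.destination=partitioned-topic
    // spring.cloud.stream.bindings.produceData-out-0.producer.partition-key-expression=headers['partitionKey']
    // spring.cloud.stream.bindings.produceData-out-0.producer.partition-count=4
}
```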

Read more...

Announcing Spring Cloud Stream Applications 2020.0.0 GA Release

We are glad to announce the GA release of the newly redesigned Spring Cloud Stream applications - 2020.0.0.

We would like to use this release announcement as an opportunity to wrap up the blog series that we started in the summer. Therefore, consider this part 15 of the blog series. In this blog, we are going to give a rundown of all the previous episodes in the series, but first, let us go through some release details.

Release Overview

The 2020.0.0 GA release contains a completely revamped functional foundation for the event-streaming applications. The old structure was based on an app starter model, in which the critical logic for each application was provided as part of a starter module. The starters then formed the foundation for the applications. While that model worked for the previous generations of these app starters (Avogadro, Bacon, Celsius, Darwin, and Einstein), it became necessary to rewrite the starters as reusable functions so that they could be used for a wide array of use cases beyond what is required by the out-of-the-box applications. Therefore, many of the old app starters were refactored and redesigned as functions, suppliers, and consumers. The out-of-the-box Spring Cloud Stream binder-based applications are built on top of these functional components. Other custom applications, even for non-streaming use cases, can be designed with these functional components as a foundation. The functions can also be composed together to implement many other data integration use cases.
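
As an illustration of that reusability, here is a minimal sketch of two such components, assuming a Spring Cloud Stream binder is on the classpath; the bean names and the composed function definition are illustrative:

```java
import java.util.function.Function;
import java.util.function.Supplier;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class ComposedSourceApplication {

    public static void main(String[] args) {
        SpringApplication.run(ComposedSourceApplication.class, args);
    }

    // A supplier component that could back a standalone source application.
    @Bean
    public Supplier<String> greetingSupplier() {
        return () -> "hello from a supplier";
    }

    // A function component that can be used on its own or composed with other components.
    @Bean
    public Function<String, String> upperCase() {
        return String::toUpperCase;
    }

    // Composing the two into a single source is a matter of configuration, for example:
    // spring.cloud.function.definition=greetingSupplier|upperCase
}
```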

There is now a single monorepo that hosts all the stream application components. The source for all the currently available functions and applications can be found there. These collections comprise components that satisfy a wide spectrum of use cases, such as data ingestion, ETL, machine learning, analytics, and file processing, among many others. Take a look at the README for more information on what is available.

Read more...

Case Study: Elasticsearch sink

This article is part of a blog series that explores the newly redesigned Spring Cloud Stream applications based on Java Functions. In this post, we will look at the Elasticsearch sink that allows us to index records in Elasticsearch, and its corresponding Consumer function.
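
This is not the actual Elasticsearch consumer function from the project, but a simplified sketch of the same idea, assuming a RestHighLevelClient bean is available and using an illustrative index name:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.function.Consumer;

import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.Message;

@Configuration
public class ElasticsearchConsumerConfiguration {

    // Index each incoming payload as a document in the "orders" index.
    @Bean
    public Consumer<Message<Map<String, Object>>> indexingConsumer(RestHighLevelClient client) {
        return message -> {
            try {
                IndexRequest request = new IndexRequest("orders").source(message.getPayload());
                client.index(request, RequestOptions.DEFAULT);
            }
            catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
    }
}
```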

Here are all the previous parts of this blog series.

Read more...

Case Study: Relational Database Source and File Sink

This article is part of a blog series that explores the newly redesigned Spring Cloud Stream applications based on Java Functions. In this episode, we are exploring the JDBC supplier and the source based on Spring Cloud Stream. We will see how we can export data from a relational database and dump it into a file using a File Consumer and the corresponding Spring Cloud Stream File sink. We will look at a few different ways in which we can run JDBC Source and send the data to a file.
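
As a rough sketch of the supplier side of such a pipeline (this is not the actual jdbc-supplier from the project), assuming a DataSource is configured and an illustrative people table exists:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.JdbcTemplate;

@Configuration
public class JdbcSupplierConfiguration {

    // Each poll runs the query and emits the result set as a list of column/value maps,
    // which a downstream file consumer can then write out.
    @Bean
    public Supplier<List<Map<String, Object>>> jdbcSupplier(JdbcTemplate jdbcTemplate) {
        return () -> jdbcTemplate.queryForList("SELECT id, name FROM people");
    }
}
```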

Here are all the previous parts of this blog series.

Read more...

Case Study: Reading from a file and writing to MongoDB

This article is part of a blog series that explores the newly redesigned Spring Cloud Stream applications based on Java Functions. In this episode, we are taking a deeper look into the file supplier and its Spring Cloud Stream file source counterpart. We will also see a MongoDB consumer and its corresponding Spring Cloud Stream sink. Finally, we will demonstrate how the File source and the MongoDB sink can be orchestrated together on Spring Cloud Data Flow as a pipeline.
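
As a rough sketch of the consuming side of such a pipeline (this is not the actual mongodb-consumer from the project), assuming Spring Data MongoDB is on the classpath and using an illustrative collection name:

```java
import java.util.function.Consumer;

import org.bson.Document;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.messaging.Message;

@Configuration
public class MongoDbConsumerConfiguration {

    // Persist each incoming JSON payload as a document in the "data" collection.
    @Bean
    public Consumer<Message<String>> mongodbConsumer(MongoTemplate mongoTemplate) {
        return message -> mongoTemplate.save(Document.parse(message.getPayload()), "data");
    }
}
```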

Here are all the previous parts of this blog series.

Read more...

Creating a function for consuming data and generating Spring Cloud Stream Sink applications

This is part 4 of the blog series in which we are introducing Java functions for Spring Cloud Stream applications.

Other parts in the series.

In the last blog in this series, we saw how we can use a java.util.function.Supplier to generate a Spring Cloud Stream source. In this new edition, we will see how a consuming function can be developed and tested using java.util.function.Consumer and java.util.function.Function. Later on, we will briefly explain the generation of a Spring Cloud Stream sink application from this consumer.
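
As a preview of the shape such a consumer takes (the post develops and tests a more realistic one), here is a minimal sketch with an illustrative bean name:

```java
import java.util.function.Consumer;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.Message;

@Configuration
public class LoggingConsumerConfiguration {

    // A consuming function: Spring Cloud Stream binds it to an input destination,
    // and the generated sink application delegates incoming messages to it.
    @Bean
    public Consumer<Message<String>> logConsumer() {
        return message -> System.out.println("Received: " + message.getPayload());
    }
}
```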

Read more...

Creating a Supplier Function and generating Spring Cloud Stream Source

This is part 3 of the blog series in which we are introducing Java functions for Spring Cloud Stream applications.

Other parts in the series.

In the last two blogs in this series, we provided a general introduction to this new initiative of migrating all the existing Spring Cloud Stream App Starters to functions and the various ways in which we can compose them. In this blog, we continue the series, showing how these functions are developed, tested, and used to generate Spring Cloud Stream applications. In particular, here we are focusing on how to write a supplier function (implementing java.util.function.Supplier) and then generate the corresponding source application for Spring Cloud Stream.
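
As a preview of the shape such a supplier takes (the post develops and tests a more realistic one), here is a minimal sketch with an illustrative bean name:

```java
import java.time.LocalDateTime;
import java.util.function.Supplier;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class TimeSupplierConfiguration {

    // A supplier function: Spring Cloud Stream polls it and publishes each value
    // to the output destination of the generated source application.
    @Bean
    public Supplier<String> timeSupplier() {
        return () -> LocalDateTime.now().toString();
    }
}
```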

Read more...

Stream Processing with Spring Cloud Stream and Apache Kafka Streams. Part 6 - State Stores and Interactive Queries

Part 1 - Programming Model
Part 2 - Programming Model Continued
Part 3 - Data deserialization and serialization
Part 4 - Error Handling
Part 5 - Application Customizations

In this part (the sixth and final one of this series), we are going to look into the ways Spring Cloud Stream Binder for Kafka Streams supports state stores and interactive queries in Kafka Streams.

Named State Stores

When you need to maintain state in the application, Kafka Streams lets you materialize that state information into a named state store. Several operations in Kafka Streams require it to keep track of state, such as count, aggregate, reduce, and various windowing operations. In most cases, Kafka Streams uses an embedded key-value store called RocksDB to maintain this state store (unless you explicitly change the store type). By default, the information in the state store is also backed up to a changelog topic in Kafka, for fault-tolerance reasons.
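
For example, a minimal sketch of a processor that counts records per key and materializes the result into a named state store (the bean and store names are illustrative) might look like this:

```java
import java.util.function.Function;

import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class CountProcessorConfiguration {

    // Counts records per key; the count is materialized into the "word-count-store"
    // state store (RocksDB by default) and also streamed downstream.
    @Bean
    public Function<KStream<String, String>, KStream<String, Long>> countWords() {
        return input -> input
                .groupByKey()
                .count(Materialized.as("word-count-store"))
                .toStream();
    }
}
```

The named store is what the interactive query support can later look up by name.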

Read more...

Stream Processing with Spring Cloud Stream and Apache Kafka Streams. Part 5 - Application Customizations

Part 1 - Programming Model
Part 2 - Programming Model Continued
Part 3 - Data deserialization and serialization
Part 4 - Error Handling

In this blog post, we continue our discussion on the support for Kafka Streams in Spring Cloud Stream. We are going to elaborate on the ways in which you can customize a Kafka Streams application.

Customizing the StreamsBuilderFactoryBean

The Kafka Streams binder uses the StreamsBuilderFactoryBean, provided by the Spring for Apache Kafka project, to build the StreamsBuilder object that is the foundation of a Kafka Streams application. This factory bean is a Spring lifecycle bean. Oftentimes, this factory bean must be customized before it is started, for various reasons. As described in the previous blog post on error handling, you need to customize the StreamsBuilderFactoryBean if you want to register a production exception handler. Let’s say that you have this production exception handler:
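
Here is a representative sketch of such a handler, implementing Kafka Streams' ProductionExceptionHandler contract (the class name and the skip-on-too-large behavior are illustrative):

```java
import java.util.Map;

import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.RecordTooLargeException;
import org.apache.kafka.streams.errors.ProductionExceptionHandler;

public class IgnoreRecordTooLargeHandler implements ProductionExceptionHandler {

    @Override
    public void configure(Map<String, ?> configs) {
    }

    // Skip records that are too large to produce; fail the application for anything else.
    @Override
    public ProductionExceptionHandlerResponse handle(ProducerRecord<byte[], byte[]> record,
            Exception exception) {
        if (exception instanceof RecordTooLargeException) {
            return ProductionExceptionHandlerResponse.CONTINUE;
        }
        return ProductionExceptionHandlerResponse.FAIL;
    }
}
```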

Read more...