
Arjen Poutsma

Spring Technical Advisor

Rotterdam, the Netherlands

Blog Posts by Arjen Poutsma

Efficient Parsing of Reactive Buffer Streams

It has been a while since Spring Framework 5.3 was released. One of the features in that release was a major overhaul of our Reactive Multipart support. In this blog post, we share some of the knowledge learned while working on this feature. Specifically, we focus on finding a token within a stream of byte buffers.

Multipart Form Data

Whenever you upload a file, your browser sends it — and other fields in the form — to the server as a multipart/form-data message. The exact format of these messages is described in RFC 7578. If you submit a simple form with a single text field called foo and a file selector called file, the multipart/form-data message looks something like this:

POST / HTTP/1.1
Host: example.com
Content-Type: multipart/form-data;boundary="boundary" (1)

--boundary (2)
Content-Disposition: form-data; name="foo" (3)

bar
--boundary (4)
Content-Disposition: form-data; name="file"; filename="lorum.txt" (5)
Content-Type: text/plain

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer iaculis metus id vestibulum nullam.

--boundary-- (6)
  1. The Content-Type header of the message contains the boundary parameter.

  2. The boundary is used to start the first part. It is preceded by --.

  3. The first part contains the value of the text field, foo, as can be seen in the part headers. The value of the field is bar.

  4. The boundary is used to separate between the first and second part. Again, it is preceded by --.

  5. The second part contains the contents of the submitted file, named lorum.txt.

  6. The end of the message is indicated by the boundary. It is preceded and followed by --.

Finding the Boundaries

The boundary in a multipart/form-data message is quite important. It is specified as a parameter of the Content-Type header. When preceded by two hyphens (--), the boundary indicates the beginning of a new part. When also followed by --, the boundary indicates the end of the message.
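As a small illustration, the two delimiters described above could be derived from the boundary parameter roughly as follows. The class and method names are hypothetical, and the sketch assumes the boundary consists of ASCII characters only.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class BoundaryDelimiters {

  // Delimiter that starts a new part: "--" followed by the boundary.
  static byte[] partDelimiter(String boundary) {
    return ("--" + boundary).getBytes(StandardCharsets.US_ASCII);
  }

  // Delimiter that ends the message: "--", the boundary, and "--" again.
  static byte[] closeDelimiter(String boundary) {
    return ("--" + boundary + "--").getBytes(StandardCharsets.US_ASCII);
  }

  public static void main(String[] args) {
    System.out.println(new String(partDelimiter("boundary"), StandardCharsets.US_ASCII));
    System.out.println(new String(closeDelimiter("boundary"), StandardCharsets.US_ASCII));
  }
}
```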

Finding the boundary in the stream of incoming byte buffers is key when parsing multipart messages. Doing so seems simple enough:

private int indexOf(DataBuffer source, byte[] target) {
  int max = source.readableByteCount() - target.length + 1;
  for (int i = 0; i < max; i++) {
    boolean found = true;
    for (int j = 0; j < target.length; j++) {
      if (source.getByte(i + j) != target[j]) {
        found = false;
        break;
      }
    }
    if (found) {
      return i;
    }
  }
  return -1;
}

However, there is a complication: the boundary can be split across two buffers, which, in a reactive environment, might not arrive at the same time. For example, given the sample multipart message shown earlier, the first buffer might contain the following:

POST / HTTP/1.1
Host: example.com
Content-Type: multipart/form-data;boundary="boundary"

--boundary
Content-Disposition: form-data; name="foo"

bar
--bou

While the next buffer contains the remainder:

ndary
Content-Disposition: form-data; name="file"; filename="lorum.txt"
Content-Type: text/plain

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer iaculis metus id vestibulum nullam.

--boundary--

If we inspect one buffer at a time, we cannot find split boundaries like these. Instead, we need to find the boundary across multiple buffers.

One way to solve this problem would be to wait until all buffers have been received, join them, and locate the boundaries afterwards. The following example does so, using a sample stream and the indexOf method defined earlier:

Flux<DataBuffer> stream = Flux.just("foo", "bar", "--boun", "dary", "baz")
  .map(s -> factory.wrap(s.getBytes(UTF_8)));
byte[] boundary = "--boundary".getBytes(UTF_8);

Mono<Integer> result = DataBufferUtils.join(stream)
  .map(joined -> indexOf(joined, boundary));

StepVerifier.create(result)
  .expectNext(6)
  .verifyComplete();

Using Reactor’s StepVerifier, we see that the boundary starts at index 6.

There is one major downside to this approach: joining multiple buffers into one effectively stores the entire multipart message in memory. Multipart messages are primarily used to upload (large) files, so this is not a viable option. Instead, we need a smarter way to locate the boundary.

Knuth to the Rescue!

Luckily, such a way exists in the form of the Knuth–Morris–Pratt algorithm. The main idea behind this algorithm is that if we already matched several bytes of the boundary but the next byte does not match, we do not need to restart the search from the beginning. To make this possible, the algorithm maintains state, in the form of a position in a precomputed table that contains the number of bytes that can be skipped after a mismatch.
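To make the idea concrete, here is a minimal sketch of a Knuth–Morris–Pratt matcher that keeps its state between buffers. It works on plain byte arrays so that it runs without Spring; the StreamingKmp class and its method names are hypothetical and are not Spring's implementation. The table in this sketch stores, for each position, the length of the longest proper prefix of the pattern that is also a suffix, which is where matching resumes after a mismatch.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Spring's Matcher: a Knuth-Morris-Pratt matcher
// whose state survives from one buffer to the next.
class StreamingKmp {

  private final byte[] pattern;
  // failure[i] = length of the longest proper prefix of pattern[0..i]
  // that is also a suffix of it; used to resume after a mismatch.
  private final int[] failure;
  // Number of pattern bytes matched so far, kept across calls to match().
  private int matched = 0;

  StreamingKmp(byte[] pattern) {
    if (pattern.length == 0) {
      throw new IllegalArgumentException("pattern must not be empty");
    }
    this.pattern = pattern.clone();
    this.failure = new int[pattern.length];
    int k = 0;
    for (int i = 1; i < pattern.length; i++) {
      while (k > 0 && pattern[i] != pattern[k]) {
        k = failure[k - 1];
      }
      if (pattern[i] == pattern[k]) {
        k++;
      }
      failure[i] = k;
    }
  }

  // Returns the index in buffer of the last byte of the first match, or -1.
  int match(byte[] buffer) {
    for (int i = 0; i < buffer.length; i++) {
      byte b = buffer[i];
      // On a mismatch, fall back to the longest prefix that still fits.
      while (matched > 0 && b != pattern[matched]) {
        matched = failure[matched - 1];
      }
      if (b == pattern[matched]) {
        matched++;
      }
      if (matched == pattern.length) {
        matched = 0; // start over for the next occurrence
        return i;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    StreamingKmp matcher =
        new StreamingKmp("--boundary".getBytes(StandardCharsets.US_ASCII));
    for (String chunk : new String[] {"foo", "bar", "--boun", "dary", "baz"}) {
      System.out.println(matcher.match(chunk.getBytes(StandardCharsets.US_ASCII)));
    }
    // prints -1, -1, -1, 3, -1 (one value per line)
  }
}
```

Because this sketch returns at the first match, a caller interested in further occurrences within the same buffer would have to pass the remaining bytes to match again.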

In Spring Framework, we have implemented the Knuth-Morris-Pratt algorithm in the Matcher interface, which you can obtain an instance of through DataBufferUtils::matcher. You can also check out the source code.

Here, we use the Matcher to give us the end indices of boundary in stream, using the same sample input as earlier:

Flux<DataBuffer> stream = Flux.just("foo", "bar", "--boun", "dary", "baz")
  .map(s -> factory.wrap(s.getBytes(UTF_8)));
byte[] boundary = "--boundary".getBytes(UTF_8);

DataBufferUtils.Matcher matcher = DataBufferUtils.matcher(boundary);
Flux<Integer> result = stream.map(matcher::match);

StepVerifier.create(result)
  .expectNext(-1)
  .expectNext(-1)
  .expectNext(-1)
  .expectNext(3)
  .expectNext(-1)
  .verifyComplete();

Note that the Knuth-Morris-Pratt algorithm gives the end index of the boundary, which explains the test results: the boundary does not end until index 3 in the second-to-last buffer.
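If the start position in the overall stream is needed, it can be derived from the end index by keeping track of how many bytes arrived before the current buffer. The following sketch shows the arithmetic; the class and method names are hypothetical.

```java
// Hypothetical helper, for illustration only.
class BoundaryOffsets {

  // Converts an end index within the current buffer into the offset of the
  // first boundary byte, counted from the start of the whole stream.
  static long startOffset(long bytesBeforeBuffer, int endIndexInBuffer, int boundaryLength) {
    return bytesBeforeBuffer + endIndexInBuffer - boundaryLength + 1;
  }

  public static void main(String[] args) {
    // "foo", "bar" and "--boun" were received before the buffer "dary".
    long before = "foo".length() + "bar".length() + "--boun".length();
    System.out.println(startOffset(before, 3, "--boundary".length())); // prints 6
  }
}
```

For the sample stream, this yields 6, the same start index that the joined-buffer approach reported.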

As can be expected, Spring Framework’s MultipartParser makes heavy use of Matcher.

If you need to find a series of bytes in a stream of byte buffers, give the Matcher a try!

Read more...

New in Spring 5.3: Improved Cron Expressions

If you regularly listen to A Bootiful Podcast, you might have heard about the improvements we made to Spring Framework’s cron support. Cron expressions are mostly used in Spring applications through the @Scheduled annotation. In Spring 5.3, we introduced the CronExpression class, which represents — you guessed it — a cron expression.

CronExpression replaces CronSequenceGenerator, which is based on java.util.Calendar and which has several known issues that none of the Spring team members felt comfortable solving. Introducing a new type allowed us to use the superior java.time APIs, solve the outstanding issues, and (hopefully) introduce new features as well. While Spring generally prefers to maintain backward compatibility, sometimes we do believe that starting from scratch is the best option.

Read more...

New in Spring 5: Functional Web Framework

As mentioned yesterday in Juergen’s blog post, the second milestone of Spring Framework 5.0 introduced a new functional web framework. In this post, I will give more information about the framework.

Keep in mind that the functional web framework is built on the same reactive foundation that we provided in M1 and on which we also support annotation-based (i.e. @Controller, @RequestMapping) request handling; see the M1 blog post for more on that.

Example

We start with some excerpts from our sample application. Below is a reactive repository that exposes Person objects. It is quite similar to a traditional, non-reactive repository, except that it returns Flux<Person> where you would return a List<Person> traditionally, and Mono<Person> where you would return a Person. Mono<Void> is used as a completion signal: to indicate when the save has been completed. For more information on these Reactor types, refer to Dave’s blog post.

Read more...

Spring Web Services 2.2.0 Released

I’m pleased to announce that Spring Web Services 2.2.0.RELEASE has been released! This is the first release in the 2.2 release cycle. The main new feature in 2.2 is the introduction of code configuration support for Spring-WS. This means that you can now configure Spring-WS with a simple @EnableWs annotation. For instance:

@Configuration
@EnableWs
@ComponentScan(basePackageClasses = { MyConfiguration.class })
public class MyWsConfiguration {

  // @Beans go here
}

For more information about this topic, refer to the javadoc of @EnableWs. You can also read more about this new feature in the updated reference documentation. To view a complete list of changes see the changelog.

Read more...

Introducing Spring Scala

Last October, at SpringOne2GX, I introduced the Spring Scala project to the world. Since then, I’ve also presented this project at Devoxx. In this blog post, I would like to give further details about this project and how you can use it in your Scala projects.

Why Spring Scala?

The goal of the Spring Scala project is simply to make it easier to use the Spring framework in Scala. We believe that there are many Spring users out there who want to try Scala out, but do not want to leave their experience with Spring behind. This project is meant for those people.

Read more...

Spring Web Services 2.0 Released

After being in the works for almost a year, I’m happy to announce that Spring Web Services 2.0 has been released! In this post, I’d like to go over some of the major new features.

Java 5+ and Spring 3.0 Required

As you are probably aware, we moved the Object XML Mapping (OXM) module from the Spring-WS project into Spring 3.0. As such, it was a bit problematic to use Spring-WS 1.5 (with its own OXM module) with Spring 3.0, due to conflicting classes in the org.springframework.oxm package.

As of version 2.0, we no longer ship the OXM module as part of Spring-WS, but depend on Spring’s OXM instead. As a result, Spring Web Services 2.0 requires Spring 3.0 to work. Normally, we tend to be a bit more lenient with regard to version requirements, not necessarily requiring the latest Spring version, but this was the only way to make things work.

Read more...

REST in Spring 3: RestTemplate

In an earlier post, I blogged about the REST capabilities we added to Spring @MVC version 3.0. Later, Alef wrote about using the introduced functionality to add an Atom view to the Pet Clinic application. In this post, I would like to introduce the client-side capabilities we added in Milestone 2.

RestTemplate

The RestTemplate is the central Spring class for client-side HTTP access. Conceptually, it is very similar to the JdbcTemplate, JmsTemplate, and the various other templates found in the Spring Framework and other portfolio projects. This means, for instance, that the RestTemplate is thread-safe once constructed, and that you can use callbacks to customize its operations.

Read more...

REST in Spring 3: @MVC

In the last couple of years, REST has emerged as a compelling alternative to SOAP/WSDL/WS-*-based distributed architectures. So when we started to plan our work on the next major release of Spring, version 3.0, it was quite clear to us that we had to focus on making the development of ‘RESTful’ Web services and applications easier. Now, what is and isn’t ‘RESTful’ could be the topic of a whole new post altogether; in this post I’ll take a more practical approach, and focus on the features that we added to the @Controller model of Spring MVC.

Read more...

Spring Web Services 1.5.1 Released

Dear Spring community,

I’m pleased to announce that Spring Web Services 1.5.1 has been released!

Downloads | Site | Changelog | Announcement


This is the first bug fix and enhancement release in the Spring-WS 1.5 series. It fixes all bugs reported since 1.5.0 and introduces various enhancements throughout the framework:

  • Introduced a Spring JMS MessageConverter that uses OXM marshallers
  • Introduced a Spring MVC View that uses OXM marshallers
  • Fixed WS-Security signatures when using WSS4J in combination with SAAJ messages
  • Support for timeouts on HTTP transports
  • Support for Castor 1.2, see note below
  • Airline sample now uses Spring Security
Read more...

What's New in Spring Web Services 1.5?

After being in the works for about six months, I’m happy to announce that Spring Web Services 1.5.0 has been released! In this post, I’d like to go over some of the major new features.

New Transports

The 1.5 release includes two new transports: JMS and email. Using these new transports requires no Java code changes: just add a bit of configuration, and you’re off! The JMS transport integrates nicely with Spring 2’s Message-Driven POJO model, as indicated by the following piece of configuration taken from the airline sample application:

Read more...