Non-Official intro to Kafka, Confluent and Data Streams

Introduction

Kafka is a damn beautiful thing, I must admit…

It makes me all googly eyed and leaves that sense of winder in me… but yeah, i’m strange like that!

I’ve been working through it for a little while now and learn some stuff. I also was met with a few challenges and wanted to share these with you.

Let’s take a quick walk through it.

What is Kafka?

The web definition:

Kafka is a distributed streaming platform that has 3 main capabilities:

Publish and Subscribe to streams
Store messages into streams
Process messages before we store

That last point is what makes Kafka a beast imo. Why?

Let’s look at some pictures to understand that.

Keeping it old school

Simplified dramatically, some event driven archs may appear to look like this: https://woki.orionhealth.global/plugins/gliffy/viewer.action?wmode=opaque

While this worked for many years, we found limitations in processing performance and handling data in near real time but nothing I guess prepared our world for the

creation of the big data concept.

Storing data is always costly and usually the bulk of your software budget. To make thinks faster in this model, devs

had to either scale up the number of Db instances they had of use a multiple Db instances. This is a sore sore SORE point for member of our tech community, big pockets or small.

This entry point also made our applications the centre of attention and building our applications more complex.

Multi-threading and bad message handling was super key to ensure this was successful.

How do we manage the load?!

However it came at a cost.

it was challenging for devs
it made dev and testing complex
it bloated our codebases
it wasn’t always done right
issues were usually only spotted once these threads ran live and encountered the real world for the first time. We’re were now in the thick of things…

This gave rise to the idea of data streaming, which rise to Kafka!

Kafka and Data streams for the win

Yes, my eyes are still googlied <- legit

I love high performance applications. I mean, we all can build a hello world project,

but to build a highly performant, fault tolerant scalable and real time application that can process a million messages quickly

takes something hell of a special. (quick shout out to Reece and the SDP Platforms team (wink) )

KAFKA is purpose built for this very need. https://woki.orionhealth.global/plugins/gliffy/viewer.action?wmode=opaque

Kafka Terms and Concepts

BROKER

A Kafka broker is just a running instance of kafka.

Its the hub, the mother ship the ring leader, the conductor, the centre piece of the show the thriller in Manila…ok, too far but you get my vibe.

TOPIC

Brokers are were we create topics for consuming.

A topic is really just a layer of business logic events. for eg, if we were in the business of baking bread,

a topic of interest would be:

“how long has the dough rested”
“is the oven hot enough to begin”
“how long has the bun been in the oven”

PRODUCER

A producer is one that will create or add to a topic. When an event is raised, the producer springs into action.

So in our case above, the oven would produce an event to the topics:

“is the oven hot enough to begin”
“how long has the bun been in the oven”

But what about the other topic?

The oven has no need to care about how long the bread has rested. It just cares if it itself is ready and if it has completed the job.

This then means the baker would be the only producer to the other topic

“how long has the dough rested”

CONSUMER

A consumer of a topic is one that is simply subscribes to a topic and is updated when something in that topic changes.

In our bread winner example above, our baker or bakers would the consumers of all these topics…well, we hope for bread sakes!

The Kafka value

So far I’ve explained some differences and some terms, but not the value.

As seen in the Kafka image above, we are now able to do a couple of things BEFORE our application and databases are hit.

BOOM! VALUE!

We now can:

filed multiple threads
compute at mad speeds
and store only when necessary
doing all this at near real time, almost seamlessly at lower operational costs.

MORE FOR LESS -> BOOM! EVEN MORE VALUE!

The real 11 herbs and spices are in the manner in which the data is sharded, partitioned, stored, processed and read. But that a technical walk though for another day, but dang

Its like the Oprah of big data processing eh?..You feeling my love for this now?

Confluent

With all nice things in this world, people cant wait to just onboard. thankfully so!

Working with streams and stream applications, especially from a QA point of view, can be challenging, merely on the fact that

there is a very, very small set of tools for us to use to validate and gain some peek preview of data under the hood.

The team of confluent have done a job of this, check them out here:

https://www.confluent.io/

Accessing all things streams

I have come across 3 very handy tools that do the similar thing but can be used in vary different applications and ways.

KSQL

A neat little library that lets us query a stream as if it were a database.

Pretty handy when you want to go above and beyond being a consumer of a topic.

https://www.confluent.io/product/ksql/

Kafka Control Centre

The control centre is UI designed to hook into your broker.

It will give you all the goodness of KSQL and a streams dashboard. Pretty neat.

https://docs.confluent.io/current/control-center/index.html#

Kafka Rest Proxy

https://docs.confluent.io/current/kafka-rest/index.html

Good as gold – well for me anyways!

I guess being in QA for so long has really trained my brain to push past what even i as a developer would deem normal behaviour.

i.e. just testing the application as a consumer is not enough.

The REST client is awesome and lets me integrate my integration testing platform along side my dev team and application code.

As we produce topics and messages, I get to peer into these, even before consuming it.

I can validate

the producer
the message
the topic
the partitions
the success
and the failures
speed at which we are processing and latency between them. STUNNING!

Conclusion

I hope this has enlightened you a little on Kafka, streams and some tools you can use to simply your life.

We actually live in a world a streaming data…

NETFLIX

YOUTUBE

SPOTIFY

Your favourite ONLINE RETAIL store… you know those suggestions on “OTHER ITEMS YOU MIGHT LIKE”….. well…stream data…It’s all around you and I.

The introduction of tooling over Kafka is epic.

As a QA, this was imperative. I need to follow the data from start to end, leaving no bread unbaked.

It was not a simple undertaking for me.

I researched and I learnt a lot, tried loads of things and failed, leaned on people with experience to help me through and eventually found success.

All of these come neatly wrapped in docker containers. These can be run as a whole or as components. It makes working with queues a lot simpler by giving us hooks into our streams.

Dont let a subject like big data processing scare you. Get in there, get your hands dirty and fail! Fail and fail till you bake your bread.

Manage your software delivery like a boss

Introduction

Over the past then years that I’ve been in the industry, I have been fortunate enough to bear witness to the beautiful way software delivery methods have evolved.

From long lasting design of monoliths to quick and in some cases, daily deploys of micro-services, we have come a long long way but our journey is not done.

Today, we’re moving faster than ever into cloud, serverless, containers and SaaS.

The Revolution

The key to this revolution was communication. There is indeed, strength in numbers.

Knowing that we were not struggling alone. That our issues were not unique and that others solved them was comforting, motivating and enlightened us.

The Bitter Truth

We all became students and masters at the same time. We read, research, learn insatiably. GIVE ME ALL THE KNOWLEDGES!

Learning how to do things better,

Learning how others found success in similar situations! YAS!

We’re amped! We’re pumped, we try it out and its going to be amazing…and it fails!

Its hard…its not working…its taking longer than expected.

wait.. what?!

What just happened

Variables. Life just happened. It sucks, but its a lesson that was always yours…waiting for just you.

Project Risk

As a lead, take careful consideration of your projects strength and weaknesses.

These Risks will surely impact your delivery.

Focus on quality early

With the introduction of micro-services, we have seen testing get pushed left.

Like into the bush and we forget about it? Erhm…NO.

By pushing testing left, we attempt to push the top tiers of the pyramid down.

So an example here would be, refining our system tests to pose as integration tests instead.

The goal of this concept has always been the same, faster feedback.

TIP: Dont get bogged down by the concept but rather focus on the goal, shift can happen in both directions.

Have a repeatable game plan

Build a vision for delivery with your team.

Understand each move your people and software makes and draw a picture.

Create flow diagrams and maintain this as you and the team grows…making it your own and not an internet copy and paste.

Most of the time, everything you read will be wrong for you. OMG…

I just mean that when we read something, most of the time we dont have the full picture. The new start in your team or the exit of an old soul will dramatically change how you deliver. Shift and adapt with it.

TIP: Dont forget to address your delivery process often!

Below is an example and a good head start for you to build and share your flow. Make it your own!

Conclusion

As you forge forward in your ambitions to delivery high quality software sooner, build a plan that is lean but high value to your team.

Use tools wisely,
Use other’s learnings and suggestions sparingly

How to reuse/re-purpose existing steps?

successbusinessconceptpe_574977-630x330

Reuse/re-purpose existing steps in your code by using the pipe delimiter.

@Then(“^(I search for and select an existing user|I validate a user has been created)$”)

– Use this sparingly and make your BDD’s more readable.

Working with spring properties and profiles

sptingbootMaven

Today I have a very simple application for you to demonstrate how to use profiles to control environmental variables.

To skip the build and access the code, you can find it here:

https://github.com/suveerprithipal/spring_properties_and_profiles

Lets keep going.

Why do we want this?

So we can swap profiles when testing locally vs a cloud or dev vs test

How do we build this?

Firstly, we need to create an environmental variable. To do so, on windows10,

On windows desktop, Search for “advanced system settings”

Windows_Search_Advanced_system_settings

Select ‘Advanced system settings’

Advanced_system_settings

Click Environment Variable
Click “New” under ‘System Variables’
Set Variable name of ‘APP_URL’
Set Variable value of ‘cloud’

add_system_var

Next up I created a Maven project and added in my Spring boot dependencies. We then need to create some profiles. Easy peasy. To do so, we actually create properties files. Yep, that simple. Under your java/test filepath, create a resources dir. Remember to right click and mark as ‘test Resources Root’ Create 2 properties files:

properties-local.properites
properties-cloud.properties

adding_properites_files

In each of them, add a variable called ‘my.app.url’ as the following demonstrates.

in properties-local.properites: my.app.url = local

local_props
in properties-cloud.properites: my.app.url = ${app_url}

cloud_props

We use “${app_url}” here as we want to point our test to use the system variable we created. When springboot runs, and see’s “${}” it knows to go looking for a definition on that variable on the system.

We’re almost done! and now in a position to add some tests to access these variables we’ve created above.. Create a test or use the one created by default if you have one.

In our test class, create a class member of type ‘Environment'(org.springframework.core.env.Environment). I called mine “env”. Don’t forget to Autowire(org.springframework.beans.factory.annotation.Autowired) this member as we want a bean created for this at run-time.

We create 2 tests:

localTest
cloudTest

testclass

These are very similar in nature, but have 2 different asserts:

localTest Assert.assertEquals(“local”, env.getProperty(“my.app.url”));
cloudTest Assert.assertEquals(“cloud”, env.getProperty(“my.app.url”));

Lastly, we want to specify which profile spring should use when running. To do this, we will add a VM Option to the Run Configuration.

I ran my 2 tests above individually. This create a run configuration for me automatically. I then go to edit these configs.

localTest Run config: I set the VM option to : -Dspring.profiles.active=local Apply and save.
cloudTest Run config: I set the VM option to : -Dspring.profiles.active=cloud Apply and save.

runconfig

Now when we run these, spring will swap out our properties file, based on the profile we selected. You will see that the local env.getProperty will return “local” and cloud env.getProperty will return cloud, which we set on the system.

That’s it! You’re done! Happy Testing.

Functional testing: Numbers vs Coverage

team-success-1600x700

Functional test automation. It’s not entirely about the numbers, but rather the value and coverage of the test you create.

Push left. Cover more in unit/integration tests and less on the UI. Try not to look at this as it takes away from you as a tester’ but rather enhances your ability, exposure and experience.

Its difficult to decide which tests to push left you say?

Yes, I can definitely agree with that. I was hard for me too.

I’ve learnt that this difficulty comes strong at first but when this is a practice we bake into our planning sessions, this becomes easier with time. Communication is the most important influence here. The results of which will lend to a “Team Test Approach” or “Quality mindful team”.

Are you BDDing?

There are many ways in which we can scribe out a feature file. In my experience with teams of various experience, businesses of different maturity and a variety of complexity of applications I have learned there is no 1 way to get this right.

Here are 7 things to remember:

Don’t just use the tool but use the tool well. Stick to the reasons why we use a ‘Given’, ‘When’ and ‘Then’
Remove test data setup from your scenarios. If you need to ready your env for test, treat it as a separate concern.
Tags, tag everything. This is a great way to ensure we run specific tests and deliver faster feedback when you are faced with a time sensitive roll-out.
A short scenario is a good scenario.
You scenario’s aren’t cast in stone. Revise and revisit.
Maturing your BDD is really important. Its living documentation and is ultimately a reflection on our understanding of the application and our drive for high quality applications.
Share it with your team. Make your team quality conscious. Have them revise it at regular intervals…make it a thing.

Doctor: “we need a WebDriver stat!”

To spin up a WebDriver is pretty easy. We really only need 4 lines of code:

System.setProperty(“webdriver.chrome.driver”,”…”);

WebDriver driver = new ChromeDriver();

driver.manage().window().maximize();

driver.get(“http://…”);

Manual Testing API’s?

download

For manual testing, Postman is a great resource for API testing. Validate your results post call using their inbuilt test functionality.Here is a link to some test examples.

https://lnkd.in/g5qnF48