Tips for architecting fast data applications

  • Understand the requirements in detail: how large each message is, how many messages arrive per minute, whether the rate can change sharply, whether records can be batch-processed, whether time relationships and ordering must be preserved, and how ‘dirty’ the data may be and whether dirty records need to be cleaned, reported, or ignored (a rough sizing sketch follows this list).
  • Implement an efficient messaging backbone for reliable, secure, low-latency data exchange; Apache Kafka is a good option for this (see the minimal producer/consumer sketch below).
  • Leverage your SQL knowledge: the same relational algebra applies to data streams once you treat them as time-varying relations (see the windowed-aggregation sketch below).
  • Deploy a cluster manager or cluster-management solution for greater scalability, agility, and resilience (see the scaling sketch below).
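
A rough, back-of-envelope sizing calculation can turn those requirement questions into concrete throughput targets. The message size, rate, and burst factor below are hypothetical placeholders; substitute your own figures.

```python
# Rough capacity sizing from the requirements checklist above.
AVG_MESSAGE_BYTES = 2 * 1024        # assumed average message size: 2 KB
MESSAGES_PER_MINUTE = 600_000       # assumed steady-state rate
PEAK_FACTOR = 5                     # assumed burst multiplier over steady state

steady_mb_per_sec = AVG_MESSAGE_BYTES * MESSAGES_PER_MINUTE / 60 / 1_000_000
peak_mb_per_sec = steady_mb_per_sec * PEAK_FACTOR

print(f"steady state: {steady_mb_per_sec:.1f} MB/s, peak: {peak_mb_per_sec:.1f} MB/s")
```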
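
As a sketch of what the Kafka backbone might look like, here is a minimal round trip using the kafka-python client; the broker address, topic name, and consumer group are assumptions for illustration.

```python
# Minimal Kafka produce/consume round trip (pip install kafka-python).
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("events", key=b"user-42", value=b'{"action": "click"}')
producer.flush()  # block until the broker acknowledges the message

consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",
    group_id="fast-data-app",
    auto_offset_reset="earliest",
)
for record in consumer:
    print(record.key, record.value, record.timestamp)
    break  # read a single message for the example
```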
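
Streaming SQL engines typically express this as an ordinary GROUP BY evaluated over a time window. The plain-Python sketch below shows the same idea, a tumbling-window COUNT per key; the event data and 60-second window size are illustrative assumptions.

```python
# Tumbling-window COUNT(*) GROUP BY key: the streaming equivalent of
#   SELECT key, COUNT(*) FROM events GROUP BY key  -- evaluated per 60 s window
from collections import Counter, defaultdict

WINDOW_SECONDS = 60

def tumbling_window_counts(events):
    """events: iterable of (epoch_seconds, key) pairs, assumed in time order."""
    windows = defaultdict(Counter)
    for ts, key in events:
        window_start = ts - ts % WINDOW_SECONDS  # start of the window this event falls in
        windows[window_start][key] += 1
    return windows

events = [(0, "click"), (10, "view"), (59, "click"), (61, "view")]
for start, counts in sorted(tumbling_window_counts(events).items()):
    print(start, dict(counts))
# -> 0 {'click': 2, 'view': 1}
#    60 {'view': 1}
```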
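
If the cluster manager is Kubernetes, scaling a stream-processing workload can be as simple as patching its replica count. The sketch below uses the official Kubernetes Python client; the deployment name, namespace, and replica count are assumptions.

```python
# Scale out a hypothetical stream-processing deployment (pip install kubernetes).
from kubernetes import client, config

config.load_kube_config()          # use load_incluster_config() when running inside a pod
apps = client.AppsV1Api()

apps.patch_namespaced_deployment_scale(
    name="stream-processor",         # hypothetical deployment name
    namespace="default",
    body={"spec": {"replicas": 5}},  # scale out to 5 workers
)
```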

Full post here, 7 mins read