#SQL
2 posts

Modern data practice and the SQL tradition

  • Beware the schemaless nature of NoSQL systems, which can easily lead to sloppy data modeling at the outset. Start with an RDBMS in the first place, preferably with a JSON data type and indices on expressions, so you can have a single database for both structured and unstructured data and maintain ACID compliance.
  • Bring ETL closer to the data and be wary of decentralized data cleaning and transformation. Push data cleaning to the database level wherever possible: use type definitions, set a timestamp-with-time-zone policy so that bad data fails fast and early, use the database's native date and geospatial types and operations instead of leaving that work to Pandas and Lambda functions, and employ triggers and stored procedures.
  • Create more features at the query level to gain flexibility with different feature vectors, so that model selection and evaluation are quicker.
  • Distributed systems like MongoDB and Elasticsearch can be money-hungry (both in terms of technology and human resources), and deployment is harder to get right with NoSQL databases. Relational databases are cheaper, especially for transactional and read-heavy data, and they are more stable and perform better out of the box.
  • Be meticulous: debugging SQL is quite difficult, given its declarative nature. Also, be mindful of clean code and maintainability.
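
To make the points about database-level cleaning, JSON columns with expression indices, and query-level features more concrete, here is a minimal sketch. It is not taken from the post: it uses Python's built-in sqlite3 module as a stand-in for a full RDBMS and assumes the SQLite build includes the JSON functions. The table `orders`, the view `customer_features`, the helper `try_insert`, and the "timestamps must end in Z" policy are all hypothetical choices made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    amount      REAL    NOT NULL CHECK (amount >= 0),
    -- Timestamp policy (assumed): ISO 8601 text that must end in 'Z' (UTC).
    created_at  TEXT    NOT NULL CHECK (created_at GLOB '*Z'),
    -- Semi-structured part of the record, validated as JSON on write.
    payload     TEXT    NOT NULL CHECK (json_valid(payload))
);

-- Index on an expression: supports lookups by a field inside the JSON.
CREATE INDEX idx_orders_channel
    ON orders (json_extract(payload, '$.channel'));

-- Features computed at the query level, one row per customer.
CREATE VIEW customer_features AS
SELECT customer_id,
       COUNT(*)    AS n_orders,
       AVG(amount) AS avg_amount,
       SUM(json_extract(payload, '$.channel') = 'web') AS n_web_orders
FROM orders
GROUP BY customer_id;
""")

INSERT = ("INSERT INTO orders (customer_id, amount, created_at, payload) "
          "VALUES (?, ?, ?, ?)")

def try_insert(row):
    """Return True if the database accepted the row, False if a constraint rejected it."""
    try:
        conn.execute(INSERT, row)
        return True
    except sqlite3.IntegrityError:
        return False

accepted = [
    try_insert((1, 10.0, "2024-01-01T10:00:00Z", '{"channel": "web"}')),
    try_insert((1, 30.0, "2024-01-02T11:00:00Z", '{"channel": "store"}')),
    try_insert((2, 5.0, "2024-01-03T12:00:00Z", '{"channel": "web"}')),
    try_insert((3, -1.0, "2024-01-04T09:00:00Z", "{}")),       # negative amount
    try_insert((3, 9.0, "2024-01-04 09:00:00", "{}")),         # no time zone marker
    try_insert((3, 9.0, "2024-01-04T09:00:00Z", "not json")),  # invalid JSON
]

features = conn.execute(
    "SELECT customer_id, n_orders, avg_amount, n_web_orders "
    "FROM customer_features ORDER BY customer_id"
).fetchall()
print(accepted)
print(features)
```

The bad rows are rejected at write time by the database itself, so no downstream script has to clean them, and adding a new feature is a matter of adding a column to the view rather than changing application code. In a server database such as PostgreSQL, the same idea would typically use a native JSON type and a timestamp-with-time-zone column instead of the text checks assumed here.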

Full post here, 13 mins read

Making fast APIs: lessons learned from 40 years of SQL

  • Give consumers full access over what to fetch, and don’t tie them to pre-determined data fields.
  • Emulate SQL’s EXPLAIN statement and let users know exactly how the database will execute their query. Then they can see what may be slowing things down and correct it themselves.
  • If it is impossible to offer full access, optimize your API for common access patterns.
  • Make it easy for users to fetch all the data they need in one go, rather than looping multiple requests.
  • Design the API to cache data locally.
  • Design the API to study access logs and prefetch relevant data accordingly, as this improves the perceived speed of fetching without changing throughput.
  • Lessons from the NoSQL movement: model your data structure on data access patterns, and remember that users want consistency more than the fastest possible speeds.
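
As a rough illustration of the first two points (caller-selected fields and an EXPLAIN-style view of execution), here is a minimal sketch using Python's built-in sqlite3 module. It is not the post's implementation: the `users` table, the functions `fetch_users` and `explain_users`, and the field whitelist are hypothetical names introduced for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT, country TEXT);
CREATE INDEX idx_users_country ON users (country);
INSERT INTO users (name, email, country) VALUES
    ('Ada',   'ada@example.com',   'UK'),
    ('Linus', 'linus@example.com', 'FI'),
    ('Grace', 'grace@example.com', 'US');
""")

# Columns a caller may request (hypothetical whitelist; guards the f-string below).
ALLOWED_FIELDS = {"id", "name", "email", "country"}

def build_query(fields, country):
    """Build a SELECT for exactly the fields the caller asked for."""
    unknown = set(fields) - ALLOWED_FIELDS
    if not fields or unknown:
        raise ValueError(f"invalid field selection: {sorted(unknown)}")
    sql = f"SELECT {', '.join(fields)} FROM users WHERE country = ?"
    return sql, (country,)

def fetch_users(fields, country):
    """Return only the requested fields, all matching rows in one call."""
    sql, params = build_query(fields, country)
    return [dict(zip(fields, row)) for row in conn.execute(sql, params)]

def explain_users(fields, country):
    """Expose the query plan so callers can see how the request will run."""
    sql, params = build_query(fields, country)
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

print(fetch_users(["name", "email"], "UK"))
print(explain_users(["name"], "UK"))
```

A caller who sees a full table scan in the plan, rather than a search on `idx_users_country`, knows the filter is the slow part and can adjust the request. The exact wording of the plan text varies between SQLite versions, so it is better shown to users than parsed.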

Full post here, 10 mins read