How to secure data on production environment
Question
Suppose you design a data lake with sensitive data. Due to immaturity of tools, dynamic data masking is unavailable. You have MFA, encryption at rest, audit logging, ETL processing data and people that develop/support/debug that ETL.
The standard answer is a separation of concerns between developer and support engineer (no access to sensitive data from developer, only support engineer).
But how to protect sensitive data from support engineer(s)?
Considerations
Support engineer need to debug issues, run jobs, etc. If ETL logic depends on sensistive data (i.e. join by IP address, or filter by medical status) this inherently mean support engineer has to have access to sensitive data.
Column-level encryption (besides a lot of burden) looks to not solve the issue: deterministic encryption isn't secure, moreover support engineer has to have indirect access to decryption keys in order to run the jobs.
Audit of queries also may not solve the issue, because query results may be downloaded.