Deduplicated Spreadsheets

Introduction

OIDA contains hundreds of thousands of spreadsheets in native file formats. These spreadsheets cover a wide array of topics around the development, marketing, and sales of prescription opioids, such as:

  • Account management including client relationship management, lists of clients and networks
  • Clinical trials including data from clinical studies, lists of current and potential principal investigators
  • Corporate financials including financial statements with expenses, budgets, and payroll figures
  • Diversion including data from drug surveillance systems (e.g. RADARS and NAVIPPRO), cost of abuse and diversion for insurance companies
  • Drug utilization review including step therapy, Pharmacy Benefits Managers (PBMs), and prior authorizations
  • Human resources including talent management, training and development, and employee compensation
  • Market analysis including forecasting, surveys, market share research, and tracking across brands
  • Prescriptions (authorized) including data from the Transmucosal Immediate-Release Fentanyl (TIRF) Risk Evaluation and Mitigation Strategy (REMS) program
  • Sales figures including weekly sales by territory, sales by provider, and drug prices
  • Sales visits including call notes from sales representatives, speaker programs with key opinion leaders, and sales territory targeting
  • Supply including inventory, shipping, and logistics related to the supply chain

To save time for users who wish to work with these files, the OIDA team has deduplicated the native-format spreadsheet files in three of its largest collections (Insys, Mallinckrodt, and McKinsey). SearchMyFiles, a program created by NirSoft, was used to deduplicate files with .xls, .xlsx, .xlsm, and .csv extensions. (Other spreadsheet file extensions, such as .xlsb, were not included in this deduplication process.) The first occurrence of each duplicated spreadsheet was preserved, and the remaining copies were deleted. A second "Duplicate Search" was then run to confirm that no duplicates remained. The deduplication method was also validated by testing it on a sample of spreadsheets. On average, deduplication reduced the spreadsheet count by 45% across the collections.
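For users who want to check their own downloads for duplicates, the keep-the-first-occurrence approach described above can be approximated in Python. This is only a sketch, not the OIDA team's procedure: SearchMyFiles is a separate Windows utility, and this page does not state how it compares files. The sketch assumes that two files count as duplicates when their contents are byte-for-byte identical, which it checks with a SHA-256 hash. The names `file_digest`, `find_duplicates`, and `SPREADSHEET_EXTENSIONS` are hypothetical and chosen for illustration.

```python
import hashlib
from pathlib import Path

# Extensions covered by the deduplication described above (.xlsb is excluded).
SPREADSHEET_EXTENSIONS = {".xls", ".xlsx", ".xlsm", ".csv"}


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(root):
    """Split spreadsheet files under `root` into (kept, duplicates).

    Assumption: files are duplicates when their bytes are identical.
    Files are visited in sorted path order, so the first path with a
    given digest is kept and later ones are reported as duplicates.
    Nothing is deleted; the caller decides what to do with the lists.
    """
    seen = {}
    kept, duplicates = [], []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix.lower() not in SPREADSHEET_EXTENSIONS:
            continue
        digest = file_digest(path)
        if digest in seen:
            duplicates.append(path)
        else:
            seen[digest] = path
            kept.append(path)
    return kept, duplicates
```

Calling `find_duplicates` a second time on only the kept files should return an empty duplicates list, which mirrors the second "Duplicate Search" pass. Note that a byte-level comparison treats two spreadsheets with the same cell values but different file metadata as distinct files.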

Download the data files