
Deduplicated Spreadsheets

Introduction

OIDA contains hundreds of thousands of spreadsheets in native file formats. These spreadsheets cover a wide array of topics related to the development, marketing, and sale of prescription opioids, such as:

Account management including client relationship management, lists of clients and networks
Clinical trials including data from clinical studies, lists of current and potential principal investigators
Corporate financials including financial statements with expenses, budgets, and payroll figures
Diversion including data from drug surveillance systems (e.g. RADARS and NAVIPPRO), cost of abuse and diversion for insurance companies
Drug utilization review including step therapy, Pharmacy Benefit Managers (PBMs), and prior authorizations
Human resources including talent management, training and development, and employee compensation
Market analysis including forecasting, surveys, market share research, and tracking across brands
Prescriptions (authorized) including data from the Transmucosal Immediate-Release Fentanyl (TIRF) Risk Evaluation and Mitigation Strategy (REMS) program
Sales figures including weekly sales by territory, sales by provider, and drug prices
Sales visits including call notes from sales representatives, speaker programs with key opinion leaders, and sales territory targeting
Supply including inventory, shipping, and logistics related to the supply chain

To save time for users who wish to work with these files, the OIDA team has deduplicated the native-format spreadsheets in three of its largest collections (Insys, Mallinckrodt, and McKinsey). SearchMyFiles, a program created by NirSoft, was used to deduplicate files with .xls, .xlsx, .xlsm, and .csv extensions. (Other spreadsheet file extensions, such as .xlsb, were not included in this deduplication process.) The first occurrence of each duplicate spreadsheet was preserved, and the remaining copies were deleted. A second "Duplicate Search" was then performed to confirm that no duplicates remained. Testing on a sample of spreadsheets was used to validate the robustness of the deduplication methods. The deduplication process reduced the spreadsheet count by an average of 45% across collections.
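
For readers who want to apply a similar "keep the first occurrence" rule to their own folders, the idea can be sketched in Python. This is an illustration only, not the OIDA team's procedure, which used SearchMyFiles. The sketch assumes that two files are duplicates when their bytes are identical, detects this by comparing SHA-256 digests, and treats "first" as first in sorted path order. The function names `file_digest` and `find_duplicates` are hypothetical.

```python
import hashlib
from pathlib import Path

# Extensions covered by the deduplication described above (.xlsb is excluded).
SPREADSHEET_EXTENSIONS = {".xls", ".xlsx", ".xlsm", ".csv"}


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Split the spreadsheets under `root` into (kept, duplicates).

    Assumption: files are duplicates when their contents are byte-identical.
    The first file seen for each digest, in sorted path order, is kept.
    Nothing is deleted; the caller decides what to do with the duplicates.
    """
    seen = set()
    kept, duplicates = [], []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix.lower() not in SPREADSHEET_EXTENSIONS:
            continue
        digest = file_digest(path)
        if digest in seen:
            duplicates.append(path)
        else:
            seen.add(digest)
            kept.append(path)
    return kept, duplicates
```

Running `find_duplicates` a second time on a folder that contains only the kept files should return an empty duplicates list, which mirrors the confirmation pass described above.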

Download the data files

Insys: Download ZIP file (2.7 GB): insys_full_dedup.zip (documentation)
Mallinckrodt: Download ZIP file (61.2 GB): mallinckrodt_full_dedup.zip (documentation)
McKinsey: Download ZIP file (4.0 GB): mckinsey_full_dedup.zip (documentation)
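
Because the archives are large, it can help to inspect a downloaded ZIP file before extracting it. The following is a minimal sketch that counts the entries in an archive by file extension without unpacking it. The internal folder layout of the archives is not described here, so the sketch makes no assumption about it, and the function name `count_by_extension` is hypothetical.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def count_by_extension(zip_path):
    """Count the files in a ZIP archive by lowercase extension.

    Only the archive's directory listing is read, so nothing is extracted.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return Counter(
            PurePosixPath(info.filename).suffix.lower()
            for info in archive.infolist()
            if not info.is_dir()
        )
```

For example, `count_by_extension("insys_full_dedup.zip")` would return a `Counter` keyed by extensions such as `.xlsx` or `.csv`, assuming the file has been downloaded to the current working directory.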

How to cite this data product

UCSF-JHU Opioid Industry Documents Archive (2024). OIDA Deduplicated Spreadsheets. Available at https://doi.org/10.26144/r92h-dv82.