{"database": "pelican", "table": "content", "is_view": false, "human_description_en": "where author = \"ryan\" and \"published_date\" is on date 2020-03-28", "rows": [["ryan", "technology", "## The Problem\n\nData exchange in healthcare is ... harder than it needs to be. Not all\npartners in the healthcare arena understand and use technology to its fullest\nbenefit.\n\nTake for example several health plans which want data reported to them for CMS\n(Centers for Medicare and Medicaid Services) regulations. They will ask their\n'delegated' groups to fill out an Excel file. As in, they expect you will\n_actually_ fill out an Excel file, either by manually entering the data OR by\npotentially copying and pasting your data into their Excel file.\n\nThey will also, quite frequently, change their mind on what they want AND the\norder in which they want the data to appear in their Excel file. But there's\nno change log to tell you what (if anything) has changed. All that you will\nget is an email which states, \"Here's the new template to be used for report\nXYZ\" ... even if this 'new' report is the same as the last one that was sent.\n\nOne solution might be to use version control software (like Git), but all it\nwill do is tell you that there is a difference, not _what_ the difference is.\nFor example, when looking at a simple Excel file added to git and using `git\ndiff` you see:\n\n    \n    \n    diff --git a/Book3.xlsx b/Book3.xlsx\n    index 05a8b41..e96cdb5 100644\n    Binary files a/Book3.xlsx and b/Book3.xlsx differ\n    \n\nThis has been a giant pain in the butt for a while, but with the recent\nshelter-in-place directives, I have a bit more time on the weekends to solve\nthese kinds of problems.\n\n## The Solution\n\nWhy, Python of course!\n\nOnly two libraries are needed to make the comparison: (1) os, (2) pandas\n\nThe basic idea is to:\n\n  1. Load the files\n  2. Use pandas to compare the files\n  3. 
Write out the differences, if they exist\n\n### Load the Files\n\nThe code below loads the necessary libraries, and then loads the Excel files\ninto 2 pandas DataFrames. One thing that my team has to watch out for is tab\nnames that have leading spaces that aren't easy to see inside of Excel. This\ncan cause all sorts of nightmares from a troubleshooting perspective.\n\n    \n    \n    import os\n    import pandas as pd\n    \n    # Placeholder paths: point these at the original and new Excel files\n    file_original = os.path.join('\\\\path\\\\to\\\\original\\\\file', 'original_file.xlsx')\n    file_new = os.path.join('\\\\path\\\\to\\\\new\\\\file', 'new_file.xlsx')\n    \n    # Placeholder tab names (watch for leading spaces)\n    sheet_name_original = 'name_of_sheet_in_original_file'\n    sheet_name_new = 'name_of_sheet_in_new_file'\n    \n    # Load each sheet into its own DataFrame\n    df1 = pd.read_excel(file_original, sheet_name=sheet_name_original)\n    df2 = pd.read_excel(file_new, sheet_name=sheet_name_new)\n    \n\n### Use Pandas to compare\n\nThis is just a one-liner, but it is super powerful. Pandas DataFrames have a\nmethod to see if two frames are the same. So easy!\n\n    \n    \n    data_frame_same = df1.equals(df2)\n    \n\n### Write out the differences if they exist:\n\nFirst we specify where we're going to write out the differences to. We use\n`w+`, which opens the file for writing (and reading) and clears any existing\ncontents. The `f.truncate(0)` makes that clearing explicit so that we get just\nthe differences on this run. If the file weren't cleared, we'd just append to\nthe file over and over again ... 
and that can get\nconfusing.\n\n    \n    \n    # Placeholder path for the text file that will hold the differences\n    f = open('\\\\path\\\\to\\\\file\\\\to\\\\write\\\\differences.txt', 'w+')\n    f.truncate(0)\n    \n\nNext, we check to see if there are any differences and, if there are none, we\nwrite a simple message to our text file from above:\n\n    \n    \n    if data_frame_same:\n        f.write('No differences detected')\n    \n\nIf differences are found, then we loop through the column headers of both\nfiles, finding the differences and writing them to our file:\n\n    \n    \n    else:\n        f.write('*** WARNING *** Differences Found\\n\\n')\n        # Loop over as many positions as the longer header list has\n        for c in range(max(len(df1.columns), len(df2.columns))):\n            try:\n                header1 = df1.columns[c].strip().lower().replace('\\n', '')\n                header2 = df2.columns[c].strip().lower().replace('\\n', '')\n                if header1 == header2:\n                    f.write(f'Headers are the same: {header1}\\n')\n                else:\n                    f.write(f'Difference Found: {header1} -> {header2}\\n')\n            except (IndexError, AttributeError):\n                # Column missing from one file, or header is not text: skip it\n                pass\n    \n    f.close()\n    \n\nThe code above loops over the longer of the two column header lists (the file\nmay have had a new column added) and uses a `try/except` to skip any position\nthat exists in only one of the files.\n\nNext, we check for differences between `header1` and `header2`. If they are\nthe same, we just write that out; if they aren't, we indicate that `header1`\nwas changed to `header2`.\n\nA sample of the output when the column headers have changed is below:\n\n    \n    \n    *** WARNING *** Differences Found\n    \n    Headers are the same: beneficiary first name\n    ...\n    Difference Found: person who made the request -> who made the request?\n    ...\n    \n\n## Future Enhancements\n\nAfter using it just a couple of times, I've already spotted a couple of places\nfor enhancements:\n\n  1. Use `input` to allow the user to enter the names/locations of the files\n  2. 
Read the tab names and allow the user to select from the command line\n\n## Conclusion\n\nI'm looking forward to implementing the enhancements mentioned above to make\nthis even more user-friendly. In the meantime, it'll get the job done and\nallow someone on my team to work on something more interesting than comparing\nExcel files to try to (hopefully) find differences.\n\n", "2020-03-28", "using-python-to-check-for-file-changes-in-excel", "## The Problem\n\nData exchange in healthcare is ... harder than it needs to be. Not all\npartners in the healthcare arena understand and use technology to its fullest\nbenefit.\n\nTake for example several health plans which want data reported to them for CMS\n(Centers for Medicare and Medicaid Services) regulations. They \u2026\n\n", "Using Python to Check for File Changes in Excel", "https://www.ryancheley.com/2020/03/28/using-python-to-check-for-file-changes-in-excel/"]], "truncated": false, "filtered_table_rows_count": 1, "expanded_columns": [], "expandable_columns": [], "columns": ["author", "category", "content", "published_date", "slug", "summary", "title", "url"], "primary_keys": ["slug"], "units": {}, "query": {"sql": "select author, category, content, published_date, slug, summary, title, url from content where \"author\" = :p0 and date(\"published_date\") = :p1 order by slug limit 101", "params": {"p0": "ryan", "p1": "2020-03-28"}}, "facet_results": {}, "suggested_facets": [{"name": "published_date", "type": "date", "toggle_url": "http://search.ryancheley.com/pelican/content.json?author=ryan&published_date__date=2020-03-28&_facet_date=published_date"}], "next": null, "next_url": null, "private": false, "allow_execute_sql": true, "query_ms": 13.952563516795635}