Rename PDF files in bulk

  • Thread starter Deleted member 356896

Deleted member 356896

Here's my scenario. I've scanned hundreds of receipts using an ADF scanner I recently purchased. They're all sat in a folder, each with a sequential file number given to it by the scanning software. The files have been OCR scanned so the text is searchable in the document.

I'm looking for ways to rename the files quickly rather than go through them one by one (i.e. open each file, check the content, rename the file).

Ideally the naming structure would be something like this YYYY-MM-DD SUPPLIER AMOUNT. So a receipt for today from Argos for £30 would be "2024-05-26 Argos £30".

Maybe a big ask, and I'm happy to compromise on the logic used to rename the files, even just the date (2024-05-26).

The scanning software has dated the files but it's date of scan rather than date of receipt.

Does anyone know of an automated system, or something that is still manual but would make this renaming process easier?

Any help appreciated.
 

fisicx

Moderator
Sep 12, 2006
46,673
8
15,372
Aldershot
www.aerin.co.uk
There are plenty of scripts that will extract text from a pdf. To then get the correct data would need a second script - but only really if the invoices are all structured identically. And then a final script to rename the file.

So it is possible. But not sure it’s easy.
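
Of the three steps described above, the final rename is the most mechanical. The following is only a minimal sketch of that step, assuming the date, supplier and amount have already been worked out some other way. The function names `build_name` and `rename_pdf` are hypothetical, and the list of characters to replace reflects the ones Windows does not allow in file names.

```python
import re
from pathlib import Path

# Characters Windows does not allow in file names.
ILLEGAL = re.compile(r'[<>:"/\\|?*]')


def build_name(date_iso, supplier, amount):
    """Build 'YYYY-MM-DD SUPPLIER AMOUNT.pdf', skipping any empty part."""
    parts = (date_iso, supplier, amount)
    stem = " ".join(p.strip() for p in parts if p and p.strip())
    return ILLEGAL.sub("_", stem) + ".pdf"


def rename_pdf(path, new_name, dry_run=True):
    """Rename one file, adding ' (2)', ' (3)'... if the name is taken.

    With dry_run=True nothing is changed; the planned target is returned
    so the result can be checked before committing to it.
    """
    path = Path(path)
    target = path.with_name(new_name)
    counter = 2
    while target.exists() and target != path:
        target = path.with_name(f"{Path(new_name).stem} ({counter}).pdf")
        counter += 1
    if not dry_run:
        path.rename(target)
    return target
```

Running with `dry_run=True` first and reading through the planned names is a cheap safeguard when there are over a thousand files involved.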

I’d use a freelancer site to get the job done. Let them sort out the best method, which might be doing it manually.
 

WaveJumper

Free Member
  • Business Listing
    Aug 26, 2013
    6,620
    2
    2,396
    Essex
    I have a strong feeling that, as mentioned above, extraction tools would give you a whole host of mixed data. I once volunteered to sort through a pile of monthly invoices for my son (once and never again), and just trying to manually decipher some of them was a real eye-opener. So you may well have to "scrub" through the data quite a few times and then manually check all the results to be confident they're correct.

    Finding a freelancer, as @fisicx suggested, is an excellent idea. And for anyone else reading this thread: keeping your accounts up to date daily or weekly has to be a good idea, and there is plenty of AI out there that helps with that these days.
     

    Deleted member 356896

    Sounds like a manual task then.

    Can anyone recommend a PDF browser that lets you easily review and rename files? I'm currently using Adobe.

    Process is find file in file explorer > open file in Adobe > review > close file > go back to file explorer > rename file

    If there was a tool that allowed you to view and rename easily that would be a good compromise. I've got over 1,000 documents to do :(
     

    Deleted member 356896


    Newchodge

    Moderator
  • Business Listing
    Nov 8, 2012
    22,639
    8
    7,949
    Newcastle
    Deleted member 356896 said:
    Think I've cracked it.

    I've enabled thumbnail preview in Windows Explorer for Adobe Acrobat.

    In File Explorer I've clicked on View > Preview pane. On the left I have a list of the files. On the right I have an image of the PDF. If I press F2 that lets me quickly edit and rename the file.

    Before you start, may I suggest you give some thought to the structure of the file name, and whether the files need to be separated into folders for different categories. The more time you spend deciding these things, the less likely it is you will need to spend time making changes later. So, for example, 2024-05-25 is 2 more keystrokes than 20240525, but you might prefer 2024 05 25. If you mix up those 3 formats you will not get the same sequential order.
     

    Deleted member 356896

    Newchodge said:
    Before you start, may I suggest you give some thought to the structure of the file name, and whether the files need to be separated into folders for different categories. The more time you spend deciding these things, the less likely it is you will need to spend time making changes later. So, for example, 2024-05-25 is 2 more keystrokes than 20240525, but you might prefer 2024 05 25. If you mix up those 3 formats you will not get the same sequential order.

    Thanks. I've got a good folder structure in place. Main thing I want in the file name is the date. I normally prefer YYYY first followed by month and day so that when sorting by file name they show in sequence. I can't rely on the created/amended file date property.
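
The point about year-first names sorting in sequence, and mixed formats breaking that, can be checked with a few lines of Python. This is only an illustration: the file names are made up, and it uses Python's plain character-by-character sort, which is not exactly the ordering Windows Explorer applies.

```python
# One consistent YYYY-MM-DD prefix: sorting by name is sorting by date.
consistent = [
    "2024-05-25 Argos.pdf",
    "2023-12-06 Telenet.pdf",
    "2024-01-03 Interparking.pdf",
]

# The three date styles mixed together in one folder.
mixed = [
    "20240525 Argos.pdf",           # 25 May 2024
    "2024-01-03 Interparking.pdf",  # 3 January 2024
    "2024 03 10 Argos.pdf",         # 10 March 2024
]

print(sorted(consistent))  # oldest first, newest last
print(sorted(mixed))       # March lands before January
```

In the mixed list the fifth character decides the order (space, then hyphen, then digit), so the March file comes out ahead of the January one.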

    The rest of the file name is a bonus and I probably won't complete that due to the time involved. The majority of my PDFs are already OCR'd and I'll do a sweep to convert any that are not (I get some sent to me by staff and they don't always have OCR switched on in the scanning software).

    I already use some freeware to search for any documents when I need something. The software searches for text within the document. I'll then sort the results by file name which will let me see the most recent documents that match the search criteria.

    I've got most of my documents digitised and the system I have works really well. This documentation is all the old paperwork from before I put my system in place. I'm looking to clear some shelf space, so I'm going to scan it all.
     

    Frans VH

    Free Member
  • Dec 19, 2012
    68
    21
    Near Brussels
    Just for fun, I uploaded two invoices to ChatGPT and asked: "Please extract invoice date, invoice amount and supplier name from these invoices"
    It got it correct in this minimal dataset. It even flagged the invoice date on invoice 2 which wasn't really an invoice but a booking confirmation. It had a print date in the footer so it wasn't sure it could be considered as an invoice date.

    Here are the extracted details from the provided invoices:

    Invoice 1 (invoice1.pdf):

    • Invoice Date: 06/12/2023
    • Invoice Amount: €32.04
    • Supplier Name: Telenet BV

    Invoice 2 (invoice2.pdf):

    • Invoice Date: 12/10/2023 (for confirmation)
    • Invoice Amount: €97.00
    • Supplier Name: Interparking


    No idea how it would perform on hundreds of invoices. If it would work, creating a script to automate this would be doable.
     

    Frans VH

    I think you would need a script that loops over all files in your input folder and then, for each file:
    1. asks ChatGPT to identify the supplier, invoice date and amount
    2. uses that info to rename the file and move it to an output folder
    3. logs the result of the scan (one line per file) for quick validation
    That is not so difficult in theory. But it takes some work to get the required underlying software running on your machine (I am guessing you are on MS Windows).
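
The three steps above could be sketched roughly as follows. This is not a finished tool: `rename_all` is a hypothetical name, and `extract_fields` stands in for whatever does the actual identification (ChatGPT or anything else). It is assumed to take a PDF path and return a dict with `date`, `supplier` and `amount` keys, or `None` when it cannot tell. The sketch also assumes those values contain no characters that are illegal in file names.

```python
import csv
import shutil
from pathlib import Path


def rename_all(input_dir, output_dir, extract_fields, log_path):
    """Loop over the PDFs, rename and move each one, and log one line per file."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    with open(log_path, "w", newline="", encoding="utf-8") as fh:
        log = csv.writer(fh)
        log.writerow(["original", "new_name", "status"])

        for pdf in sorted(input_dir.glob("*.pdf")):
            # Step 1: ask the extractor for supplier, date and amount.
            fields = extract_fields(pdf)
            if not fields:
                log.writerow([pdf.name, "", "skipped: nothing extracted"])
                continue

            # Step 2: rename the file and move it to the output folder.
            new_name = (
                f"{fields['date']} {fields['supplier']} {fields['amount']}.pdf"
            )
            target = output_dir / new_name
            if target.exists():
                log.writerow([pdf.name, new_name, "skipped: name already used"])
                continue
            shutil.move(str(pdf), str(target))

            # Step 3: one log line per file for quick validation.
            log.writerow([pdf.name, new_name, "renamed"])
```

Files that cannot be identified stay in the input folder, so whatever is left there after a run is the pile that still needs doing by hand.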

    If you're ok with sharing all your invoices in a zip, send it to a freelancer and get the result back, that makes the setup a bit easier.

    PS. Please note that to run such a script on your machine, you would need an OpenAI API key. API usage is billed separately from a ChatGPT subscription.
     

    Frans VH

    You could also ask ChatGPT to generate the commands to rename the files with something like this:

    Please extract invoice date, invoice amount and supplier name from these invoices and generate a "rename" command that renames each file to YYYYMMDD-[SupplierName]-[InvoiceAmount].pdf so that I can copy & paste these commands in one go into a Windows Terminal session.
    Please output all rename commands in one code window


    You can then open a Windows terminal session using Start -> Command Prompt, and copy/paste the rename commands.

    Unfortunately, ChatGPT only allows uploading a maximum of 10 files in one go.
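
For comparison, the same list of rename commands can be produced locally once the details are known, which sidesteps the upload limit. A rough sketch, assuming the extracted details have already been typed or pasted into a list. The function name `ren_commands` is hypothetical, the example records reuse the two invoices from the earlier post (reading their dates as day/month/year, which is an assumption), and the currency symbol is left out of the new name to avoid character-encoding trouble in the Command Prompt.

```python
def ren_commands(records):
    """Turn (old_name, iso_date, supplier, amount) records into 'ren' commands.

    The output follows the YYYYMMDD-[SupplierName]-[InvoiceAmount].pdf pattern.
    'ren' must be run from inside the folder that holds the files.
    """
    lines = []
    for old_name, iso_date, supplier, amount in records:
        new_name = f"{iso_date.replace('-', '')}-{supplier}-{amount}.pdf"
        lines.append(f'ren "{old_name}" "{new_name}"')
    return "\n".join(lines)


records = [
    ("invoice1.pdf", "2023-12-06", "Telenet BV", "32.04"),
    ("invoice2.pdf", "2023-10-12", "Interparking", "97.00"),
]
print(ren_commands(records))
```

The quotes around each name matter because supplier names can contain spaces.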
     

    fisicx

    Or just pay a freelancer in India tuppence to do the renaming for you.

    By the time you have got ChatGPT set up and faffed about with the renaming commands the whole thing would be done and delivered.

    I needed a whole bunch of images renamed. Put in the project on fiverr and it was done in a couple of hours for £20.
     

    MikeJ

    Free Member
    Jan 15, 2008
    6,949
    2,241
    Northumberland
    Many years ago I wrote a script to pick up the text in a PDF document and extract certain bits of data, using Visual Basic 6. I was only doing one file at a time, but the basic text extraction is pretty straightforward. The difficult thing in your case is that the text is not going to be in a consistent format, so you'll be looking for various strings depending on who's sent the invoice.
     

    Deleted member 356896

    Thanks for the replies thus far.

    I don’t have any software development skills so writing code (even basic code) is out for me.

    I’m not comfortable sending copies of these documents to someone to edit. Although they’re mainly receipts, I’m sure there’ll be other documentation in there that would be a security risk to share (e.g. bank statements).

    Looks like there is no off the shelf software solution.

    The best suggestion seems to be use Windows Explorer with a preview pane to quickly view and edit the documents.
     

    MikeJ

    Getting the date would be relatively easy, as you can search for something that looks like a year, and then text search either side of that to try to work out the date. The amount may be possible as you can look for a £ sign, and then look for numbers. You'd have to do that several times to find the largest number, which is likely to be the invoice total. Trying to find the supplier name would be difficult though.
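
As a rough illustration of the two heuristics just described, the sketch below looks for something that resembles a year and reads the few characters before it as a day/month date, and it collects every figure after a £ sign and keeps the largest. It works on text that has already been extracted from the PDF. The names `find_date` and `largest_amount` are hypothetical, and it assumes UK-style numeric dates such as 26/05/2024; other layouts would need more patterns.

```python
import re
from datetime import date
from decimal import Decimal

YEAR = re.compile(r"\b(?:19|20)\d{2}\b")
# Day, month, year separated by "/", "." or "-".
DMY = re.compile(r"\b(\d{1,2})[/.-](\d{1,2})[/.-]((?:19|20)\d{2})\b")
# A £ sign followed by digits, optional thousands commas and pence.
AMOUNT = re.compile(r"£\s*(\d[\d,]*(?:\.\d{1,2})?)")


def find_date(text):
    """Return the first plausible date found next to a year-like number."""
    for year in YEAR.finditer(text):
        # Look at the few characters before the year as well as the year.
        window = text[max(0, year.start() - 8):year.end()]
        match = DMY.search(window)
        if match:
            day, month, yr = map(int, match.groups())
            try:
                return date(yr, month, day)
            except ValueError:
                continue  # e.g. 31/02/2024, not a real date
    return None


def largest_amount(text):
    """Return the largest figure that follows a £ sign, or None."""
    amounts = [Decimal(raw.replace(",", "")) for raw in AMOUNT.findall(text)]
    return max(amounts) if amounts else None


receipt = "ARGOS\nDate: 26/05/2024\nKettle £12.50\nToaster £17.50\nTotal £30.00\n"
print(find_date(receipt), largest_amount(receipt))
```

The largest figure is only a guess at the total, so the results would still need a quick check by eye.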
     

    DontAsk

    Free Member
    Jan 7, 2015
    5,446
    3
    1,392
    MikeJ said:
    Getting the date would be relatively easy, as you can search for something that looks like a year, and then text search either side of that to try to work out the date.

    So explain how to do that to someone with no software skills. Dates can be in many formats. You could quickly end up needing a regex.

    MikeJ said:
    The amount may be possible as you can look for a £ sign, and then look for numbers. You'd have to do that several times to find the largest number, which is likely to be the invoice total.

    Not if a discount is applied after the total is listed.

    MikeJ said:
    Trying to find the supplier name would be difficult though.

    Really? That's the easy part, given a list of known suppliers.
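
Both points can be made concrete with a short sketch: matching against a list of known suppliers is a simple text search, while dates need one regex per format. Everything here is illustrative. `KNOWN_SUPPLIERS` is a made-up list using names that appear in this thread, `find_supplier` and `find_dates` are hypothetical names, only three date layouts are covered, and the month-name parsing assumes an English locale.

```python
import re
from datetime import datetime

# Hypothetical list; in practice this would be your own suppliers.
KNOWN_SUPPLIERS = ["Argos", "Telenet", "Interparking"]

# Each pattern is paired with the strptime formats to try on its matches.
DATE_PATTERNS = [
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), ["%Y-%m-%d"]),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), ["%d/%m/%Y"]),
    (re.compile(r"\b\d{1,2} [A-Za-z]{3,9} \d{4}\b"), ["%d %B %Y", "%d %b %Y"]),
]


def find_supplier(text, suppliers=KNOWN_SUPPLIERS):
    """Return the known supplier mentioned earliest in the text, ignoring case."""
    lowered = text.casefold()
    hits = [
        (lowered.find(name.casefold()), name)
        for name in suppliers
        if name.casefold() in lowered
    ]
    return min(hits)[1] if hits else None


def find_dates(text):
    """Return every date found, in pattern order rather than text order."""
    found = []
    for pattern, formats in DATE_PATTERNS:
        for token in pattern.findall(text):
            for fmt in formats:
                try:
                    found.append(datetime.strptime(token, fmt).date())
                    break
                except ValueError:
                    continue
    return found
```

Returning every date, rather than picking one, leaves room to handle cases like a print date in the footer, which still needs a person or a further rule to decide which date is the right one.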
     