He creado un script py que funciona con PikePDF y PDFminer3 que eliminará un PDF de mi escritorio y creará un archivo txt a partir de las palabras disponibles.

El propósito de esto es ayudar a mi equipo en el trabajo a enmendar documentos legales que a menudo no pueden copiarse para enmiendas (y, por lo tanto, deben escribirse a mano). Como la mayoría de mis colegas son reacios a configurar anaconda y usar python, quería usar pyinstaller para convertir mi script en un .exe.

Cuando ejecuto la aplicación creada por pyinstaller, puedo completar algunas entradas preliminares antes de recibir este error:

    Traceback <most recent call last>:
      File 'PDF2TEXT.py', line 35, in <module>
    ModuleNotFoundError: No module named 'pikepdf._cpphelpers'
    (10688) Failed to execute script PDF2TEXT

Durante la compilación de pyinstaller también recibo muchas advertencias consecutivas que hacen falta archivos anaconda3 dll:

Warning: lib not found: msmpi.dll dependency of c:\users\anejar1\appdata\local\continuum\anaconda3\Library\bin\mkl_blacs_mspi_ilp64.dll

He cavado un poco y he aplicado algunas soluciones en otros hilos sin éxito, incluida la ejecución:

pyinstaller --path= [path to pikepdf] --path= [path to pdfminer3] -F PDF2TEXT.py

Y

pyinstaller --hidden-import=pikepdf --hidden-import=pdfminer3 -F PDF2TEXT.py

El código subyacente es bastante corto (y funciona bien) y solo importa desde pikepdf, pdfminer y os:

"""
from pdfminer3.layout import LAParams, LTTextBox, LTTextLine
from pdfminer3.pdfpage import PDFPage, PDFTextExtractionNotAllowed
from pdfminer3.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer3.converter import PDFPageAggregator
from pdfminer3.pdfparser import PDFParser
from pdfminer3.pdfdocument import PDFDocument
from pdfminer3.pdfdevice import PDFDevice
import io
import os
import pikepdf

print('Enter your CBA User ID:')
CBAusername = input()
print('\nEnter the name of your PDF:')
FileName = input()

#############################

base_path = "C:/Users/" + CBAusername + "/Desktop"
pike_path = "C:/Users/" + CBAusername + "/AppData/Roaming/Python/Python36/PikePDF"
my_file = os.path.join(base_path + "/" + FileName + ".pdf")
my_extractable_file = os.path.join(pike_path + "/" + "PikedPDF2TEXT.pdf")
log_file = os.path.join(base_path + "/" + "PDF2TEXT.txt")

password = ""
extract = ""

pdf = pikepdf.open(my_file)
pdf.save(my_extractable_file)

fp = open(my_extractable_file, "rb")
parser = PDFParser(fp)
document = PDFDocument(parser, password)

if  not document.is_extractable:
   raise PDFTextExtractionNotAllowed

rsrcmgr = PDFResourceManager()
laparams = LAParams()

device = PDFPageAggregator(rsrcmgr, laparams=laparams)

interpreter = PDFPageInterpreter(rsrcmgr, device)

for page in PDFPage.create_pages(document):
    interpreter.process_page(page)
    layout = device.get_result()
    for lt_obj in layout:
        if isinstance(lt_obj,LTTextBox) or isinstance(lt_obj, LTTextLine):
            extract += lt_obj.get_text()

fp.close()

print(extract.encode("utf-8"))

with open(log_file, "wb") as my_log:
    my_log.write(extract.encode("utf-8"))
    print ("Done !!")

Si es de algún uso adicional sobre la información provista, el registro completo del proceso del instalador está aquí:

(C:\Users\anejar1\AppData\Local\Continuum\anaconda3) C:\Users\anejar1>cd C:\User
s\anejar1\AppData\Roaming\Python\Python36\Scripts

(C:\Users\anejar1\AppData\Local\Continuum\anaconda3) C:\Users\anejar1\AppData\Ro
aming\Python\Python36\Scripts>pyinstaller --hidden-import=pdfminer3 --hiddenimpo
rt=pikepdf -F PDF2TEXT.py
214 INFO: PyInstaller: 3.5
214 INFO: Python: 3.6.3
215 INFO: Platform: Windows-8.1-6.3.9600-SP0
218 INFO: wrote C:\Users\anejar1\AppData\Roaming\Python\Python36\Scripts\PDF2TEX
T.spec
221 INFO: UPX is not available.
225 INFO: Extending PYTHONPATH with paths
['C:\\Users\\anejar1\\AppData\\Roaming\\Python\\Python36\\Scripts',
 'C:\\Users\\anejar1\\AppData\\Roaming\\Python\\Python36\\Scripts']
225 INFO: checking Analysis
225 INFO: Building Analysis because Analysis-00.toc is non existent
226 INFO: Initializing module dependency graph...
235 INFO: Initializing module graph hooks...
240 INFO: Analyzing base_library.zip ...
8340 INFO: Analyzing hidden import 'pdfminer3'
8613 INFO: Analyzing hidden import 'pikepdf'
9570 INFO: Processing pre-find module path hook   distutils
10124 INFO: Processing pre-find module path hook   site
10125 INFO: site: retargeting to fake-dir 'C:\\Users\\anejar1\\AppData\\Roaming\
\Python\\Python36\\site-packages\\PyInstaller\\fake-modules'
15732 INFO: Processing pre-safe import module hook   setuptools.extern.six.moves

26995 INFO: running Analysis Analysis-00.toc
27021 INFO: Adding Microsoft.Windows.Common-Controls to dependent assemblies of
final executable
  required by c:\users\anejar1\appdata\local\continuum\anaconda3\python.exe
27608 INFO: Caching module hooks...
27623 INFO: Analyzing C:\Users\anejar1\AppData\Roaming\Python\Python36\Scripts\P
DF2TEXT.py
31216 INFO: Loading module hooks...
31217 INFO: Loading module hook "hook-Crypto.py"...
31252 INFO: Loading module hook "hook-distutils.py"...
31255 INFO: Loading module hook "hook-encodings.py"...
31453 INFO: Loading module hook "hook-lib2to3.py"...
31465 INFO: Loading module hook "hook-lxml.etree.py"...
31467 INFO: Loading module hook "hook-PIL.Image.py"...
33168 INFO: Loading module hook "hook-PIL.py"...
33172 INFO: Excluding import 'PySide'
33177 INFO:   Removing import of PySide from module PIL.ImageQt
33178 INFO: Excluding import 'tkinter'
33182 INFO:   Removing import of tkinter from module PIL.ImageTk
33183 INFO: Import to be excluded not found: 'FixTk'
33184 INFO: Excluding import 'PyQt5'
33187 INFO:   Removing import of PyQt5.QtGui from module PIL.ImageQt
33187 INFO:   Removing import of PyQt5.QtCore from module PIL.ImageQt
33190 INFO: Excluding import 'PyQt4'
33194 INFO:   Removing import of PyQt4 from module PIL.ImageQt
33196 INFO: Loading module hook "hook-PIL.SpiderImagePlugin.py"...
33199 INFO: Excluding import 'tkinter'
33201 INFO: Import to be excluded not found: 'FixTk'
33202 INFO: Loading module hook "hook-pkg_resources.py"...
35151 INFO: Processing pre-safe import module hook   win32com
35775 INFO: Loading module hook "hook-pycparser.py"...
36116 INFO: Loading module hook "hook-pydoc.py"...
36118 INFO: Loading module hook "hook-PyQt5.py"...
36389 INFO: Loading module hook "hook-PyQt5.QtCore.py"...
36390 INFO: Loading module hook "hook-PyQt5.QtGui.py"...
36393 INFO: Loading module hook "hook-pythoncom.py"...
37711 INFO: Loading module hook "hook-pywintypes.py"...
38928 INFO: Loading module hook "hook-setuptools.py"...
54087 INFO: Loading module hook "hook-sysconfig.py"...
54090 INFO: Loading module hook "hook-win32com.py"...
55853 INFO: Loading module hook "hook-xml.dom.domreg.py"...
55855 INFO: Loading module hook "hook-xml.py"...
55858 INFO: Loading module hook "hook-_tkinter.py"...
56277 INFO: checking Tree
56278 INFO: Building Tree because Tree-00.toc is non existent
56278 INFO: Building Tree Tree-00.toc
56519 INFO: checking Tree
56519 INFO: Building Tree because Tree-01.toc is non existent
56520 INFO: Building Tree Tree-01.toc
56553 INFO: Loading module hook "hook-numpy.core.py"...
56851 INFO: MKL libraries found when importing numpy. Adding MKL to binaries
56859 INFO: Loading module hook "hook-numpy.py"...
56863 INFO: Loading module hook "hook-pytest.py"...
59074 INFO: Loading module hook "hook-scipy.py"...
59077 WARNING: Hidden import "scipy._lib.messagestream" not found!
59078 WARNING: Hidden import "scipy._lib._fpumode" not found!
59162 INFO: Looking for ctypes DLLs
59242 INFO: Analyzing run-time hooks ...
59252 INFO: Including run-time hook 'pyi_rth_pkgres.py'
59255 INFO: Including run-time hook 'pyi_rth_win32comgenpy.py'
59258 INFO: Including run-time hook 'pyi_rth_multiprocessing.py'
59264 INFO: Including run-time hook 'pyi_rth_pyqt5.py'
59286 INFO: Looking for dynamic libraries
59698 WARNING: lib not found: msmpi.dll dependency of c:\users\anejar1\appdata\l
ocal\continuum\anaconda3\Library\bin\mkl_blacs_msmpi_ilp64.dll
59779 WARNING: lib not found: impi.dll dependency of c:\users\anejar1\appdata\lo
cal\continuum\anaconda3\Library\bin\mkl_blacs_intelmpi_ilp64.dll
59850 WARNING: lib not found: msmpi.dll dependency of c:\users\anejar1\appdata\l
ocal\continuum\anaconda3\Library\bin\mkl_blacs_msmpi_lp64.dll
60005 WARNING: lib not found: tbb.dll dependency of c:\users\anejar1\appdata\loc
al\continuum\anaconda3\Library\bin\mkl_tbb_thread.dll
60226 WARNING: lib not found: pgf90.dll dependency of c:\users\anejar1\appdata\l
ocal\continuum\anaconda3\Library\bin\mkl_pgi_thread.dll
60294 WARNING: lib not found: pgf90rtl.dll dependency of c:\users\anejar1\appdat
a\local\continuum\anaconda3\Library\bin\mkl_pgi_thread.dll
60355 WARNING: lib not found: pgc.dll dependency of c:\users\anejar1\appdata\loc
al\continuum\anaconda3\Library\bin\mkl_pgi_thread.dll
60621 WARNING: lib not found: mpich2mpi.dll dependency of c:\users\anejar1\appda
ta\local\continuum\anaconda3\Library\bin\mkl_blacs_mpich2_lp64.dll
61074 WARNING: lib not found: mpich2mpi.dll dependency of c:\users\anejar1\appda
ta\local\continuum\anaconda3\Library\bin\mkl_blacs_mpich2_ilp64.dll
61203 WARNING: lib not found: impi.dll dependency of c:\users\anejar1\appdata\lo
cal\continuum\anaconda3\Library\bin\mkl_blacs_intelmpi_lp64.dll
65487 INFO: Found C:\windows\WinSxS\Manifests\amd64_policy.9.0.microsoft.vc90.cr
t_1fc8b3b9a1e18e3b_9.0.30729.6161_none_acd388d7e1d8689f.manifest
65496 INFO: Found C:\windows\WinSxS\Manifests\amd64_policy.9.0.microsoft.vc90.cr
t_1fc8b3b9a1e18e3b_9.0.30729.8387_none_acd5043fe1d73003.manifest
65811 INFO: Searching for assembly amd64_Microsoft.VC90.CRT_1fc8b3b9a1e18e3b_9.0
.30729.8387_none ...
65812 INFO: Found manifest C:\windows\WinSxS\Manifests\amd64_microsoft.vc90.crt_
1fc8b3b9a1e18e3b_9.0.30729.8387_none_08e793bfa83a89b5.manifest
65815 INFO: Searching for file msvcr90.dll
65816 INFO: Found file C:\windows\WinSxS\amd64_microsoft.vc90.crt_1fc8b3b9a1e18e
3b_9.0.30729.8387_none_08e793bfa83a89b5\msvcr90.dll
65817 INFO: Searching for file msvcp90.dll
65818 INFO: Found file C:\windows\WinSxS\amd64_microsoft.vc90.crt_1fc8b3b9a1e18e
3b_9.0.30729.8387_none_08e793bfa83a89b5\msvcp90.dll
65818 INFO: Searching for file msvcm90.dll
65819 INFO: Found file C:\windows\WinSxS\amd64_microsoft.vc90.crt_1fc8b3b9a1e18e
3b_9.0.30729.8387_none_08e793bfa83a89b5\msvcm90.dll
66044 INFO: Found C:\windows\WinSxS\Manifests\amd64_policy.9.0.microsoft.vc90.cr
t_1fc8b3b9a1e18e3b_9.0.30729.6161_none_acd388d7e1d8689f.manifest
66045 INFO: Found C:\windows\WinSxS\Manifests\amd64_policy.9.0.microsoft.vc90.cr
t_1fc8b3b9a1e18e3b_9.0.30729.8387_none_acd5043fe1d73003.manifest
66047 INFO: Adding redirect Microsoft.VC90.CRT version (9, 0, 21022, 8) -> (9, 0
, 30729, 8387)
66111 WARNING: lib not found: MSVCR90.dll dependency of c:\users\anejar1\appdata
\local\continuum\anaconda3\Library\bin\zlib.dll
66813 INFO: Looking for eggs
66813 INFO: Using Python library c:\users\anejar1\appdata\local\continuum\anacon
da3\python36.dll
66814 INFO: Found binding redirects:
[BindingRedirect(name='Microsoft.VC90.CRT', language=None, arch='amd64', oldVers
ion=(9, 0, 21022, 8), newVersion=(9, 0, 30729, 8387), publicKeyToken='1fc8b3b9a1
e18e3b')]
66846 INFO: Warnings written to C:\Users\anejar1\AppData\Roaming\Python\Python36
\Scripts\build\PDF2TEXT\warn-PDF2TEXT.txt
67269 INFO: Graph cross-reference written to C:\Users\anejar1\AppData\Roaming\Py
thon\Python36\Scripts\build\PDF2TEXT\xref-PDF2TEXT.html
67347 INFO: checking PYZ
67348 INFO: Building PYZ because PYZ-00.toc is non existent
67349 INFO: Building PYZ (ZlibArchive) C:\Users\anejar1\AppData\Roaming\Python\P
ython36\Scripts\build\PDF2TEXT\PYZ-00.pyz
70044 INFO: Building PYZ (ZlibArchive) C:\Users\anejar1\AppData\Roaming\Python\P
ython36\Scripts\build\PDF2TEXT\PYZ-00.pyz completed successfully.
70092 INFO: checking PKG
70093 INFO: Building PKG because PKG-00.toc is non existent
70093 INFO: Building PKG (CArchive) PKG-00.pkg
194058 INFO: Building PKG (CArchive) PKG-00.pkg completed successfully.
194071 INFO: Bootloader C:\Users\anejar1\AppData\Roaming\Python\Python36\site-pa
ckages\PyInstaller\bootloader\Windows-64bit\run.exe
194072 INFO: checking EXE
194073 INFO: Building EXE because EXE-00.toc is non existent
194073 INFO: Building EXE from EXE-00.toc
194074 INFO: Appending archive to EXE C:\Users\anejar1\AppData\Roaming\Python\Py
thon36\Scripts\dist\PDF2TEXT.exe
195004 INFO: Building EXE from EXE-00.toc completed successfully.

(C:\Users\anejar1\AppData\Local\Continuum\anaconda3) C:\Users\anejar1\AppData\Ro
aming\Python\Python36\Scripts>```
0
Rishabh Aneja 9 oct. 2019 a las 10:28

3 respuestas

La mejor respuesta

Encontré una solución a este error:

No module named 'pikepdf._cpphelpers'

Simplemente agregue:

from pikepdf import _cpphelpers

Al principio de tu guión

0
Fred_Alb 18 oct. 2019 a las 11:30

Creo que necesitas probar pikepdf para tu versión de Python.

Consulte el siguiente enlace para instalar el módulo pikepdf

1
Mayur Jotaniya 9 oct. 2019 a las 07:32

Tengo mucho este problema, creo que pyinstaller pierde ciertos módulos.

Lo que debe hacer es encontrar la ubicación del módulo que falta, en su caso ikePDF y PDFminer3

Vaya a su directorio de Python, debería estar en un lugar como este: C: \ Users \\ AppData \ Local \ Programs \ Python \ Python37-32 \ Lib \ site-packages

Copie la carpeta de los paquetes que faltan y péguelos en la carpeta dist de la carpeta de la aplicación exe

0
EcSync 18 oct. 2019 a las 11:18
58299066