I have a fairly large JSON file and a function that reads it.
I need to keep state between function calls: the next time the function is called, it should not read the JSON file from the first line again but pick up where it left off. The first thing that came to mind was the generator protocol.
This is my first attempt, but the function does not behave as I want:
    def first_task(file_name):
        """
        Read file_name line by line
        """
        import json
        data = []
        with open(file_name) as f:
            for line in f:
                try:
                    data.append(json.loads(line))
                except ValueError:  # includes simplejson.decoder.JSONDecodeError
                    print('Decoding JSON has failed')
                yield data
What I would like is for the first call to first_task("test.json") to return the first line of JSON, the second call to return the second line, and so on until EOF is reached.
Sample of the JSON file:
{"venue":{"venue_name":"Datong High School","lon":0,"lat":0,"venue_id":23779799},"visibility":"public","response":"no","guests":0,"member":{"member_id":120119272,"photo":"http:\/\/photos3.meetupstatic.com\/photos\/member\/b\/2\/b\/c\/thumb_262125756.jpeg","member_name":"Allen Wang"},"rsvp_id":1658733801,"mtime":1489925470960,"event":{"event_name":"Play Intermediate Volleyball","event_id":"jkpwmlywgbmb","time":1491613200000,"event_url":"https:\/\/www.meetup.com\/Taipei-Sports-and-Social-Club\/events\/236786445\/"},"group":{"group_topics":[{"urlkey":"fitness","topic_name":"Fitness"},{"urlkey":"mountain-biking","topic_name":"Mountain Biking"},{"urlkey":"sports","topic_name":"Sports and Recreation"},{"urlkey":"outdoors","topic_name":"Outdoors"},{"urlkey":"fun-times","topic_name":"Fun Times"},{"urlkey":"winter-and-summer-sports","topic_name":"Winter and Summer Sports"},{"urlkey":"adventure","topic_name":"Adventure"},{"urlkey":"water-sports","topic_name":"Water Sports"},{"urlkey":"sports-and-socials","topic_name":"Sports and Socials"},{"urlkey":"hiking","topic_name":"Hiking"},{"urlkey":"excercise","topic_name":"Exercise"},{"urlkey":"recreational-sports","topic_name":"Recreational Sports"}],"group_city":"Taipei","group_country":"tw","group_id":16585312,"group_name":"Taipei Sports and Social Club","group_lon":121.45,"group_urlname":"Taipei-Sports-and-Social-Club","group_lat":25.02}}
{"venue":{"venue_name":"Cafe Vitus","lon":121.54731,"lat":25.052959,"venue_id":19712922},"visibility":"public","response":"no","guests":0,"member":{"member_id":221379606,"photo":"http:\/\/photos2.meetupstatic.com\/photos\/member\/8\/3\/c\/4\/thumb_263973732.jpeg","member_name":"Benita Syu"},"rsvp_id":1658877353,"mtime":1489925471668,"event":{"event_name":"New Place! Every Saturday night multilingual café","event_id":"hvkmsmywfbhc","time":1490439600000,"event_url":"https:\/\/www.meetup.com\/polyglottw\/events\/238185973\/"},"group":{"group_topics":[{"urlkey":"language","topic_name":"Language & Culture"},{"urlkey":"language-exchange","topic_name":"Language Exchange"},{"urlkey":"chinese-language","topic_name":"Chinese Language"}],"group_city":"Taipei","group_country":"tw","group_id":18743595,"group_name":"Multilingual Cafe Language Exchange","group_lon":121.45,"group_urlname":"polyglottw","group_lat":25.02}}
{"venue":{"venue_name":"Panera Bread","lon":0,"lat":0,"venue_id":24945082},"visibility":"public","response":"yes","guests":0,"member":{"member_id":44748032,"photo":"http:\/\/photos4.meetupstatic.com\/photos\/member\/d\/7\/1\/6\/thumb_64255062.jpeg","member_name":"Valerie"},"rsvp_id":1658877355,"mtime":1489925472035,"event":{"event_name":"Meet & Greet Icebreaker conversations in Morris County","event_id":"236222256","time":1490389200000,"event_url":"https:\/\/www.meetup.com\/Mingle-Around-In-North-Jersey-Single-Events-Adventures\/events\/236222256\/"},"group":{"group_topics":[{"urlkey":"wine","topic_name":"Wine"},{"urlkey":"hiking","topic_name":"Hiking"},{"urlkey":"diningout","topic_name":"Dining Out"},{"urlkey":"marketing","topic_name":"Marketing"},{"urlkey":"newintown","topic_name":"New In Town"},{"urlkey":"socialnetwork","topic_name":"Social Networking"},{"urlkey":"women","topic_name":"Women's Social"},{"urlkey":"outdoors","topic_name":"Outdoors"},{"urlkey":"professional-networking","topic_name":"Professional Networking"},{"urlkey":"adventure","topic_name":"Adventure"},{"urlkey":"singles-30s-50s","topic_name":"Singles 30's-50's"},{"urlkey":"small-business-marketing-strategy","topic_name":"Small Business Marketing Strategy"},{"urlkey":"professional-singles","topic_name":"Single Professionals"},{"urlkey":"dating-and-relationships","topic_name":"Dating and Relationships"},{"urlkey":"singles-40s-50s","topic_name":"Singles 40's - 50's"}],"group_city":"Hackensack","group_country":"us","group_id":17370312,"group_name":"Mingle Around 30s 40s 50s (Single Events & Adventures)","group_lon":-74.05,"group_urlname":"Mingle-Around-In-North-Jersey-Single-Events-Adventures","group_state":"NJ","group_lat":40.89}}
2 Answers
This should work. It does no error checking.
    import json

    def gen(file_name):
        with open(file_name) as fh:
            line = fh.readline()
            while line:
                # one JSON document per line
                yield json.loads(line)
                line = fh.readline()
In Python 3.8 and later you could write it like this, using an assignment expression:
    def gen(file_name):
        with open(file_name) as fh:
            while line := fh.readline():
                yield json.loads(line)
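Since neither version above checks for errors, here is a minimal sketch of one way to add that, assuming blank or malformed lines should simply be skipped. The name gen_safe is hypothetical and not part of the answer above. json.JSONDecodeError is a subclass of ValueError, so catching ValueError (as the question does) covers it.

```python
import json


def gen_safe(file_name):
    """Yield one parsed JSON object per line, skipping lines that fail to parse."""
    with open(file_name) as fh:
        for line_number, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                yield json.loads(line)
            except ValueError:  # json.JSONDecodeError subclasses ValueError
                print(f"Decoding JSON has failed on line {line_number}")
```

Whether to skip, log, or re-raise on a bad line depends on the data; skipping is only one reasonable choice.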
Although it is not entirely clear what you need, if you only need individual lines, then instead of building a list you can yield each line and call the generator with next as needed. This reads the file only once and supplies individual lines on demand until it is exhausted:
## contents of file_with_json_lines.txt
{"first": "line"}
{"second": "line"}
{"third": "line"}
Building the line supplier:
    import json

    def supply_line(file_name):
        with open(file_name) as fh:
            for line in fh:
                # parse and hand back one line at a time
                yield json.loads(line)
When you call it with next, make sure to supply a default, so that you do not have to wrap the call in try/except to catch a StopIteration exception:
producer = supply_line('file_with_json_lines.txt')
In [7]: next(producer, '')
Out[7]: {'first': 'line'}
In [8]: next(producer, '')
Out[8]: {'second': 'line'}
In [9]: next(producer, '')
Out[9]: {'third': 'line'}
## when the file is done, it will produce a default, which in this case is an empty string
In [10]: next(producer, '')
Out[10]: ''
In [11]: next(producer, '')
Out[11]: ''
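One caveat with an empty-string default: a line containing the JSON text "" parses to an empty string as well, so the default can be mistaken for real data. A minimal sketch of one way around this, assuming a unique sentinel object is acceptable. The names _DONE and read_next are hypothetical, and supply_line is repeated only so the block runs on its own.

```python
import json

_DONE = object()  # hypothetical sentinel; never identical to any parsed JSON value


def supply_line(file_name):
    """Same generator as in the answer above, repeated so this block is self-contained."""
    with open(file_name) as fh:
        for line in fh:
            yield json.loads(line)


def read_next(producer):
    """Return (True, value) for the next line, or (False, None) once exhausted."""
    value = next(producer, _DONE)
    if value is _DONE:
        return False, None
    return True, value
```

The identity check with `is` is what makes the sentinel safe: no value produced by json.loads can be that exact object.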
If you need to start over, you can call the generator function again. If you want to store all the lines in a list, you can pass the generator to list (note, however, that this reads the whole file each time):
In [13]: all_lines = list(supply_line('file_with_json_lines.txt'))
In [14]: all_lines
Out[14]: [{'first': 'line'}, {'second': 'line'}, {'third': 'line'}]
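To illustrate the point about starting over: each call to the generator function returns a new, independent generator that begins at the first line, while an existing generator object keeps its position between next calls. The sketch below writes a throwaway temporary file (an assumption made only so the example is self-contained) and repeats supply_line for the same reason.

```python
import json
import os
import tempfile


def supply_line(file_name):
    """Same generator as above, repeated so this block runs on its own."""
    with open(file_name) as fh:
        for line in fh:
            yield json.loads(line)


# Write a small throwaway file for the demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write('{"first": "line"}\n{"second": "line"}\n{"third": "line"}\n')
    path = tmp.name

producer = supply_line(path)
first = next(producer, '')         # first line
second = next(producer, '')        # same generator object: continues with the second line

restarted = supply_line(path)      # new generator object: starts from the first line again
first_again = next(restarted, '')

# Closing the generators releases the file handles before the file is deleted.
producer.close()
restarted.close()
os.remove(path)

print(first, second, first_again)
```

This is also why calling the function itself repeatedly, as in the question, never advances: the state lives in the generator object, so that object has to be kept and reused.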
And, of course, a for loop:
In [15]: for line in supply_line('file_with_json_lines.txt'):
    ...:     print(line)
    ...:
{'first': 'line'}
{'second': 'line'}
{'third': 'line'}