I have a fairly large JSON file and a function that reads it.

I need to keep state between function calls (the next time the function is called I don't want to read the JSON file from the first line again; the function should resume from where it left off), so the first thing that came to mind was the generator protocol.

This is my first attempt, but the function does not behave the way I want:


def first_task(file_name):
    """
    Read file_name line by line
    """
    import json
    data = []
    with open(file_name) as f:
        for line in f:
            try:
                data.append(json.loads(line))
            except ValueError:  # includes simplejson.decoder.JSONDecodeError
                print('Decoding JSON has failed')
    yield data

What I would like is for the first call to first_task("test.json") to return the first JSON line, the second call to return the second JSON line, and so on until EOF is reached.

Sample of the JSON file:


{"venue":{"venue_name":"Datong High School","lon":0,"lat":0,"venue_id":23779799},"visibility":"public","response":"no","guests":0,"member":{"member_id":120119272,"photo":"http:\/\/photos3.meetupstatic.com\/photos\/member\/b\/2\/b\/c\/thumb_262125756.jpeg","member_name":"Allen Wang"},"rsvp_id":1658733801,"mtime":1489925470960,"event":{"event_name":"Play Intermediate Volleyball","event_id":"jkpwmlywgbmb","time":1491613200000,"event_url":"https:\/\/www.meetup.com\/Taipei-Sports-and-Social-Club\/events\/236786445\/"},"group":{"group_topics":[{"urlkey":"fitness","topic_name":"Fitness"},{"urlkey":"mountain-biking","topic_name":"Mountain Biking"},{"urlkey":"sports","topic_name":"Sports and Recreation"},{"urlkey":"outdoors","topic_name":"Outdoors"},{"urlkey":"fun-times","topic_name":"Fun Times"},{"urlkey":"winter-and-summer-sports","topic_name":"Winter and Summer Sports"},{"urlkey":"adventure","topic_name":"Adventure"},{"urlkey":"water-sports","topic_name":"Water Sports"},{"urlkey":"sports-and-socials","topic_name":"Sports and Socials"},{"urlkey":"hiking","topic_name":"Hiking"},{"urlkey":"excercise","topic_name":"Exercise"},{"urlkey":"recreational-sports","topic_name":"Recreational Sports"}],"group_city":"Taipei","group_country":"tw","group_id":16585312,"group_name":"Taipei Sports and Social Club","group_lon":121.45,"group_urlname":"Taipei-Sports-and-Social-Club","group_lat":25.02}}
{"venue":{"venue_name":"Cafe Vitus","lon":121.54731,"lat":25.052959,"venue_id":19712922},"visibility":"public","response":"no","guests":0,"member":{"member_id":221379606,"photo":"http:\/\/photos2.meetupstatic.com\/photos\/member\/8\/3\/c\/4\/thumb_263973732.jpeg","member_name":"Benita  Syu"},"rsvp_id":1658877353,"mtime":1489925471668,"event":{"event_name":"New Place! Every Saturday night multilingual café","event_id":"hvkmsmywfbhc","time":1490439600000,"event_url":"https:\/\/www.meetup.com\/polyglottw\/events\/238185973\/"},"group":{"group_topics":[{"urlkey":"language","topic_name":"Language & Culture"},{"urlkey":"language-exchange","topic_name":"Language Exchange"},{"urlkey":"chinese-language","topic_name":"Chinese Language"}],"group_city":"Taipei","group_country":"tw","group_id":18743595,"group_name":"Multilingual Cafe Language Exchange","group_lon":121.45,"group_urlname":"polyglottw","group_lat":25.02}}
{"venue":{"venue_name":"Panera Bread","lon":0,"lat":0,"venue_id":24945082},"visibility":"public","response":"yes","guests":0,"member":{"member_id":44748032,"photo":"http:\/\/photos4.meetupstatic.com\/photos\/member\/d\/7\/1\/6\/thumb_64255062.jpeg","member_name":"Valerie"},"rsvp_id":1658877355,"mtime":1489925472035,"event":{"event_name":"Meet & Greet Icebreaker conversations in Morris County","event_id":"236222256","time":1490389200000,"event_url":"https:\/\/www.meetup.com\/Mingle-Around-In-North-Jersey-Single-Events-Adventures\/events\/236222256\/"},"group":{"group_topics":[{"urlkey":"wine","topic_name":"Wine"},{"urlkey":"hiking","topic_name":"Hiking"},{"urlkey":"diningout","topic_name":"Dining Out"},{"urlkey":"marketing","topic_name":"Marketing"},{"urlkey":"newintown","topic_name":"New In Town"},{"urlkey":"socialnetwork","topic_name":"Social Networking"},{"urlkey":"women","topic_name":"Women's Social"},{"urlkey":"outdoors","topic_name":"Outdoors"},{"urlkey":"professional-networking","topic_name":"Professional Networking"},{"urlkey":"adventure","topic_name":"Adventure"},{"urlkey":"singles-30s-50s","topic_name":"Singles 30's-50's"},{"urlkey":"small-business-marketing-strategy","topic_name":"Small Business Marketing Strategy"},{"urlkey":"professional-singles","topic_name":"Single Professionals"},{"urlkey":"dating-and-relationships","topic_name":"Dating and Relationships"},{"urlkey":"singles-40s-50s","topic_name":"Singles 40's - 50's"}],"group_city":"Hackensack","group_country":"us","group_id":17370312,"group_name":"Mingle Around 30s 40s 50s (Single Events & Adventures)","group_lon":-74.05,"group_urlname":"Mingle-Around-In-North-Jersey-Single-Events-Adventures","group_state":"NJ","group_lat":40.89}}
dejdej Feb 23, 2020 at 20:20

2 answers

Best answer

This should work. There is no error checking.

import json

def gen(file_name):
    with open(file_name) as fh:
        line = fh.readline()
        while line:
            yield json.loads(line)
            line = fh.readline()

In Python 3.8 or later you could write it like this:

import json

def gen(file_name):
    with open(file_name) as fh:
        while line := fh.readline():
            yield json.loads(line)
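
The generators above raise an exception on the first malformed line. Here is a minimal sketch of one way to add error checking, assuming that bad or blank lines should simply be skipped and reported. The name gen_checked and the skip-and-warn policy are illustrative assumptions, not part of the original answer:

```python
import json
import sys


def gen_checked(file_name):
    """Yield one parsed JSON object per line, skipping lines that fail to parse."""
    with open(file_name) as fh:
        for line_number, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Assumed policy: report the bad line on stderr and carry on.
                print(f"line {line_number}: invalid JSON ({exc})", file=sys.stderr)
```

Because the try/except sits inside the loop, one bad line does not end the iteration; the next call to next() resumes with the following line.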
Sabin Purice Feb 23, 2020 at 18:28

Although it is not entirely clear what you need, if you only need individual lines then, instead of building a list, you can yield each line and call the generator with next as needed. This reads the file only once and provides individual lines on demand until the generator is exhausted:

## contents of file_with_json_lines.txt

{"first": "line"}
{"second": "line"}
{"third": "line"}

Building the line supplier:

import json

def supply_line(file_name):
    with open(file_name) as fh:
        for line in fh:
            yield json.loads(line)

When you call it with next, make sure to provide a default so that you don't need a try/except wrapper to catch a StopIteration exception:

producer = supply_line('file_with_json_lines.txt')

In [7]: next(producer, '')
Out[7]: {'first': 'line'}

In [8]: next(producer, '')
Out[8]: {'second': 'line'}

In [9]: next(producer, '')
Out[9]: {'third': 'line'}

## when the file is done, it will produce a default, which in this case is an empty string
In [10]: next(producer, '')
Out[10]: ''

In [11]: next(producer, '')
Out[11]: ''

If you need to start over, you can call the generator function again. If you want to store all the lines in a list, you can pass the generator to list (note, however, that this reads the whole file each time):

In [13]: all_lines = list(supply_line('file_with_json_lines.txt'))

In [14]: all_lines
Out[14]: [{'first': 'line'}, {'second': 'line'}, {'third': 'line'}]

And, of course, a for loop:

In [15]: for line in supply_line('file_with_json_lines.txt'):
    ...:     print(line)
    ...:
{'first': 'line'}
{'second': 'line'}
{'third': 'line'}
salparadise Feb 23, 2020 at 18:49