I have this list of dicts and I want to remove the duplicates by name, while choosing which type to keep according to this priority order: [Polygone, LineString, Point]

dictionary = [{'firstName': 'Jabari', 'type':'Polygone'},{'firstName': 'Jabari', 'type':'LineString'},{'firstName': 'Jabari', 'type':'Point'},{'firstName': 'Jabari', 'type':'Polygone'},{'firstName': 'Bane', 'type':'LineString'},{'firstName': 'Bane', 'type':'Point'},{'firstName': 'Jack', 'type':'Point'}]

The result would be:

dictionary = [{'firstName': 'Jabari', 'type':'Polygone'},{'firstName': 'Bane', 'type':'LineString'},{'firstName': 'Jack', 'type':'Point'}]

I managed to remove the duplicates, but I don't know how to make the second part work:

done = set()
result = []
for d in dictionary:
    if d['firstName'] not in done:
        done.add(d['firstName']) 
        result.append(d)
print(result)

Thanks

-1
Tarek Tarek Oct 10, 2019 at 02:33

3 answers

Best answer

Option 1: use conditionals to filter down to the desired output

def filter_dict(input_dict):
  # Priority of each type: the higher number wins
  accept = {'Polygone':3, 'LineString':2, 'Point':1}
  done = set()
  result = []
  for current in input_dict:
    if current['type'] in accept:
      # Acceptable type
      if current['firstName'] not in done:
        # First time this name is seen: keep the entry
        done.add(current['firstName'])
        result.append(current)
      else:
        # Duplicate name: check whether the current entry has higher priority
        for i, previous in enumerate(result):
          if previous['firstName'] == current['firstName'] and \
            accept[previous['type']] < accept[current['type']]:
            # Higher priority with the same name, so replace with current
            result[i] = current

  return result

import pprint

pp = pprint.PrettyPrinter(indent=4)


d1 = [{'firstName': 'Jabari', 'type':'Polygone'},
{'firstName': 'Jabari', 'type':'LineString'},
{'firstName': 'Jabari', 'type':'Point'},
{'firstName': 'Jabari', 'type':'Polygone'},
{'firstName': 'Bane', 'type':'LineString'},
{'firstName': 'Bane', 'type':'Point'},
{'firstName': 'Jack', 'type':'Point'}]

print('First Output')
pp.pprint(filter_dict(d1))

d2 = [{'firstName': 'Jabari', 'type':'Point'},
  {'firstName': 'Jabari', 'type':'LineString'},
  {'firstName': 'Jabari', 'type':'Polygone'},
  {'firstName': 'Bane', 'type':'LineString'},
  {'firstName': 'Bane', 'type':'Point'},
  {'firstName': 'Jack', 'type':'Point'},
  {'firstName': 'Jack', 'type':'Polygone'},
  {'firstName': 'Jack', 'type':'LineString'}] 

print('Second Output')
pp.pprint(filter_dict(d2))

Option 2: use itertools

from itertools import groupby

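def filter_itertools(input_dict):
  # groupby only groups consecutive items, so entries sharing a firstName
  # must be adjacent in input_dict (true for d1 and d2)
  g = groupby(input_dict, key=lambda d: d['firstName'])
  # Priority of each type: the higher number wins
  accept = {'Polygone':3, 'LineString':2, 'Point':1}
  result = [max(v, key=lambda d: accept[d['type']]) for k, v in g]
  return result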

print('First itertools')
pp.pprint(filter_itertools(d1))
print('Second itertools')
pp.pprint(filter_itertools(d2))

Output (both options produce the same result)

First Output
[   {'firstName': 'Jabari', 'type': 'Polygone'},
    {'firstName': 'Bane', 'type': 'LineString'},
    {'firstName': 'Jack', 'type': 'Point'}]
Second Output
[   {'firstName': 'Jabari', 'type': 'Polygone'},
    {'firstName': 'Bane', 'type': 'LineString'},
    {'firstName': 'Jack', 'type': 'Polygone'}]
First itertools
[   {'firstName': 'Jabari', 'type': 'Polygone'},
    {'firstName': 'Bane', 'type': 'LineString'},
    {'firstName': 'Jack', 'type': 'Point'}]
Second itertools
[   {'firstName': 'Jabari', 'type': 'Polygone'},
    {'firstName': 'Bane', 'type': 'LineString'},
    {'firstName': 'Jack', 'type': 'Polygone'}]
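
Option 2 relies on entries with the same firstName sitting next to each other, because groupby only merges consecutive items. If that cannot be guaranteed, one possible workaround is to sort by name before grouping. The following is a minimal sketch under that assumption; filter_unsorted and the scattered sample list are hypothetical names introduced here, not part of the original answer.

```python
from itertools import groupby

# Same priorities as in the answer: the higher number wins
ACCEPT = {'Polygone': 3, 'LineString': 2, 'Point': 1}

def filter_unsorted(records):
    # Hypothetical helper: sort so that equal names become adjacent, then group
    by_name = lambda d: d['firstName']
    ordered = sorted(records, key=by_name)
    return [max(group, key=lambda d: ACCEPT[d['type']])
            for _, group in groupby(ordered, key=by_name)]

# Entries with the same name are deliberately not adjacent here
scattered = [{'firstName': 'Jack', 'type': 'Point'},
             {'firstName': 'Bane', 'type': 'Point'},
             {'firstName': 'Jack', 'type': 'Polygone'},
             {'firstName': 'Bane', 'type': 'LineString'}]

print(filter_unsorted(scattered))
```

Note that the result then comes out ordered alphabetically by name rather than in the order the names first appeared in the input.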
1
DarrylG Oct 10, 2019 at 02:10

I would solve this by first building a dictionary that records, for each firstName, the set of types associated with it in the dataset. I would then process that dictionary to produce the output format you need:

#!/usr/bin/env python

from collections import defaultdict

# These are the names of the types, in descending order of importance
ORDER = ("Polygone", "LineString", "Point")

given = [
    {"firstName": "Jabari", "type": "Polygone"},
    {"firstName": "Jabari", "type": "LineString"},
    {"firstName": "Jabari", "type": "Point"},
    {"firstName": "Jabari", "type": "Polygone"},
    {"firstName": "Bane", "type": "LineString"},
    {"firstName": "Bane", "type": "Point"},
    {"firstName": "Jack", "type": "Point"},
]

expected = [
    {"firstName": "Jabari", "type": "Polygone"},
    {"firstName": "Bane", "type": "LineString"},
    {"firstName": "Jack", "type": "Point"},
]

# For each item, we're going to store all of the types of that item that we've seen. Making this a
# dict handles the deduplication part for free! Making the dict's value a set means that we don't
# care how many entries we find for each item: even if there are 1,000,000, we'll at most be
# storing a three-item set.
found = defaultdict(set)

for item in given:
    # Each "type" will map to the number of its location in the ORDER tuple
    index = ORDER.index(item["type"])

    found[item["firstName"]].add(index)

output = []
for name, types in found.items():
    # Now, for each item in the "found" dict, find its smallest type index
    lowest_index = min(types)

    # Map that index back to its type name
    type_name = ORDER[lowest_index]

    # Add it to the results
    output.append({"firstName": name, "type": type_name})

assert output == expected
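
The same idea could also be wrapped in a reusable function that keeps only the best index seen so far for each name, instead of a set of indices. This is a sketch rather than a replacement for the code above; pick_best, rank, and sample are hypothetical names, and it assumes Python 3.7+ so that the dict preserves the order in which names first appear.

```python
def pick_best(items, order=("Polygone", "LineString", "Point")):
    # Hypothetical helper; "order" lists the types from most to least important
    rank = {type_name: i for i, type_name in enumerate(order)}
    best = {}  # firstName -> lowest rank seen so far (insertion-ordered)
    for item in items:
        r = rank[item["type"]]  # raises KeyError for a type that is not in "order"
        name = item["firstName"]
        if name not in best or r < best[name]:
            best[name] = r
    # Map each stored rank back to its type name
    return [{"firstName": name, "type": order[r]} for name, r in best.items()]

sample = [
    {"firstName": "Jabari", "type": "Point"},
    {"firstName": "Bane", "type": "LineString"},
    {"firstName": "Jabari", "type": "Polygone"},
    {"firstName": "Jack", "type": "Point"},
]

print(pick_best(sample))
```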
0
Kirk Strauser Oct 10, 2019 at 01:49

You could try:

import pandas as pd
import numpy as np

# transform your list of dict into a dataframe
df = pd.DataFrame(dictionary) 

# create a new column called "score", assigning 1 for Polygone, 2 for LineString and 3 for Point
df['score'] = np.where(df['type'] == 'Polygone', 1, np.where(df['type'] == 'LineString', 2, np.where(df['type'] == 'Point', 3 , np.nan)))

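# sort the dataframe by score; sort_values returns a new dataframe, so assign it back
# (mergesort is stable, so rows with equal scores keep their original order)
df = df.sort_values(by='score', kind='mergesort')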

# drop rows with duplicated "firstName" 
# (by default the first duplicate is kept, hence the one with lowest score)
# remember: 1 -> Polygone, 2 -> LineString, 3 -> Point
df = df.drop_duplicates('firstName')

# drop the columns "score"
df = df.drop('score', axis=1)

# re-transform the dataframe into a list of dictionaries as it was at the beginning
new_list_dict = df.to_dict('records')

print(new_list_dict)

The trickiest part is probably the np.where part.

Basically, np.where takes a condition as its first parameter (df['type'] == 'Polygone'), then returns the second parameter (1) where the condition is true, or the third parameter where the condition is not met.

In this case, what it returns when the condition is not met is another np.where, which this time checks whether the "type" is "LineString". If it is "LineString", it returns 2.

Otherwise, another np.where is called, which checks whether the "type" is "Point" and returns 3 if it is.

If the "type" is none of the three, it returns NaN. But I think this should not happen in your case.
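
To make the chain of conditions concrete, here is a rough plain-Python equivalent of the whole pipeline for readers who do not have pandas at hand. It is only a sketch: score and keep_best are hypothetical helpers that mirror the steps above (score, sort, keep the first row per name), not pandas or NumPy functions, and it assumes every type is one of the three known values.

```python
import math

def score(type_name):
    # Mirrors the nested np.where chain for a single value
    if type_name == 'Polygone':
        return 1
    elif type_name == 'LineString':
        return 2
    elif type_name == 'Point':
        return 3
    return math.nan  # none of the three types

def keep_best(records):
    # sorted() is stable, so records with equal scores keep their input order;
    # assumes no NaN scores, since NaN does not sort reliably
    seen = set()
    kept = []
    for rec in sorted(records, key=lambda r: score(r['type'])):
        if rec['firstName'] not in seen:  # like drop_duplicates: keep the first row
            seen.add(rec['firstName'])
            kept.append(rec)
    return kept

records = [{'firstName': 'Jabari', 'type': 'Polygone'},
           {'firstName': 'Jabari', 'type': 'LineString'},
           {'firstName': 'Jabari', 'type': 'Point'},
           {'firstName': 'Jabari', 'type': 'Polygone'},
           {'firstName': 'Bane', 'type': 'LineString'},
           {'firstName': 'Bane', 'type': 'Point'},
           {'firstName': 'Jack', 'type': 'Point'}]

print(keep_best(records))
```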

0
Giallo Oct 10, 2019 at 00:35