Common Ways to Remove Duplicates from a List in Java

I. Basic deduplication methods

1. Deduplicating with HashSet

// Rely on HashSet rejecting duplicate elements
public static <T> List<T> distinctByHashSet(List<T> list) {
    return new ArrayList<>(new HashSet<>(list));
}

Characteristics:

Time complexity: O(n) on average
Original order is not preserved
Elements must implement hashCode() and equals() correctly
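
The requirement on hashCode() and equals() can be illustrated with a minimal sketch. The Point class below is hypothetical and exists only for this example; without the two overrides, the two (1, 2) points would be treated as different elements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;

class HashSetDedupDemo {

    // Hypothetical value class, used only for illustration.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        // Two points are equal when both coordinates match.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point other = (Point) o;
            return x == other.x && y == other.y;
        }

        // Must agree with equals(): equal points produce equal hash codes.
        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    static <T> List<T> distinctByHashSet(List<T> list) {
        return new ArrayList<>(new HashSet<>(list));
    }

    public static void main(String[] args) {
        List<Point> points = Arrays.asList(new Point(1, 2), new Point(1, 2), new Point(3, 4));
        // Two distinct points remain; their relative order is not guaranteed.
        System.out.println(distinctByHashSet(points).size());
    }
}
```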

2. Preserving order with LinkedHashSet

// Deduplicate while keeping insertion order
public static <T> List<T> distinctByLinkedHashSet(List<T> list) {
    return new ArrayList<>(new LinkedHashSet<>(list));
}

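Characteristics:

Time complexity: O(n) on average
Keeps the order of first occurrence
Uses slightly more memory than HashSet, because it also maintains a linked list of its entries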
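
A short usage sketch of the ordering guarantee described above; the sample input is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class LinkedHashSetDedupDemo {

    static <T> List<T> distinctByLinkedHashSet(List<T> list) {
        return new ArrayList<>(new LinkedHashSet<>(list));
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("b", "a", "b", "c", "a");
        // Each element stays at the position of its first occurrence.
        System.out.println(distinctByLinkedHashSet(input)); // prints [b, a, c]
    }
}
```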

3. Java 8+ Stream API

// Use Stream.distinct()
public static <T> List<T> distinctByStream(List<T> list) {
    return list.stream().distinct().collect(Collectors.toList());
}

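Characteristics:

Concise code
Keeps the original order (for an ordered stream, the first occurrence of each element is retained)
Parallel processing: parallelStream() can be used, but distinct() on an ordered parallel stream needs extra coordination to stay stable, so it is not automatically faster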
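
The parallel case can be sketched as follows. The method names are illustrative. The first variant keeps encounter order; the second calls unordered() to drop that constraint when order does not matter, which, according to the Stream documentation, may allow a more efficient parallel distinct().

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class ParallelDistinctDemo {

    // Ordered parallel stream: the first occurrence of each element is kept, in encounter order.
    static <T> List<T> distinctOrdered(List<T> list) {
        return list.parallelStream()
                .distinct()
                .collect(Collectors.toList());
    }

    // Unordered variant: only the set of distinct elements matters, not their order.
    static <T> Set<T> distinctUnordered(List<T> list) {
        return list.parallelStream()
                .unordered()
                .distinct()
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(3, 1, 3, 2, 1);
        System.out.println(distinctOrdered(input)); // prints [3, 1, 2]
        System.out.println(distinctUnordered(input).size()); // prints 3
    }
}
```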

II. Order-preserving deduplication methods

1. Loop-and-check

// Preserve order by iterating and checking membership
public static <T> List<T> distinctByLoop(List<T> list) {
    List<T> result = new ArrayList<>();
    for (T item : list) {
        if (!result.contains(item)) {
            result.add(item);
        }
    }
    return result;
}

Performance analysis:

Time complexity: O(n²), because ArrayList.contains() is O(n)
Suitable for small lists only
Keeps the original order
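
If the quadratic cost is a concern, one possible variation (not part of the original method) keeps the same loop but tracks seen elements in a HashSet, so each membership check is O(1) on average. The method name is illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LoopWithSetDemo {

    // Same loop as before, but membership is checked against a HashSet instead of the result list.
    static <T> List<T> distinctByLoopWithSet(List<T> list) {
        Set<T> seen = new HashSet<>();
        List<T> result = new ArrayList<>();
        for (T item : list) {
            // add() returns false when the element was already present.
            if (seen.add(item)) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(distinctByLoopWithSet(Arrays.asList(5, 3, 5, 1, 3))); // prints [5, 3, 1]
    }
}
```

This trades a little extra memory for O(n) average time while still keeping the original order.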

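2. Custom ordering with TreeSet

// Deduplicate using a custom ordering; elements that compare as equal (compare() == 0) count as duplicates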
public static <T> List<T> distinctByTreeSet(List<T> list, Comparator<? super T> comparator) {
    Set<T> set = new TreeSet<>(comparator);
    set.addAll(list);
    return new ArrayList<>(set);
}

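When to use:

A specific sort order is required; the result follows the comparator's order, not the original list order
Elements implement Comparable (pass null as the comparator to use natural ordering), or a Comparator is supplied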
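
As a usage sketch, the method can be called with String.CASE_INSENSITIVE_ORDER so that strings differing only in case count as duplicates. The sample data is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class TreeSetDedupDemo {

    static <T> List<T> distinctByTreeSet(List<T> list, Comparator<? super T> comparator) {
        Set<T> set = new TreeSet<>(comparator);
        set.addAll(list);
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("b", "A", "a", "B", "c");
        // "a" duplicates "A" and "B" duplicates "b" under case-insensitive comparison;
        // the first element seen for each key is the one that is kept.
        System.out.println(distinctByTreeSet(input, String.CASE_INSENSITIVE_ORDER)); // prints [A, b, c]
    }
}
```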

III. Advanced Java 8+ methods

1. Stream + filter

// Deduplicate by an object property
public static <T> List<T> distinctByProperty(List<T> list, Function<? super T, ?> keyExtractor) {
    return list.stream()
            .filter(distinctByKey(keyExtractor))
            .collect(Collectors.toList());
}

private static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
    Set<Object> seen = ConcurrentHashMap.newKeySet();
    return t -> seen.add(keyExtractor.apply(t));
}

Example usage:

List<User> users = ...;
List<User> distinctUsers = distinctByProperty(users, User::getId);
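
For a self-contained version of this usage, the sketch below assumes a minimal User class with getId() and getName(); the original does not show this class, so its shape is a guess. Because the predicate is stateful, which element survives for a given key is only predictable on a sequential stream.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class DistinctByPropertyDemo {

    // Hypothetical User class; the real one is not shown in the article.
    static final class User {
        private final Long id;
        private final String name;

        User(Long id, String name) {
            this.id = id;
            this.name = name;
        }

        Long getId() { return id; }
        String getName() { return name; }
    }

    // Stateful predicate: returns true only the first time a key is seen.
    // ConcurrentHashMap-backed sets reject null keys, so the extractor must not return null.
    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }

    static <T> List<T> distinctByProperty(List<T> list, Function<? super T, ?> keyExtractor) {
        return list.stream()
                .filter(distinctByKey(keyExtractor))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(
                new User(1L, "Ann"), new User(2L, "Bob"), new User(1L, "Ann-dup"));
        List<User> distinctUsers = distinctByProperty(users, User::getId);
        // The first user seen for each id is kept.
        distinctUsers.forEach(u -> System.out.println(u.getId() + " " + u.getName()));
    }
}
```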

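2. Using Collectors.toCollection

// Specify the concrete collection type (Collectors.toList() makes no guarantee about the list's type or mutability)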
public static <T> List<T> distinctByCollector(List<T> list) {
    return list.stream()
            .distinct()
            .collect(Collectors.toCollection(ArrayList::new));
}

IV. Third-party libraries

1. Guava

// Use Guava's ImmutableSet
public static <T> List<T> distinctByGuava(List<T> list) {
    return ImmutableSet.copyOf(list).asList();
}

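Characteristics:

Returns an immutable list
Keeps the order of first occurrence
Null elements are rejected with a NullPointerException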

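2. Apache Commons Collections

// Use SetUniqueList (commons-collections4); the defensive copy is needed because the factory method removes duplicates from the list passed to it
public static <T> List<T> distinctByApache(List<T> list) {
    return new ArrayList<>(SetUniqueList.setUniqueList(new ArrayList<>(list)));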
}

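V. Deduplicating by object property

1. Map-based merge

// Merge duplicates by property; the default HashMap result does not guarantee the original order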
public static List<User> distinctUsersById(List<User> users) {
    Map<Long, User> map = users.stream()
            .collect(Collectors.toMap(
                User::getId,
                Function.identity(),
                (existing, replacement) -> existing));
    return new ArrayList<>(map.values());
}
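
When the original order also matters, one possible variation passes a map supplier as the fourth argument of Collectors.toMap. The sketch below uses LinkedHashMap and assumes a minimal, hypothetical User class with getId() and getName().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class MapDedupDemo {

    // Hypothetical User class, used only for illustration.
    static final class User {
        private final Long id;
        private final String name;

        User(Long id, String name) {
            this.id = id;
            this.name = name;
        }

        Long getId() { return id; }
        String getName() { return name; }
    }

    // Keeps the first user per id and preserves first-occurrence order via LinkedHashMap.
    static List<User> distinctUsersByIdOrdered(List<User> users) {
        Map<Long, User> map = users.stream()
                .collect(Collectors.toMap(
                        User::getId,
                        Function.identity(),
                        (existing, replacement) -> existing,
                        LinkedHashMap::new));
        return new ArrayList<>(map.values());
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(
                new User(3L, "Cid"), new User(1L, "Ann"), new User(3L, "Cid-dup"));
        // Prints "Cid" then "Ann": ids 3 and 1, in the order they first appeared.
        distinctUsersByIdOrdered(users).forEach(u -> System.out.println(u.getName()));
    }
}
```

Swapping the merge function to (existing, replacement) -> replacement would keep the last user per id instead, while the position in the result would still be that of the first occurrence.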

2. Keeping results sorted with TreeMap

// Deduplicate by property and sort by key (here, ascending id)
public static List<User> distinctAndSort(List<User> users) {
    Map<Long, User> map = new TreeMap<>();
    for (User user : users) {
        map.putIfAbsent(user.getId(), user);
    }
    return new ArrayList<>(map.values());
}

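VI. Performance comparison

1. Time complexity

Method	Average time complexity	Worst case
HashSet	O(n)	O(n²) under heavy hash collisions
LinkedHashSet	O(n)	O(n²) under heavy hash collisions
Loop-and-check	O(n²)	O(n²)
Stream.distinct()	O(n)	O(n²) under heavy hash collisions
TreeSet	O(n log n)	O(n log n)

For the hash-based methods, the worst case assumes that nearly all elements land in the same bucket. Since Java 8, HashMap converts large buckets into trees, which limits the cost to O(n log n) when the keys are Comparable.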

2. Memory usage

pie
    title Memory usage comparison
    "HashSet" : 35
    "LinkedHashSet" : 40
    "TreeSet" : 45
    "Stream API" : 38
    "Loop-and-check" : 30

VII. Best-practice recommendations

Simple cases:
// Order does not matter
return new ArrayList<>(new HashSet<>(list));
// Order must be kept
return new ArrayList<>(new LinkedHashSet<>(list));

Deduplicating complex objects:
// Deduplicate by ID
list.stream()
    .filter(distinctByKey(User::getId))
    .collect(Collectors.toList());

Large data sets:
// Parallel stream; measure first, since ordered distinct() adds coordination overhead
list.parallelStream()
    .distinct()
    .collect(Collectors.toList());

Deduplication with sorting:
list.stream()
    .distinct()
    .sorted(Comparator.comparing(User::getName))
    .collect(Collectors.toList());

VIII. Special cases

1. Keeping the last occurrence

// reverse + distinct + reverse
public static <T> List<T> distinctKeepLast(List<T> list) {
    List<T> reversed = new ArrayList<>(list);
    Collections.reverse(reversed);
    return reversed.stream()
            .distinct()
            .collect(Collectors.collectingAndThen(
                Collectors.toCollection(ArrayList::new), // guaranteed mutable, so it can be reversed in place
                lst -> {
                    Collections.reverse(lst);
                    return lst;
                }));
}
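
An alternative sketch of the same idea, without the two reversals: a LinkedHashSet where each element is removed and re-added, so it moves to the position of its latest occurrence. The method name is illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class KeepLastDemo {

    // Keeps the last occurrence of each element, in the order of those last occurrences.
    static <T> List<T> distinctKeepLastByReinsert(List<T> list) {
        Set<T> set = new LinkedHashSet<>();
        for (T item : list) {
            set.remove(item); // drop the earlier occurrence, if any
            set.add(item);    // re-insert at the end of the iteration order
        }
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "a", "c", "b");
        System.out.println(distinctKeepLastByReinsert(input)); // prints [a, c, b]
    }
}
```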

2. Deduplicating by multiple properties

public static List<User> distinctByMultiFields(List<User> users) {
    return users.stream()
            .collect(Collectors.collectingAndThen(
                Collectors.toMap(
                    user -> Arrays.asList(user.getId(), user.getName()),
                    Function.identity(),
                    (a, b) -> a
                ),
                map -> new ArrayList<>(map.values()))
            );
}
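
A self-contained sketch of the composite-key approach, assuming a hypothetical User class with getId() and getName(). It builds the key with Arrays.asList, as above, and uses a LinkedHashMap so the result keeps first-occurrence order, which the default HashMap does not guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class MultiFieldDedupDemo {

    // Hypothetical User class, used only for illustration.
    static final class User {
        private final Long id;
        private final String name;

        User(Long id, String name) {
            this.id = id;
            this.name = name;
        }

        Long getId() { return id; }
        String getName() { return name; }
    }

    // Two users are duplicates only when both id and name match.
    static List<User> distinctByIdAndName(List<User> users) {
        return users.stream()
                .collect(Collectors.collectingAndThen(
                        Collectors.toMap(
                                user -> Arrays.asList(user.getId(), user.getName()), // composite key
                                Function.identity(),
                                (a, b) -> a,          // keep the first occurrence
                                LinkedHashMap::new),  // keep first-occurrence order
                        map -> new ArrayList<>(map.values())));
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(
                new User(1L, "Ann"), new User(1L, "Bob"), new User(1L, "Ann"), new User(2L, "Ann"));
        System.out.println(distinctByIdAndName(users).size()); // prints 3
    }
}
```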

3. Custom equality logic

public static <T> List<T> distinctWithComparator(List<T> list, Comparator<? super T> comparator) {
    return list.stream()
            .collect(Collectors.collectingAndThen(
                Collectors.toMap(
                    Function.identity(),
                    Function.identity(),
                    (a, b) -> a,
                    () -> new TreeMap<>(comparator)
                ),
                map -> new ArrayList<>(map.values()))
            );
}
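
A usage sketch for the comparator-based method, with made-up sample data: strings of the same length are treated as duplicates. Note that the result comes back in the comparator's order, since it is read from a TreeMap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class ComparatorDedupDemo {

    static <T> List<T> distinctWithComparator(List<T> list, Comparator<? super T> comparator) {
        return list.stream()
                .collect(Collectors.collectingAndThen(
                        Collectors.toMap(
                                Function.identity(),
                                Function.identity(),
                                (a, b) -> a,                       // keep the first element per key
                                () -> new TreeMap<>(comparator)),  // comparator defines "equal"
                        map -> new ArrayList<>(map.values())));
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("pear", "fig", "plum", "kiwi", "apple");
        // "plum" and "kiwi" have the same length as "pear" and are dropped.
        List<String> result = distinctWithComparator(input, Comparator.comparingInt(String::length));
        System.out.println(result); // prints [fig, pear, apple]
    }
}
```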

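Choose the method that fits your requirements: for simple cases prefer HashSet or LinkedHashSet; for more complex cases use the Stream API with custom logic. In performance-sensitive code, benchmark the candidates before choosing.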

