2

I would like to know (in Python) how to count occurrences and compare values from different columns in different spreadsheets. After counting, I would need to know if those values fulfill a condition i.e. If Ana (user) from the first spreadsheet appears 1 time in the second spreadsheet and 5 times in the third one, I would like to sum 1 to a variable X.

I am new in Python, but I have tried getting the .values() after using the Counter from collections. However, I am not sure if the real value Ana is being considered when iterating in the results of the Counter. All in all, I need to iterate each element in spreadsheet one and see if each element of it appears one time in the second spreadsheet and five times in the third spreadsheet, if such thing happens, the variable X will be added by one.

def XInputOutputs():

list1 = []
with open(file1, 'r') as fr:
    r = csv.reader(fr)
    for row in r:
        list1.append(row[1])
    number_of_occurrences_in_list_1 = Counter(list1)
    list1_ocurrences = number_of_occurrences_in_list_1.values()

list2 = []
with open(file2, 'r') as fr:
    r = csv.reader(fr)
    for row in r:
        list2.append(row[1])
    number_of_occurrences_in_list_2 = Counter(list2)
    list2_ocurrences = number_of_occurrences_in_list_2.values()

X = 0

for x,y in zip(list1_ocurrences, list2_ocurrences):
    if x == 1 and y == 5:
        X += 1

return X

I tested with small spreadsheets, but this just works for pre-ordered values. If Ana appears after 100000 rows, everything is broken. I think it is needed to iterate each value (Ana) and check simultaneously in all the spreadsheets and sum the variable X.

Matt Ellen
  • 11,268
  • 4
  • 68
  • 90
Luis Puma
  • 21
  • 1

1 Answers1

0

I am at work, so I will be able to write a full answer only later. If you can import modules, I suggest you to try using pandas: a real super-useful tool to quickly and efficiently manage data frames. You can easily import a .csv spreadsheet with

import pandas as pd

df = pd.read_csv() 

method, then perform almost any kind of operation.

Check out this answer out: I got few time to read it, but I hope it helps

what is the most efficient way of counting occurrences in pandas?

UPDATE: then try with this

# not tested but should work

import os
import pandas as pd

# read all csv sheets from folder - I assume your folder is named "CSVs"
for files in os.walk("CSVs"):
    files = files[-1]
# here it's generated a list of dataframes
df_list = []
for file in files:
    df = pd.read_csv("CSVs/" + file)
    df_list.append(df)

name_i_wanna_count = "" # this will be your query
columun_name = "" # here insert the column you wanna analyze
count = 0

for df in df_list:
    # retrieve a series matching your query and then counts the elements inside
    matching_serie = df.loc[df[columun_name] == name_i_wanna_count]
    partial_count = len(matching_serie)
    count = count + partial_count

print(count)

I hope it helps

Michele Rava
  • 184
  • 4
  • 17