I'm trying to read data from a text file and apply things such as normality tests, confidence intervals, ANOVA tests and so forth.
Is there an easier way of creating conditional arrays from my data using pandas, without manually typing out 36 lines of code as I have done below?
Later I will need to access the different flavours within these packs, so otherwise I would have to repeat this whole block about 7 times.
import numpy as np
import pandas as pd
import scipy.stats as st

revels_data = pd.read_csv("revels2.txt")
rd = revels_data
# packet sums
total_1 = (rd.loc[rd["Packet number"] == 1, "Contents"].sum())
total_2 = (rd.loc[rd["Packet number"] == 2, "Contents"].sum())
total_3 = (rd.loc[rd["Packet number"] == 3, "Contents"].sum())
total_4 = (rd.loc[rd["Packet number"] == 4, "Contents"].sum())
total_5 = (rd.loc[rd["Packet number"] == 5, "Contents"].sum())
total_6 = (rd.loc[rd["Packet number"] == 6, "Contents"].sum())
total_7 = (rd.loc[rd["Packet number"] == 7, "Contents"].sum())
total_8 = (rd.loc[rd["Packet number"] == 8, "Contents"].sum())
total_9 = (rd.loc[rd["Packet number"] == 9, "Contents"].sum())
total_10 = (rd.loc[rd["Packet number"] == 10, "Contents"].sum())
total_11 = (rd.loc[rd["Packet number"] == 11, "Contents"].sum())
total_12 = (rd.loc[rd["Packet number"] == 12, "Contents"].sum())
total_13 = (rd.loc[rd["Packet number"] == 13, "Contents"].sum())
total_14 = (rd.loc[rd["Packet number"] == 14, "Contents"].sum())
total_15 = (rd.loc[rd["Packet number"] == 15, "Contents"].sum())
total_16 = (rd.loc[rd["Packet number"] == 16, "Contents"].sum())
total_17 = (rd.loc[rd["Packet number"] == 17, "Contents"].sum())
total_18 = (rd.loc[rd["Packet number"] == 18, "Contents"].sum())
total_19 = (rd.loc[rd["Packet number"] == 19, "Contents"].sum())
total_20 = (rd.loc[rd["Packet number"] == 20, "Contents"].sum())
total_21 = (rd.loc[rd["Packet number"] == 21, "Contents"].sum())
total_22 = (rd.loc[rd["Packet number"] == 22, "Contents"].sum())
total_23 = (rd.loc[rd["Packet number"] == 23, "Contents"].sum())
total_24 = (rd.loc[rd["Packet number"] == 24, "Contents"].sum())
total_25 = (rd.loc[rd["Packet number"] == 25, "Contents"].sum())
total_26 = (rd.loc[rd["Packet number"] == 26, "Contents"].sum())
total_27 = (rd.loc[rd["Packet number"] == 27, "Contents"].sum())
total_28 = (rd.loc[rd["Packet number"] == 28, "Contents"].sum())
total_29 = (rd.loc[rd["Packet number"] == 29, "Contents"].sum())
total_30 = (rd.loc[rd["Packet number"] == 30, "Contents"].sum())
total_31 = (rd.loc[rd["Packet number"] == 31, "Contents"].sum())
total_32 = (rd.loc[rd["Packet number"] == 32, "Contents"].sum())
total_33 = (rd.loc[rd["Packet number"] == 33, "Contents"].sum())
total_34 = (rd.loc[rd["Packet number"] == 34, "Contents"].sum())
total_35 = (rd.loc[rd["Packet number"] == 35, "Contents"].sum())
total_36 = (rd.loc[rd["Packet number"] == 36, "Contents"].sum())
# create total array
a = np.array([total_1, total_2, total_3, total_4, total_5, total_6, total_7,
total_8, total_9, total_10, total_11, total_12, total_13, total_14, total_15,
total_16, total_17, total_18, total_19, total_20, total_21, total_22, total_23,
total_24, total_25, total_26, total_27, total_28, total_29, total_30, total_31,
total_32, total_33, total_34, total_35, total_36])
# mean confidence interval
print(st.t.interval(0.95, len(a)-1, loc=np.mean(a), scale=st.sem(a)))
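For reference, the kind of shortcut I'm hoping for might look something like the sketch below, which uses `groupby` to get one total per packet. I'm assuming the column names from my file, and the inline sample is made up so the snippet runs on its own (it is not my real data). The resulting `a` should be usable in the `st.t.interval` line above unchanged.

```python
import io

import pandas as pd

# Made-up sample in the same layout as revels2.txt (NOT the real data).
sample = io.StringIO(
    "Packet number,Flavour,Contents\n"
    "1,orange,4\n"
    "2,orange,3\n"
    "3,orange,2\n"
    "1,toffee,4\n"
    "2,toffee,3\n"
    "3,toffee,5\n"
)
rd = pd.read_csv(sample)

# One sum per packet number, instead of total_1 ... total_36.
packet_totals = rd.groupby("Packet number")["Contents"].sum()

# NumPy array of the totals, ordered by packet number.
a = packet_totals.to_numpy()
print(a)
```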
Thanks!
EDIT:
The dataset looks like:
Packet number,Flavour,Contents
1,orange,4
2,orange,3
3,orange,2
4,orange,4
5,orange,3
...
36,orange,3
1,toffee,4
2,toffee,3
...
1,chocolate,5
...
etc.
Desired data:
For each of the flavour types I want an array/list of the contents to analyse, i.e.
for orange:
4
3
2
4
...
so that I can then apply various tests to these newly created arrays.
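To make the desired output concrete, here is a rough sketch of what I imagine the result could look like: a dictionary mapping each flavour to an array of its contents, built with `groupby`. The name `flavour_arrays` and the inline sample are just illustrative assumptions, not my real data.

```python
import io

import pandas as pd

# Made-up sample in the same layout as revels2.txt (NOT the real data).
sample = io.StringIO(
    "Packet number,Flavour,Contents\n"
    "1,orange,4\n"
    "2,orange,3\n"
    "3,orange,2\n"
    "1,toffee,4\n"
    "2,toffee,3\n"
    "3,toffee,5\n"
)
rd = pd.read_csv(sample)

# Hypothetical helper structure: flavour name -> NumPy array of its contents,
# in the order the rows appear in the file.
flavour_arrays = {
    flavour: group["Contents"].to_numpy()
    for flavour, group in rd.groupby("Flavour")
}

print(flavour_arrays["orange"])
```

Each array could then be passed to whichever test is needed, one flavour at a time.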