I have a series of data like this:

[1,3]
[1,3]
[2,4]
[3]
[4]

Every row contains 1 or 2 values. I need to extract them and calculate the average of each row. The expected output is:

2
2
3
3
4

I have no idea how to remove the square brackets and commas so that I can read the numerical values properly and calculate the average.

Derek O
YURI

3 Answers

0

If you have a series called s containing string representations of lists such as:

import pandas as pd

s = pd.Series(["[1,3]","[1,3]","[2,4]","[3]","[4]"])

0    [1,3]
1    [1,3]
2    [2,4]
3      [3]
4      [4]

You can apply a lambda function to first convert each row into a list of strings (credit goes to this answer), then convert each element of the list into an int, then calculate the mean:

s = s.apply(lambda row: [i.strip() for i in row[1:-1].replace('"',"").split(',')])
s = s.apply(lambda row: [int(n) for n in row])
s.apply(lambda x: sum(x)/len(x))

0    2.0
1    2.0
2    3.0
3    3.0
4    4.0
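
As an alternative sketch, assuming every row is a string that is also a valid Python list literal (for example "[1,3]"), the standard-library function ast.literal_eval can parse the row without any manual stripping or splitting. The helper name row_mean is hypothetical and only used for illustration:

```python
import ast


def row_mean(text):
    """Parse a string such as "[1,3]" and return the mean of its values."""
    values = ast.literal_eval(text)  # "[1,3]" -> [1, 3]
    return sum(values) / len(values)


rows = ["[1,3]", "[1,3]", "[2,4]", "[3]", "[4]"]
means = [row_mean(row) for row in rows]
print(means)  # [2.0, 2.0, 3.0, 3.0, 4.0]
```

On a pandas Series of such strings, the same helper could be passed to apply, for example s.apply(row_mean).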
Derek O
  • Thank you. I have tried this method, but this does not work for me. The return error is “ TypeError unsupported operand type(s) for + 'int' and 'str'” – YURI Jan 19 '23 at 04:34
  • This means your lists are probably combinations of strings and ints. You can first convert your lists to all be of type `int`, let me know if this answer works – Derek O Jan 19 '23 at 04:40
  • Thanks. How can I convert my lists to int? I have tried astype(int) and int(df). Neither of them works for me – YURI Jan 19 '23 at 07:42
  • @YURI i updated my answer – did you try running `s = s.apply(lambda row: [int(n) for n in row])` ? – Derek O Jan 19 '23 at 07:59
  • I just tried. the return error is the same as using int() and astype(). which is "ValueError: invalid literal for int() with base 10: '['" – YURI Jan 19 '23 at 08:10
  • oh that means that your entire row is a string. you need to convert that to a list first – let me know if this latest answer works. also since you're new to stackoverflow, part of the reason your question was closed was because there wasn't enough information for people to know what was happening (as you can see from our back and forth, there was also quite a lot of guesswork i had to do) – if you want your question to be reopened, you should mention that this is a pandas series where each row is a string representation of a list – hope this answer helps! – Derek O Jan 19 '23 at 08:44
  • It works for me! Thank you so much. Sorry that I didn’t provide the necessary information. I am almost completely new to programming, so I don’t know what is essential for the question. Sorry about that. I will try to provide more information next time. Thanks again for your time. – YURI Jan 19 '23 at 08:58
0

You can do this by creating a variable called something like "total". Loop through the array and add each value to total. At the end, divide by len(inputArray), i.e. the size of the given array. For example, given the array a = [5,4,9], you can use this code:

a = [5,4,9]
total = 0
for num in a: # for every item in the array
    total += num # total will equal 18 at the end of the loop
avg = total / len(a) # divide to get the average
# print or use the variable avg. expected output: 6.0

Or, if you want a way to convert a string like "[0,3]" to an array, you can use string functions:

a = "[3,5]" # or a = any string given
a = a[1:] # delete the first "["
a = a[:-1] # delete the last "]"
a = a.split(",") # get all numbers separated by ","
# finally, turn all strings into numbers
for i in range(len(a)): # visit every index, including the last one
    a[i] = int(a[i])

Then compute the average with the loop from the first snippet.
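
The two steps above can be combined into one small function. This is only a sketch: the name average_from_string is hypothetical, and it assumes each input is a non-empty, comma-separated list of integers wrapped in square brackets.

```python
def average_from_string(text):
    """Return the mean of the integers in a string such as "[3,5]"."""
    inner = text.strip()[1:-1]  # drop the surrounding "[" and "]"
    numbers = [int(part) for part in inner.split(",")]  # "3,5" -> [3, 5]
    return sum(numbers) / len(numbers)


for row in ["[1,3]", "[1,3]", "[2,4]", "[3]", "[4]"]:
    print(average_from_string(row))  # prints 2.0, 2.0, 3.0, 3.0, 4.0
```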

Anders C
0

If your data looks like this:

   s=[[1,3],[1,3],[2,4],[3],[4]]

then you can write the code as:

   for i in s:
       print(int(sum(i)/len(i)))

The output will be:

      2
      2
      3
      3
      4
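
Note that int() discards the fractional part, which does not matter for the sample data but would for a row whose mean is not a whole number. The sketch below uses hypothetical rows (not taken from the question) to show the difference:

```python
rows = [[1, 2], [3]]  # hypothetical rows; [1, 2] has a non-integer mean

truncated = [int(sum(r) / len(r)) for r in rows]  # fractional part dropped
exact = [sum(r) / len(r) for r in rows]           # float result kept

print(truncated)  # [1, 3]
print(exact)      # [1.5, 3.0]
```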